FMS 2014: Formal Methods for Security
Proceedings of the Formal Methods for Security Workshop, a Satellite Event of Petri Nets 2014

Editors: Véronique Cortier, CNRS Researcher at LORIA, Lorraine University, France; Riadh Robbana, Professor in LIP2 & INSAT, University of Carthage, Tunisia

Tunis, Tunisia, June 23rd, 2014

Preface

This volume contains the proceedings of the 5th Workshop on Formal Methods for Security (FMS), held in Tunis, Tunisia, on June 23, 2014, as a satellite event of Petri Nets 2014. FMS is dedicated to the advancement of the theory and practice of formal methods for security. We received 9 submissions, which went through a rigorous review process; each submission was reviewed by at least 3 Program Committee members. The Program Committee selected 5 papers for publication. In addition to the five selected papers, two more papers were selected for informal presentation at the workshop. We also had the privilege of welcoming an invited talk by Mark Ryan, "Du-Vote: Remote Electronic Voting with Untrusted Computers" (joint work with Gurchetan Grewal, Michael Clarkson, and Liqun Chen). We hope you will enjoy the FMS 2014 edition!
Véronique Cortier and Riadh Robbana

FMS 2014 Program Committee

– Myrto Arapinis, University of Birmingham
– Kamel Barkaoui, Cedric-CNAM
– Narjes Ben Rajeb, LIP2, INSAT, University of Carthage
– Vincent Cheval, School of Computer Science, University of Birmingham
– Stephen Chong, Harvard University
– Véronique Cortier, CNRS, LORIA
– Stéphanie Delaune, CNRS, LSV
– Susanna Donatelli, Dipartimento di Informatica, Università di Torino
– Sibylle Fröschle, University of Oldenburg
– Pierre-Cyrille Héam, LSV, ENS Cachan, INRIA-CNRS
– Béchir Ktari, Laval University
– Yassine Lakhnech, VERIMAG
– Mahjoub Langar, LIP2, ENIT, El Manar University
– Mohamed Mejri, Laval University
– Riadh Robbana, LIP2, INSAT, University of Carthage
– Hassen Saidi, SRI International
– Lilia Sfaxi, INSAT, University of Carthage
– Jacques Traore, Orange Labs

Additional Reviewers

– Mouhebeddine Berrima, University of Sousse
– Jiangshan Yu, University of Birmingham

Table of Contents

Du-Vote: Remote Electronic Voting with Untrusted Computers (Invited Talk) ... 4
    Mark Ryan
Extraction of Insider Attack Scenarios from a Formal Information System Modeling ... 5
    Amira Radhouani, Akram Idani, Yves Ledru and Narjes Ben Rajeb
Traffic Analysis of Web Browsers ... 20
    Sami Zhioua and Mahjoub Langar
Secrecy by Witness-Functions ... 34
    Mohamed Mejri, Jaouhar Fattahi and Hanane Houmani
Formal Enforcement of Security Policies on Choreographed Services ... 53
    Mahjoub Langar and Karim Dahmani
A Timestamping Scheme with Eternal Security in the Bounded Storage Model ...
68
    Assia Ben Shil and Kaouther Blibech Sinaoui

Du-Vote: Remote Electronic Voting with Untrusted Computers (Invited Talk)

Mark Ryan, Gurchetan Grewal, Michael Clarkson, and Liqun Chen
University of Birmingham, UK
m.d.ryan@cs.bham.ac.uk

Abstract. Du-Vote is a new remote electronic voting protocol that eliminates the often-required assumption that voters trust general-purpose computers. Trust is distributed in Du-Vote between a simple hardware token issued to the voter, the voter's computer, and a server run by election authorities. Verifiability is guaranteed with statistically high probability even if all these machines are controlled by the adversary, and privacy is guaranteed as long as at least either the voter's computer or the server is not controlled by the adversary. The design of the Du-Vote protocol is presented in this paper. A new non-interactive zero-knowledge proof is employed to verify the server's computations. The security of the protocol is analyzed to determine the extent to which, when components of the system are malicious, privacy and verifiability are maintained.

Keywords: Du-Vote, electronic voting protocol

Extraction of Insider Attack Scenarios from a Formal Information System Modeling

Amira Radhouani (1,2,3,5), Akram Idani (1,2), Yves Ledru (1,2), Narjes Ben Rajeb (3,4)
1 Univ. Grenoble Alpes, LIG, F-38000 Grenoble, France
2 CNRS, LIG, F-38000 Grenoble, France
3 LIP2-LR99ES18, 2092, Tunis, Tunisia
4 INSAT, Carthage University, Tunisia
5 FST, Tunis-El Manar University, Tunisia

Abstract. The early detection of potential threats during the modelling phase of a secure Information System is required because it favours the design of a robust access control policy and the prevention of malicious behaviours during system execution. This paper deals with internal attacks, which can be carried out by people inside the organization.
Such attacks are difficult to find because insiders have authorized system access and may also be familiar with system policies and procedures. We are interested in finding attacks which conform to the access control policy but lead to unwanted states. These attacks are favoured by policies involving authorization constraints, which grant or deny access depending on the evolution of the functional Information System state. In this context, we propose to model functional requirements and their Role Based Access Control (RBAC) policies using B machines, and then to formally reason on both models. In order to extract insider attack scenarios from these B specifications, our approach first investigates symbolic behaviours. The use of a model-checking tool then makes it possible to exhibit, from a symbolic behaviour, an observable concrete sequence of operations that can be followed by an attacker. In this paper, we show how this combination of symbolic execution and model checking allows us to find such insider attack scenarios.

Keywords: Information System, B-Method, RBAC, attack scenario, Model Checking, Symbolic Search.

1 Introduction

Developing secure Information Systems remains an active research area addressing a wide range of challenges, mostly concerned with how to prevent external attacks such as intrusion, code injection, denial of service, identity fraud, etc. Insider attacks are less studied, even though they may cause much more damage, because an insider is above all a trusted entity: it is intrinsically given the means to violate a security policy, either by using legitimate access or by obtaining unauthorized access. This paper deals especially with Role Based Access Control (RBAC) concerns, with the aim of exhibiting potential insider threats from a formal model of secure Information Systems. We are interested in finding attacks which conform to the access control policy but lead to unwanted states.
These attacks are favoured by policies involving authorization constraints, which grant or deny access depending on the evolution of the functional Information System state. This reveals, on the one hand, the need to link the security model to the functional model of the Information System and, on the other hand, the need for tools taking into account the dynamic evolution of the IS state. Tools such as SecureMova [2] and USE [6] are dedicated to validating security policies related to a functional model, but they do not take into account the dynamic evolution of the functional state. In [10], we discussed the shortcomings of existing approaches in this context and showed the advantages of using a formal specification assisted by animation tools. This paper goes a step further than our previous work by taking advantage of model-checking and proof tools in order to automatically find insider attack scenarios composed of a sequence of actions modifying the functional state and breaking the authorization constraint. This paper is organized as follows: Section 2 gives an overview of our approach and its underlying methodology. In Section 3 we present a simple example that illustrates our contribution. Section 4 defines semantics and technical aspects. In Section 5 we propose a symbolic search that automates the generation of attack scenarios, and we discuss the results of its application to the given example. Finally, we draw conclusions and perspectives.

2 Overall Approach

Bridging the gap between formal techniques (e.g. Z, B, VDM, ...) and graphical languages such as UML has been a challenge for many years. On the one hand, formal techniques allow automatic reasoning assisted by proof and model-checking tools; on the other hand, graphical techniques allow visualization and a better understanding of the system structure.
These complementary aspects are useful to ensure a software development process based on notations with precise syntax and semantics, while allowing a system to be structured graphically. Most existing research works [1, 5, 7, 13] in this context have focused only on the modelling and validation of functional aspects, which are initially described by various kinds of UML diagrams (class, state/transition, sequence, ...) and then translated into a formal specification. These works have shown the interest of linking formal and graphical paradigms, and also the feasibility of such translations. In our work, we adopt a similar approach in order to graphically model and formally reason on both functional and security models. We developed the B4MSecure platform [9] (http://b4msecure.forge.imag.fr/) in order to translate a UML class diagram associated with a SecureUML model into B specifications. The resulting B specifications, illustrated in Figure 1, follow the separation-of-concerns principle in order to validate both models separately and then validate their interactions. The functional B model, on the left-hand side of Figure 1, is issued from a conceptual class diagram. It integrates all basic operations generated automatically (constructors, destructors, setters, getters, ...) and also additional user-defined operations, which are integrated into the graphical model and specified using the B syntax. This functional specification can be further improved by adding invariants and carrying out correctness proofs with the help of the AtelierB prover.

Fig. 1. Validation of functional and security models

The security model, on the right-hand side of Figure 1, is dedicated to controlling access to functional operations with respect to the access control rules defined in the SecureUML model. In our approach, we do not deal with administration operations, because we make the simplifying assumption that access control rules do not evolve during system execution.
The formal security model makes it possible to validate RBAC well-formedness rules, such as the absence of role hierarchy cycles, and separation of duty (SoD) properties, such as the non-assignment of conflicting roles to users. This paper assumes that the validation of both models in isolation has been done: operations of the functional model do not violate invariant properties, and the security model is robust. Such validation activities are widely discussed in the literature [12]. However, currently available validation approaches do not sufficiently take into account the interactions between both models, which result from the fact that constraints expressed in the security model also refer to information of the functional model. In fact, security policies often depend on dynamic properties based on the functional system state. For example, a bank customer may transfer funds from his account, but if the amount is greater than some limit, the transfer must be approved by his account manager. Access control decisions then depend on the satisfaction of authorization constraints in the current system state. The dynamic evolution of the functional state impacts these constraints and may lead to a security vulnerability if it opens an unexpected access. In this paper we use validation tools (prover and model checker) in order to search for malicious sequences of operations by analysing authorization constraints.

3 A Simple Example

In this section we use a running example issued from [2], which deals with a SecureUML model associated with a functional UML class diagram.

3.1 Functional Model

The functional UML class diagram (presented in Figure 2) describes a meeting scheduler dedicated to managing data about two entities: Persons and Meetings.

Fig. 2. Functional model of the meeting scheduler system

A meeting has exactly one owner (association MeetingOwner), a list of participants (association MeetingParticipants), a duration, and a starting date.
A person can be the owner of several meetings and may participate in several meetings. The operations notify and cancel are user-defined; they respectively send messages to participants and delete a meeting after notifying its participants by e-mail. Constructors, setters and getters are implicitly defined for both classes and both associations.

3.2 Access Control Rules

The access control model is given in Figure 3. It features three different roles:

– SystemUser: defines persons who are registered on the system and thus have permission UserMeetingPerm, which allows them to create and read meetings. Deletion and modification of meetings (including operation cancel) are granted to system users by means of permission OwnerMeetingPerm, featuring an authorization constraint checking that the user who tries to run these actions is the meeting owner.
– Supervisor: defines system users with more privileges, because they can run actions notify and cancel on any meeting even if they are not its owner.
– SystemAdministrator: having full access to entity Person, an administrator manages system users. Full access grants him the right to create a new person, and to remove or modify an existing one. Furthermore, a system administrator has only read access to meetings.

Fig. 3. Security model of the meeting scheduler system

3.3 Validation

In [2], this example is validated using a set of static queries that inspect a given system state in order to obtain useful information such as "which user can perform an action on a concrete resource in a given state". The authorization constraint associated with OwnerMeetingPerm requires information from the functional model, because it deals with the MeetingOwner association. In the rest of this article, we consider three users, John, Alice and Bob, whose user assignments are as defined in Figure 4, and a given initial state in which Alice is the owner of meeting m1 and Bob is a participant of m1.
In such a state, the above static query establishes that only Alice is allowed to modify or delete m1, because she is the owner of m1.

Fig. 4. User assignments

In [10, 9], a dynamic analysis approach based on the animation of a formal specification showed that validation should not only be based on a given static state, but should search for sequences of actions modifying this state and breaking the authorization constraint. For example, starting from the above state, a static query would only report that John, and also Bob, cannot modify m1, because neither of them satisfies the authorization constraint. A dynamic analysis would ask whether there exists a sequence of operations enabled by John, or Bob, that allows them to modify m1. This paper contributes towards automatically finding these malicious sequences. To perform these analyses, we applied the B4MSecure tool to the UML and SecureUML diagrams and generated a B specification of 946 lines. This tool automatically generates a specification for all basic functional operations, which is enriched manually with some user-defined operations (i.e. cancel, notify).

4 Proposed Approach

4.1 Trace Semantics for B Specifications

In order to find malicious behaviours in an operational secure IS model, we rely on the set of finite observable traces of our B specifications. Indeed, B specifications can be given a trace semantics based on an initialization substitution init, a set of operations O and a set of state variables V. We write val for a possible state predicate allowed by the invariant and op for an operation from O. A functional behaviour is an observable sequence Q

Q ≜ init; op_1; op_2; ...; op_m

such that ∀i.(i ∈ 1..m ⇒ op_i ∈ O), and there exists a sequence S of state predicates which does not violate the invariant:

S ≜ val_0; val_1; ...; val_m

in which val_0 is an initial state, op_i is enabled from state val_{i-1}, and state val_i is reached by op_i starting from state val_{i-1}.
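This trace semantics can be sketched in a few lines of Python (an illustrative stand-in, not the authors' B tooling): a behaviour Q is replayed by checking that each op_i is enabled in val_{i-1} and that every reached state satisfies the invariant. The one-variable machine and its operation below are invented for the example.

```python
# Minimal sketch (illustrative, not the paper's B machinery): replay a
# behaviour Q = init; op1; ...; opm, checking enabledness and invariants.

def replay(init, operations, invariant):
    """Return the state sequence S = val_0; val_1; ...; val_m."""
    state = init()
    if not invariant(state):
        raise ValueError("initial state violates the invariant")
    trace = [state]
    for pre, action in operations:
        if not pre(state):                 # op_i enabled from val_{i-1}?
            raise RuntimeError("operation not enabled")
        state = action(state)              # val_i reached by op_i
        if not invariant(state):
            raise ValueError("invariant violated")
        trace.append(state)
    return trace

# Toy machine with one set-valued variable and an invariant (hypothetical).
init = lambda: frozenset()
invariant = lambda meetings: len(meetings) <= 2

def meeting_new(m):
    return (lambda s: m not in s,          # precondition
            lambda s: s | {m})             # substitution

S = replay(init, [meeting_new("m1"), meeting_new("m2")], invariant)
print(len(S))  # → 3 (val_0, val_1, val_2)
```

A behaviour whose third step re-adds "m1" would be rejected at the precondition check, mirroring an operation that is not enabled from the current state.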
The security model filters functional behaviours by analysing access control premises, which are triples (u, R, c) where u is a user, R is a set of possible roles assigned to u, and c is an authorization constraint. An observable secure behaviour is a sequence Q where, for every step i, the premise (u_i, R_i, c_i) is valid (written (u_i, R_i, c_i) |= true). This means that the roles R_i activated by user u_i grant him the right to run operation op_i, and that if a constraint c_i exists, it must be satisfied. The following sequence of premises P must be valid for Q:

P ≜ (u_1, R_1, c_1); (u_2, R_2, c_2); ...; (u_m, R_m, c_m)

4.2 Tools to Exhibit Behaviours from B Specifications

Model checking and symbolic proof techniques are both of interest for exhibiting a relevant behaviour from an operational B specification. Proof techniques deal with infinite systems and can prove constraint satisfiability, or establish that some operation can be enabled from an abstract state predicate. Model checking is based on the exploration of finite systems, and can be used to find a sequence of actions leading to a given state or property. In our approach, we combine both techniques in order to overcome their respective shortcomings: the complexity of proofs for the former, and state explosion for the latter. In this subsection, we illustrate both tools.

Model checking and animation (the ProB tool). ProB [11] is an animator and model checker for B specifications that explores the concrete state space of the specification and generates accessibility graphs. Every predicate val_i (where i ∈ {0, 1, ..., m}) of sequence S is then a valuation of the variables in V. For example, starting from an initial state val_0 where:

V = {person, meeting, meetingOwner, meetingParticipants}

and such that:

val_0 ≜ person = ∅ ∧ meeting = ∅ ∧ meetingOwner = ∅ ∧ meetingParticipant = ∅

and having O = {personNew, meetingNew, meetingAddParticipants, ...}, the scenario of Table 1 is successfully animated using the ProB tool. The column "Reached states S" gives only the B variables modified since the previous step.

step | Sequence Q             | Reached states S                             | RBAC premises P
1    | personNew              | person = {Alice}                             | John, SystemAdministrator, no constraint
2    | personNew              | person = {Alice, Bob}                        | John, SystemAdministrator, no constraint
3    | meetingNew             | meeting = {m1}, meetingOwner = {(Alice, m1)} | Alice, SystemUser, no constraint
4    | meetingAddParticipants | meetingParticipants = {(m1, Bob)}            | Alice, SystemUser, constraint: Alice is the owner of m1

Table 1. Animation of a normal scenario with ProB

In step 1, the tool animates operation personNew, which modifies variable person (initially the empty set); this action is performed by user John using role SystemAdministrator, with no authorization constraint needed. In step 4, the tool adds participant Bob to meeting m1 by animating operation meetingAddParticipants, after checking that the authorization constraint holds for Alice using role SystemUser: indeed, Alice is the owner of m1.

Symbolic proof (the GeneSyst tool). ProB is useful to animate scenarios identified during requirements analysis, or to exhaustively explore a finite subset of the state space. As we are interested in finding malicious scenarios that exhibit a potential internal attack, the ProB technique is useful only if it explores the right subset of the state space in the right direction, which is not obvious for infinite systems. Symbolic proof techniques, such as that of GeneSyst [4], are more interesting because they can produce symbolic transition systems that represent a potentially infinite set of values. Such tools reason on the reachability
In [4], three reachability properties are defined in terms of the following proof obligations, where E and F are two disjoint state predicates: (1) possibly reached: E · P re(op) ∆ ¬[Action(op)]¬F (2) not reachable: E · P re(op) ∆ [Action(op)]¬F (3) always reached: E · P re(op) ∆ [Action(op)]F In the generalized substitution theory, formula [S]R means that substitution S always establishes predicate R, and ¬[S]¬R means that substitution S may establish predicate R. Hence, proof (1) means that state F can be reached by actions of operation op, when operation precondition is true in state E. Proof (2) means that F is never reached by actions of operation op from state E. Finally, proof (3) means that F is always reached by op from E. Let us consider, for example, the functional operation meetingN ew: meetingN ew(m, p)= ˆ P RE m ”œ meeting · p œ person T HEN meeting := meeting fi {m} || meetingOwner := meetingOwner fi {(m ‘æ p)} EN D This operation adds a new meeting m and links it to an owner p. If we define states E and F such that: E=ˆ meetingOwner[{m1 }] = ÿ F = ˆ meetingOwner[{m1 }] ”= ÿ then proof obligation produced by GeneSyst for property (1) was successfully proved showing that operation meetingN ew when enabled from a state where m1 does not exist and there exists at least one person in the system, may lead to a state where m1 is created and has an owner. Our work will be focused on proof (1) which states the reachability of a target state from an initial one, illustrated above, because it is sufficient to decide wether an operation is potentially useful for a malicious behaviour. Proofs (2) and (3) can be used if one would like to assume that a state can never be reached, or it is always reached, by an operation. 4.3 Malicious Behaviour Based on the security requirements, several operations are identified as criti- cal. For example, security requirements have identified the integrity of meeting information as critical. 
Therefore, operations which perform unauthorized modifications are identified as critical. A malicious behaviour executed by a user u, with regard to authorization constraints, is an observable secure behaviour Q with m steps such that:

– op_m is a critical operation to which an authorization constraint c_m is associated;
– user u is malicious and would like to run op_m by misusing his roles R_u;
– val_0 is an initial state where (u, R_u, c_m) |= false;
– for every step i (i ∈ 1..m), premise (u, R_u, c_i) |= true.

In other words, malicious user u is not initially allowed to execute a critical operation, but he is able to run a sequence of operations leading to a state from which he can execute it. In our investigation, we suppose that user u executes this malicious sequence without collusion; this problem will be tackled in future work. Section 3.3 gave an example where neither Bob nor John is allowed, from the initial state, to run a modification operation such as meetingSetStart (which modifies attributes of class Meeting), due to the authorization constraint. This initial state is:

val_0 ≜ person = {Alice, Bob} ∧ meeting = {m1} ∧ meetingOwner = {(Alice ↦ m1)} ∧ meetingParticipant = {(m1 ↦ Bob)}

In the following, we denote by init_0 the sequence of operations leading to val_0, such as the one presented in Table 1. We used the model-checking facility of ProB in order to exhaustively explore the state space and automatically find a path starting from val_0 and leading to a state where operation meetingSetStart becomes permitted to John. We asked ProB to find a sequence where John becomes the owner of m1:

meetingOwner(m1) = John

After exploring more than 1000 states, ProB found a scenario in which John executes sequentially the operations personNew, personAddMeetingOwner and meetingSetStart. Indeed, this dynamic analysis showed that John, as a system administrator, has full access to entity Person.
This permission allows him to create, modify, read and delete any instance of class Person. First, he creates an instance John of class Person that corresponds to himself, by running operation personNew(John). Then he adds meeting m1 to the set of meetings owned by John, by running operation personAddMeetingOwner(John, m1), which is a basic modification operation of class Person. These two actions allow him to become the owner of m1, and he is then able to modify Alice's meeting. Like all model-checking techniques, when ProB explores the state space exhaustively, it faces the combinatorial explosion problem, which depends on the number of operations provided to the tool and on the size of the state space. In order to address this problem, our approach proposes a symbolic search which finds a sequence of potentially useful operations on which the model checker should then focus.

5 Symbolic Search

The proposed symbolic search is performed by an algorithm that looks for an observable sequence Q ≜ init_0; op_1; ...; op_m executed by a user u, where (u, R_u, c_m) is not valid for a critical operation op_m in the initial state val_0, but becomes valid in state val_{m-1}, from which op_m can be enabled. It is a backward search algorithm: it starts from the goal state val_{m-1} from which the critical operation op_m can be enabled, val_{m-1} ≜ c_m ∧ Pre(op_m), and works backwards until the initial state val_0 is encountered. The algorithm ends when sequence Q is found, or when all operations have been examined without encountering the initial state. We consider that val_0 is a completely valuated state, such as the one where Alice is the owner of m1 and Bob is a participant of m1. This prevents the initial state from being included in both states val_{m-1} and val_{m-2}, which would never satisfy the condition of the while loop. Note that each operation occurs at most once in a computed sequence, which ensures the termination of our algorithm.

 1. Q ≜ op_m;
 2. val_{m-1} ≜ c_m ∧ Pre(op_m);
 3. val_{m-2} ≜ ¬val_{m-1};
 4. while val_0 ⇏ val_{m-1} do
 5.   choose any o_i ∈ O where
 6.     (u, R_u, c_i) |= true ∧
 7.     val_{m-2} ∧ Pre(o_i) ⇒ ¬[Action(o_i)]¬val_{m-1}
 8.   do
 9.     Q ≜ o_i; Q;
10.     val_{m-1} ≜ val_{m-2} ∧ Pre(o_i);
11.     val_{m-2} ≜ val_{m-2} ∧ ¬Pre(o_i);
12.   else
13.     raise exception: No sequence found
14.   enddo
15. endwhile
16. Q ≜ init; Q;

5.1 Step by Step Illustration

Taking advantage of abstraction, we refine the symbolic state val_{m-2} step by step:

1. At the first step of the algorithm, the state space is represented by two symbolic states: the first one, val_{m-1}, includes all states where the authorization constraint c_m is true and which enable op_m; the second one, val_{m-2}, is the negation of val_{m-1}, that is ¬c_m ∨ ¬Pre(op_m). As they are two disjoint state predicates, we carry out proof (1) in order to find an operation o_i that belongs to O, possibly reaches the first state val_{m-1} from the second one val_{m-2}, and whose premise (u, R_u, c_i) is valid. If no such o_i exists, then no sequence can be found for the expected attack, and we can try proof (2) for each operation, attesting that no operation ever reaches val_{m-1} from val_{m-2}.

2. At the second step of the algorithm, if proof (1) succeeds for some operation op_{m-1}, then there may exist an observable sequence leading to the critical operation where the access control premise (u, R_u, c_m) is valid, and hence a potential symbolic attack scenario may be found. The algorithm looks inside state val_{m-2} in order to find the previous operations that can be invoked in the attack scenario. State val_{m-2} is partitioned into two sub-states:

val_{m-2} ∧ Pre(op_{m-1}) ≡ ¬(c_m ∧ Pre(op_m)) ∧ Pre(op_{m-1})
val_{m-2} ∧ ¬Pre(op_{m-1}) ≡ ¬(c_m ∧ Pre(op_m)) ∧ ¬Pre(op_{m-1})

Then we look for operations that reach the first sub-state from the second one.

3. The algorithm proceeds iteratively by partitioning the second state into two sub-states until it finds a state that includes the initial state.
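The backward search above can be sketched as runnable Python, with one loud caveat: the "possibly reached" proof obligation, which the paper discharges with the AtelierB prover, is replaced here by exhaustive enumeration over a small finite fact space, and the RBAC premise check is omitted (both candidate operations are assumed permitted to the attacker). The facts, preconditions and initial state are simplified renderings of the John scenario, invented for illustration.

```python
# Finite-state sketch of the backward search (Sec. 5). Enumeration over
# STATES stands in for the proof obligation E ∧ Pre(op) ⇒ ¬[Action(op)]¬F.
from collections import namedtuple
from itertools import combinations

Op = namedtuple("Op", ["pre", "apply"])

FACTS = ["John∈person", "m1∈meeting", "(John,m1)∈owner"]
STATES = [frozenset(c) for r in range(len(FACTS) + 1)
          for c in combinations(FACTS, r)]

def possibly_reaches(op, E, F):
    """Can op, enabled from some state satisfying E, reach a state in F?"""
    return any(F(op.apply(s)) for s in STATES if E(s) and op.pre(s))

def backward_search(ops, goal, critical_name, init_state):
    Q = [critical_name]
    val_prev = goal                                      # val_{m-1}
    val_rest = lambda s: not goal(s)                     # val_{m-2}
    remaining = list(ops)                                # each op used once
    while not val_prev(init_state):
        found = next(((n, op) for n, op in remaining
                      if possibly_reaches(op, val_rest, val_prev)), None)
        if found is None:
            raise RuntimeError("no sequence found")
        name, op = found
        remaining.remove(found)
        Q.insert(0, name)
        vr = val_rest                                    # snapshot for closures
        val_prev = lambda s, vr=vr, op=op: vr(s) and op.pre(s)
        val_rest = lambda s, vr=vr, op=op: vr(s) and not op.pre(s)
    return ["init0"] + Q

ops = [
    ("personNew",
     Op(lambda s: "John∈person" not in s,
        lambda s: s | {"John∈person"})),
    ("personAddMeetingOwner",
     Op(lambda s: {"m1∈meeting", "John∈person"} <= s
                  and "(John,m1)∈owner" not in s,
        lambda s: s | {"(John,m1)∈owner"})),
]
# c_m ∧ Pre(meetingSetStart): John owns m1 and m1 exists.
goal = lambda s: {"m1∈meeting", "(John,m1)∈owner"} <= s
init0 = frozenset({"m1∈meeting"})        # Alice's ownership elided for brevity

print(backward_search(ops, goal, "meetingSetStart", init0))
# → ['init0', 'personNew', 'personAddMeetingOwner', 'meetingSetStart']
```

The first iteration rejects personNew (it can never establish the goal) and keeps personAddMeetingOwner; the second finds that personNew enables it; the third succeeds because the refined predicate holds in init0, matching the iterations detailed below.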
In the best case, our algorithm produces a symbolic attack scenario, consisting of a sequence (init_0; op_n; op_{n+1}; ...; op_m) invoked by the same user u, where:

val_{n-1} ≜ ¬(c_m ∧ Pre(op_m)) ∧ ¬Pre(op_{m-1}) ∧ ¬Pre(op_{m-2}) ∧ ... ∧ Pre(op_n)

and such that val_0 ⇒ val_{n-1} ∧ ∀i.(i ∈ n..m ⇒ (u, R_u, c_i) |= true).

5.2 Application

We apply our algorithm to the meeting scheduler example starting from the following initial state val_0:

val_0 ≜ person = {Alice, Bob} ∧ meeting = {m1} ∧ meetingOwner = {(Alice ↦ m1)} ∧ meetingParticipant = {(m1 ↦ Bob)}

In this state, user John is not allowed to modify meeting m1, because the authorization constraint allows modification only by the owner of m1. A malicious scenario would lead to a state where John becomes able to execute a modification operation, such as operation meetingSetStart, on meeting m1. In this state we have to verify:

Pre(meetingSetStart(m1, start)) ≜ m1 ∈ meeting ∧ start ∈ NAT

and

(John, SystemUser, MeetingOwner(m1) = John) |= true

1. First iteration: considering the following symbolic states

val_{m-1} = (MeetingOwner(m1) = John) ∧ m1 ∈ meeting ∧ start ∈ NAT
val_{m-2} = ¬val_{m-1}

we have:
– val_0 ⇏ val_{m-1} because, in state val_0, MeetingOwner(m1) = Alice, and
– proof (1) succeeds for the operations meetingNew and personAddMeetingOwner.

We may then go on to the second iteration of the algorithm for each of these operations.

2. Second iteration: we partition state val_{m-2} into two sub-states:

val_{m-2} = ¬val_{m-1} ∧ Pre(op_{m-1})
val_{m-3} = ¬val_{m-1} ∧ ¬Pre(op_{m-1})

– case 1: we choose op_{m-1} = meetingNew, and then we have:
  – Pre(meetingNew) ≜ m1 ∉ meeting ∧ John ∈ person, and
  – val_0 ⇏ val_{m-2} because John ∉ person.
In this case, the algorithm does not find an operation leading to a state where operation meetingNew becomes enabled; indeed, no operation satisfies proof obligation (1). Our algorithm concludes that no attack scenario invoking meetingNew exists at this step.
Fig. 5. No state enabling operation meetingNew is found (candidate sequence init_0; meetingNew; meetingSetStart)

– case 2: we choose op_{m-1} = personAddMeetingOwner, and then we have:
  – Pre(personAddMeetingOwner) = m1 ∈ meeting ∧ John ∈ person ∧ (John, m1) ∉ MeetingOwner, and
  – val_0 ⇏ val_{m-2} because John ∉ person.
In this case, proof (1) succeeds for operation personNew, which means that if personNew is executed, it may lead to a state where personAddMeetingOwner can be enabled.

3. Third iteration: we partition state val_{m-3} into two sub-states:

val_{m-3} = ¬val_{m-1} ∧ ¬Pre(personAddMeetingOwner) ∧ Pre(personNew)
val_{m-4} = ¬val_{m-1} ∧ ¬Pre(personAddMeetingOwner) ∧ ¬Pre(personNew)

This terminates the algorithm normally, because in this case val_0 ⇒ val_{m-3}. Indeed, Pre(personNew) ≜ John ∉ person, and in the initial state John does not belong to the set person. Figure 6 presents the full symbolic scenario that allows John to modify Alice's meeting.

Fig. 6. Symbolic malicious behaviour for user John (init_0; personNew; personAddMeetingOwner; meetingSetStart)

5.3 Discussion

Technically, our approach applies the GeneSyst tool in order to produce proof obligations, and then asks the AtelierB prover to discharge them automatically. As the resulting scenarios are symbolic and based on "possibly reached" proofs, the analyst can conclude that attacks may exist, but he cannot attest their feasibility for the concrete system. An interesting contribution of our proof-based symbolic sequences, besides the fact that they draw the analyst's attention to potential flaws, is that they give useful inputs to the model checker. Indeed, a model-checking tool can be used to exhibit, from a symbolic behaviour, an observable concrete sequence of operations that can be followed by an attacker. In order to significantly reduce the state space, we can ask ProB to explore only the operations found in the symbolic malicious scenarios.
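This focused exploration can be sketched as a small breadth-first search over concrete states, restricted to the operations named in a symbolic scenario (a hypothetical illustration, not ProB's actual API; states, facts and operation bodies are simplified inventions).

```python
# Sketch: BFS over concrete states, restricted to the operations of a
# symbolic scenario, to find a concrete attack sequence.
from collections import deque

def find_concrete_trace(init, operations, is_goal):
    """Return the operation names leading from init to a goal state,
    or None if no goal state is reachable."""
    queue = deque([(init, [])])
    seen = {init}
    while queue:
        state, path = queue.popleft()
        if is_goal(state):
            return path
        for name, pre, act in operations:
            if pre(state):
                nxt = act(state)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [name]))
    return None

# Meeting-scheduler fragment limited to the three operations of the
# John scenario (state = frozenset of facts).
init0 = frozenset({"m1∈meeting", "(Alice,m1)∈owner"})
ops = [
    ("personNew",
     lambda s: "John∈person" not in s,
     lambda s: s | {"John∈person"}),
    ("personAddMeetingOwner",
     lambda s: "John∈person" in s and "(John,m1)∈owner" not in s,
     lambda s: s | {"(John,m1)∈owner"}),
    ("meetingSetStart",
     lambda s: "(John,m1)∈owner" in s,        # authorization constraint
     lambda s: s | {"start(m1) set by John"}),
]
trace = find_concrete_trace(init0, ops,
                            lambda s: "start(m1) set by John" in s)
print(trace)  # → ['personNew', 'personAddMeetingOwner', 'meetingSetStart']
```

With only three operations, the search visits a handful of states before reaching the goal, which mirrors the speed-up reported below for the focused ProB run.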
For our example, when trying only the operations personNew, personAddMeetingOwner and meetingSetStart, ProB exhibits a concrete attack scenario after visiting a dozen states, a significant speed-up with respect to our initial ProB attempts (involving more than 1000 states). Our technique was able to extract another scenario (figure 7) which can be executed by user Bob from the same initial state, in order to steal the ownership of m1. In this scenario, Bob first cancels the meeting and then recreates it before applying the critical operation.

Fig. 7. Symbolic malicious behaviour for user Bob: init0 ; meetingCancel ; meetingNew ; meetingSetStart.

The first scenario, carried out by user John, is made possible by the full access permission to class Person, associated to role SystemAdministrator, which includes the right to modify association ends. This attack affects meeting integrity. One solution can be to add an SSD constraint between roles SystemAdministrator and SystemUser. John will then still be able to become owner of the meeting, but will not be able to log in as SystemUser in order to modify it. The second scenario, carried out by Bob, was possible due to role Supervisor, which gives him the right to cancel a meeting; then, as a SystemUser, he can recreate it in order to become its owner. This scenario does not point out a flaw since, whenever a meeting is cancelled, it is legitimate that a user can start a new meeting with the same identifier as the cancelled one.

6 Conclusion

We described in this paper a symbolic search approach that can extract insider malicious behaviours from a formal Information System model. The meeting scheduler example was discussed in several articles [3, 2]. However, they do not report the attack scenarios presented in this paper. This is due to the fact that the dynamic evolution of the functional state is not taken into account.
The contributions of this paper show how dynamic analysis, assisted by proofs and model-checking, is useful to uncover potential threats. In addition, thanks to our algorithm and to proof and model-checking tools, our method can be fully automated in order to extract attack scenarios breaking authorization constraints. In [10], a dynamic analysis is done interactively with the help of a Z animator, but it is tedious and may miss many possible flaws. We also applied our approach to the case study treated in [8] and we were able to find, automatically, the discussed threat. Currently we are working on the application to a real case study, issued from the ANR-Selkis project7, which deals with a medical information system involving various authorization constraints. Our approach is automated by exploiting the tools B4MSecure8, GeneSyst9, AtelierB10 and ProB11. B4MSecure translates functional and security graphical models into a B specification, from which we automatically produce proof obligations on reachability properties by taking advantage of the GeneSyst tool. Then, these proof obligations are discharged automatically using the AtelierB prover. When a symbolic scenario is found, ProB is used to explore the concrete state space, focusing on the operations issued from the symbolic scenario. The main limitation of our work is that sometimes, when proof obligations are complex, AtelierB fails to prove them automatically. Interactive proofs are then required, but they may be quite difficult for the analyst. One naive solution is to keep the operations whose proofs do not succeed automatically, so that they can be exploited further using the model-checker. A more interesting solution is to focus on other kinds of proof obligations. For example, one can try to prove that an operation o is never enabled from a state E and/or that o never reaches a state F.
Applying these proofs to the meeting scheduler example, we were able to eliminate half of the operations after proving automatically that they cannot be involved in the attack scenario. We believe that reachability properties can also be expressed by means of LTL formulas. We have started exploring this direction based on the LTL formula checking facilities of ProB, and the first results are promising.

7 http://lacl.univ-paris12.fr/selkis
8 http://b4msecure.forge.imag.fr
9 http://perso.citi.insa-lyon.fr/nstouls/?ZoomSur=GeneSyst
10 http://www.atelierb.eu/
11 http://www.stups.uni-duesseldorf.de/ProB

References

1. K. Anastasakis, B. Bordbar, G. Georg, and I. Ray. UML2Alloy: A challenging model transformation. In Model Driven Engineering Languages and Systems, 2007.
2. D. Basin, M. Clavel, J. Doser, and M. Egea. Automated analysis of security-design models. Information and Software Technology, 51, 2009.
3. D. Basin, J. Doser, and T. Lodderstedt. Model driven security: From UML models to access control infrastructures. ACM Trans. Softw. Eng. Methodol., 15(1), 2006.
4. D. Bert, M-L. Potet, and N. Stouls. GeneSyst: a Tool to Reason about Behavioral Aspects of B Event Specifications. Application to Security Properties. In H. Treharne, S. King, M. C. Henson, and S. A. Schneider, editors, ZB 2005: Formal Specification and Development in Z and B, 4th International Conference of B and Z Users, volume 3455 of LNCS. Springer-Verlag, 2005.
5. A. Idani, J. Boulanger, and L. Philippe. Linking paradigms in safety critical systems. International Journal of Computers and their Applications (IJCA), Special Issue on the Application of Computer Technology to Public Safety and Law Enforcement, 16(2), 2009.
6. M. Kuhlmann, K. Sohr, and M. Gogolla. Employing UML and OCL for designing and analysing role-based access control. Mathematical Structures in Computer Science, 23(4), 2013.
7. K. Lano, D. Clark, and K. Androutsopoulos. UML to B: Formal Verification of Object-Oriented Models.
In Integrated Formal Methods, volume 2999 of LNCS. Springer, 2004.
8. Y. Ledru, A. Idani, J. Milhau, N. Qamar, R. Laleau, J-L. Richier, and M-A. Labiadh. Taking into account functional models in the validation of IS security policies. In Camille Salinesi and Oscar Pastor, editors, CAiSE Workshops, volume 83 of Lecture Notes in Business Information Processing. Springer, 2011.
9. Y. Ledru, A. Idani, J. Milhau, N. Qamar, R. Laleau, J-L. Richier, and M-A. Labiadh. Validation of IS security policies featuring authorisation constraints. International Journal of Information System Modeling and Design (IJISMD), 2014.
10. Y. Ledru, N. Qamar, A. Idani, J-L. Richier, and M-A. Labiadh. Validation of security policies by the animation of Z specifications. In Proceedings of the 16th ACM Symposium on Access Control Models and Technologies, SACMAT '11, New York, NY, USA, 2011. ACM.
11. M. Leuschel and M. Butler. ProB: A Model Checker for B. In FME 2003: Formal Methods Europe, volume 2805 of LNCS. Springer-Verlag, 2003.
12. N. Qamar, Y. Ledru, and A. Idani. Evaluating RBAC Supported Techniques and their Validation and Verification. In 6th International Conference on Availability, Reliability and Security (ARES 2011), Vienna, Austria, August 2011. IEEE Computer Society.
13. C. Snook and M. Butler. UML-B: Formal modeling and design aided by UML. ACM Transactions on Software Engineering and Methodology, 15(1), 2006.

Traffic Analysis of Web Browsers

Sami Zhioua* and Mahjoub Langar**
* Information and Computer Sciences Department, KFUPM, P.O. Box 958, Dhahran 31261, KSA. zhioua@kfupm.edu.sa
** Ecole Nationale des Ingenieurs de Tunis, B.P. 37, Belvedere, Tunis. mahjoub.langar@gmail.com

Abstract. The Tor network is currently the most commonly used anonymity system, with more than 300,000 users and almost 3000 relays.
Attacks against Tor are typically confirmation attacks, where the adversary injects an easily discernible traffic pattern and observes which clients and/or relays exhibit such patterns. The main limitation of these attacks is that they require a "powerful" adversary. Website fingerprinting is a new breed of attack that identifies which websites are visited by a Tor client by learning the traffic pattern of each suspected website. Recent works showed that some classifiers can successfully identify 80% of visited websites. In this paper we use a classic classifier, namely decision trees (the C4.5 algorithm), and we study to what extent popular web browsers can resist website fingerprinting attacks. Among the four studied web browsers, Google Chrome offers the best resistance to website fingerprinting (5 times better than the other web browsers). Since most existing fingerprinting techniques have been evaluated using the Firefox web browser, we expect the accuracy results of existing works to be reduced when the Chrome browser is used.

1 Introduction

Anonymity systems such as Tor [1] and JAP [2] are designed primarily to provide privacy and anonymity to Internet users living in oppressive regimes, giving them the opportunity to evade censorship. These systems achieve anonymity by embedding user data inside several layers of encryption and by forwarding the traffic through a set of relay nodes/proxies. This makes the job of an eavesdropping adversary much more challenging, since by just observing the traffic she cannot deduce who is communicating with whom or what type of traffic is exchanged. Tor [1] represents the current state of the art in low-latency anonymity systems. The Tor network is currently the largest deployed anonymity network ever, consisting of almost 3000 relays and more than an estimated 300,000 users.
Most attacks against the Tor anonymity system have been traffic confirmation attacks, where the idea is to inject an easily discernible traffic pattern and observe which potential clients exhibit such patterns [3–7]. Most of these attacks require a "powerful" adversary, which is assumed to observe the traffic of a significant number of Tor relays and, in some attacks, to inject malicious relays into the Tor network. These assumptions are relatively strong and beyond the capabilities of most attackers, including totalitarian regimes. A more practical attack on Tor which does not require strong assumptions is passive traffic analysis. Traffic analysis consists in intercepting and analyzing traffic messages (usually encrypted) in order to reveal information about the communication (e.g. the identities of the communicating entities, the type of data exchanged, etc.). To carry out the attack, the adversary is assumed to observe the traffic of only one side of the communication (usually the Tor client). This threat model is very common and holds particularly in the presence of censorship. Website fingerprinting [8] is a variant of passive traffic analysis that can be carried out by a local eavesdropper or by any entity observing the Tor client's traffic. In this attack the adversary analyzes the traffic to extract patterns that can reveal the identity of the website accessed by the client. Patterns are constructed from certain features of the traffic such as the size of transferred data, the timing, the order of packets, etc. Website fingerprinting was first used to analyze encrypted HTTP traffic [8–11]. Most of these attacks were based on tracking the size of the objects fetched by the main web page. With the migration to HTTP/1.1, which makes use of persistent connections and pipelining, it is no longer possible to easily distinguish the fetching of single objects. Only a few works have focused on implementing website fingerprinting on anonymity systems [12–15].
It turned out that website fingerprinting is much more challenging when applied to anonymity systems, in particular Tor. The reason is that the Tor protocol performs some structural modifications of the traffic: restructuring the traffic into fixed-size cells, merging small packets together, multiplexing TCP streams, etc. However, despite these challenges, recent works showed that the precision of website fingerprinting can be as high as 80% when applied to Tor [15]. The Tor protocol can be used with different web browsers1. Since web browsers use different user agents and process data packets differently, the choice of the web browser should have an impact on the efficiency of website fingerprinting. In this paper we study the impact of the choice of the web browser on the anonymity of Tor clients with respect to website fingerprinting attacks. We consider a representative set of four popular web browsers, namely Firefox, Chrome, Konqueror, and Internet Explorer, and we empirically analyze to what extent they resist website fingerprinting. This is the first work in the literature that studies the efficiency of website fingerprinting across different web browsers. All existing works focused on a single web browser, mainly Firefox. The contributions of this paper are two-fold:

1. A detailed and complete survey of existing website fingerprinting approaches, in particular those targeting anonymity systems.
2. A comparative analysis of the most popular web browsers according to their resistance to website fingerprinting.

1 Provided that the web browser allows configuring a SOCKS proxy.

2 Related Work

Early website fingerprinting techniques focused on analyzing simple encrypted HTTP traffic. Hintz [8], who was the first to use the term "fingerprinting" to refer to this type of attack, implemented a simple website fingerprinting attack targeting the SafeWeb encrypting web proxy [16].
The attack was based on tracking the size of objects fetched by a visited website. This was possible because the author made the strong assumption that every web object (image, ads, etc.) is fetched through a separate TCP connection using a different port. His experiment was a simple proof of concept distinguishing only 5 websites. He achieved a detection rate between 45 and 75%. Similarly, Sun et al. [9] based their approach on the size of fetched objects. Objects are isolated in the encrypted traffic by counting the number of packets between blocks of requests. The fingerprint is expressed as a multiset of object lengths. An unknown traffic sequence is then evaluated against website fingerprints using a measure of similarity (Jaccard's similarity [17]). A similarity value greater than a threshold c indicates a match. In their empirical analysis, Sun et al. constructed a database of 2000 website fingerprints and then tried to distinguish these same 2000 websites out of a set of 100,000 websites. The optimal accuracy was obtained with the threshold c equal to 0.7, where 75% of the 2000 websites were correctly identified with a false positive rate of 1.5%. The strong assumption that web objects can be distinguished by observing the different TCP connections no longer holds: with the migration to HTTP/1.1, a TCP connection is no longer opened for each object, as was the case in HTTP/1.0. Bissias et al. [10] were the first to use IP packet sizes and Inter-Packet Times (IPT) instead of the size of fetched objects to fingerprint websites. From every website visit they extract two traces: one size trace and one time trace. The size trace is the sequence of packet sizes while the time trace is the sequence of IPT times. Then all traces corresponding to a given website are merged into a website profile2 by computing the arithmetic mean at every time step.
Once a bank of profiles is constructed, an unknown network trace is matched against each of the constructed profiles using cross-correlation [18]. The empirical analysis was based on the 100 websites most visited in the authors' department (University of Massachusetts) and showed that size profiles are much more efficient in identifying visited websites than time profiles. With size profiles, 20% of the analyzed traces are correctly identified after one guess and 65% after 10 guesses, while with time profiles 8% of websites were correctly identified after one guess and 27% after 10 guesses. The analysis also showed that the time gap between the training phase (constructing the profiles) and the testing phase has only a small impact on the accuracy: a one-hour gap is only 5% better than a 168-hour gap.

2 Actually two profiles: a size profile and a time profile.

Liberatore et al. [11] obtained much better results by focusing only on packet sizes. In their work, they represented a traffic trace as a vector of packet size frequencies: each visit results in a histogram of packet size frequencies. They tried two classification techniques: Jaccard's similarity [17]3 and Naive Bayes [19]. The empirical analysis was also performed on the University of Massachusetts's typical traffic, filtering the top 2000 visited websites. Jaccard-based classification was slightly better than the Naive Bayes one, with 73% of website visits correctly identified. Experiments also showed that the training set need not be very large, since a training set of size 4 resulted in almost the same accuracy as a training set of size 12. All the above works focused on fingerprinting typical encrypted HTTP traffic. As anonymity systems became popular, recent website fingerprinting contributions focused on attacking those systems, in particular Tor [12–15]. Shi et al. [13] detailed a website fingerprinting attack on Tor. They adapted Hintz [8] and Sun et al.
[9] techniques to Tor: instead of tracking the size of fetched objects (which is not possible in Tor), they tracked the number of packets sent or received in every interval4. A traffic trace is then represented by a vector specifying the number of intervals with 2 packets, the number of intervals with 3 packets, etc. Once a profile is built for a website (after several visits), the similarity between a profile and a traffic trace is computed using cosine similarity. The technique was evaluated using the traffic from the top 20 websites in Japan. They could successfully identify 50% of the visited websites. Federrath et al. [12] used a Multinomial Naive Bayes (MNB) classifier to fingerprint websites through 6 anonymity systems: 4 single-hop proxy VPNs and 2 multi-hop systems, Tor and JonDonym [2]. As in Liberatore et al. [11], a traffic trace is represented as a histogram of the packet size frequency distribution, without taking packet ordering or packet timing into consideration. They improved the efficiency of the MNB classifier by using text mining optimizations [19] such as Term Frequency Transformation, Inverse Document Frequency, etc. The evaluation was based on the top 2000 websites extracted from the log files of a medium-range proxy server used by 50 schools. These 2000 websites were filtered down to 775 websites. The accuracy of their technique was very good for single-hop anonymity systems, where 94% of website visits were correctly identified, while it was relatively poor for multi-hop anonymity systems: 20% for JonDonym and only 3% for Tor. This shows once again that website fingerprinting is much more challenging with anonymity systems than with typical encrypted HTTP traffic. Panchenko et al. [14] focused only on Tor and JonDonym and used SVMs (Support Vector Machines) for classification. They represented a traffic trace as a sequence of packet lengths where input and output packets are distinguished by using negative and positive values.
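Two of the matching schemes surveyed here, interval-count fingerprints and cosine similarity, are easy to make concrete. The sketch below is an illustrative reimplementation under our own encoding (signed packet sizes, a capped histogram), not the code of Shi et al.:

```python
from math import sqrt

def interval_vector(trace, max_count=10):
    """Split a signed trace (positive = one direction, negative = the
    other) into direction-constant intervals, then histogram interval
    sizes: v[k] = number of intervals containing k packets (capped)."""
    counts, run = [], 1
    for prev, cur in zip(trace, trace[1:]):
        if (prev > 0) == (cur > 0):
            run += 1          # same flow direction: interval continues
        else:
            counts.append(run)  # direction changed: close the interval
            run = 1
    counts.append(run)
    v = [0] * (max_count + 1)
    for c in counts:
        v[min(c, max_count)] += 1
    return v

def cosine(u, v):
    """Cosine similarity between a stored profile and a fresh trace vector."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))
```

For example, the trace [100, 80, -60, -40, 50] has three intervals (two of 2 packets, one of 1 packet), so its vector has v[1] = 1 and v[2] = 2; an unknown trace is then attributed to the profile with the highest cosine similarity.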
In addition, they injected some features into these sequences to help the classification, such as size markers (whenever the flow direction changes, insert the size of the packets in the interval), number markers (the number of packets in every interval), total transmitted bytes, etc. They used the Weka tool [20] to fine-tune the SVM parameters. The proposed technique was evaluated using two experiments: closed-world and open-world. In the closed-world experiment, the same set of 775 websites of Federrath et al. [12] as well as ten-fold cross-validation were used to estimate the accuracy. As for the open-world experiment, 5000 websites were randomly chosen among the top one million websites according to Alexa [21], in addition to 5 censored websites. The closed-world experiment showed that the SVM technique resulted in an accuracy of 30% for the basic variant and 54% when all features are used5. The open-world experiment showed that censored websites were successfully identified with a true positive rate between 56 and 73%, while the false positive rate was less than 1%. The most recent contribution is by Cai et al. [15]. As in Panchenko et al. [14], they represented a traffic trace as a sequence of (negative and positive) packet lengths. Training and testing are based on an SVM with a distance-based kernel. They tried several variants of parameters and distances and obtained the best accuracy with a Damerau-Levenshtein edit distance [22, 23].

3 To use Jaccard's similarity as a classifier, they turned the metric value into a class membership probability by dividing it by the sum of all metric values in the training set.
4 An interval refers to the time period without packet flow change. Moving from one interval to the next happens when the direction of the flow changes.
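A minimal sketch of that edit distance on packet-length traces follows. This is the restricted (optimal string alignment) variant of Damerau-Levenshtein, written here for illustration; it is not the authors' implementation:

```python
def osa_distance(a, b):
    """Optimal-string-alignment variant of Damerau-Levenshtein: minimum
    number of insertions, deletions, substitutions and adjacent
    transpositions turning sequence a into sequence b."""
    la, lb = len(a), len(b)
    d = [[0] * (lb + 1) for _ in range(la + 1)]
    for i in range(la + 1):
        d[i][0] = i
    for j in range(lb + 1):
        d[0][j] = j
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i-1] == b[j-2] and a[i-2] == b[j-1]:
                d[i][j] = min(d[i][j], d[i-2][j-2] + 1)  # transposition
    return d[la][lb]

def normalized_distance(t1, t2):
    """Normalize by the length of the shorter trace, so that traces of
    different lengths become comparable."""
    return osa_distance(t1, t2) / min(len(t1), len(t2))
```

On packet-length traces, a single transposition (two reordered packets) costs 1, e.g. osa_distance([1, 2, 3], [1, 3, 2]) is 1.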
The use of this distance is motivated by the fact that it is the length of the shortest sequence of character insertions, deletions, substitutions, and transpositions required to transform a trace t into t′. These operations correspond to the discarding and reordering of packets in a stream. In order to compute the distance between two traces of different lengths, the Damerau-Levenshtein distance is normalized with respect to the length of the shorter of the two traces. For evaluation they used the top 1000 websites according to Alexa, which were then filtered down to 800 websites. Using the basic version of Tor, they could successfully identify 80% of visited websites. However, when random cover packets are added to the traffic, the accuracy falls to 50%. Decreasing the size of the training set from 36 to 4 samples decreased the accuracy by 20%. In all the aforementioned works, without exception, the website fingerprinting approaches were evaluated using only one web browser, mainly Firefox. Since web browsers use different browser engines and user agents, and consequently fetch pages differently, we strongly believe that the chosen web browser has an impact on the estimated accuracy. In this paper, we consider a representative set of popular web browsers and repeat the data collection and experiments for every one of them. Our aim is to compare the resistance of web browsers to website fingerprinting attacks.

3 Threat Model

The typical threat model for anonymity systems is a global passive adversary that can observe all the traffic of the network. However, since Tor is a low-latency anonymity system, it has been designed to protect against a weaker form of adversary: one who can observe only some fraction of the network traffic; who can generate, modify, delete, or delay traffic; who can operate onion routers of his own; and who can compromise some fraction of the onion routers [1]. In this paper we assume an even weaker threat model where the attacker can only access the encrypted traffic between the client and the first Tor relay. The attacker does not generate, modify, delete or delay any traffic, which makes the attack completely stealthy. Except for the attacker's own Tor node, no other Tor relay or server is compromised, which makes the attack easily deployable in practice.

5 The feature with the highest contribution was the total number of packets in the traffic trace.

4 Tor Traffic Capture

Fig. 1. Tor Traffic Capture

Traffic analysis typically starts by intercepting the traffic packets. In a typical scenario, only the victim and the gateway can capture the data flowing between them. In practice, however, several entities might have access to the traffic packets. The administrator of the LAN has access to the traffic of all endpoints in the network. The ISP (Internet Service Provider) can monitor the traffic of any of its subscribers. A law enforcement agency, after approval from the ISP, can observe and record the traffic of any Internet user, which is referred to as lawful intercept. A censoring entity can observe the traffic of any user in its "jurisdiction". In addition to these entities, a malicious user on the LAN can carry out a MITM (Man-In-The-Middle) attack between the victim and the gateway and make all the traffic pass through her. The MITM attack can easily be performed using ARP spoofing/poisoning. Cain & Abel [24] and Ettercap [25] are two popular tools for ARP spoofing/poisoning. Assuming that the attacker can intercept the Tor traffic packets using one of the above scenarios, Tor traffic capture goes through three stages, as shown in Figure 1. First the traffic is dumped to a file using a simple tool like tcpdump6.

6 Alternative tools like Wireshark/tshark or Omnipeek can be used as well.
Then the raw traffic is filtered to keep only Tor-related traffic. Finally, the traffic is classified into streams. Since the list of Tor relays is public, we use it to filter Tor traffic from the rest of the traffic. The list of Tor relays can be extracted from the Tor status files, in particular the cached-consensus file. These files can be downloaded manually from one of the authorities, or they can be accessed directly in the local Tor status folder. Tor automatically updates these files once they are no longer fresh. In our setting, the attacker, who is also a Tor client, uses her own Tor status files to extract the list of Tor relays. The next step in Tor traffic capture is to classify the packets into streams at the TCP protocol level7. Every stream is then tracked using IP addresses, ports, TCP flag bits, and Sequence and Acknowledgment numbers. A stream is closed after a TCP connection termination (FIN, ACK-FIN and ACK). Typically, Tor creates two types of streams: short-living streams and long-living streams. Short-living streams are streams used to download either router descriptors or the directory consensus. They last around 3 minutes because the download takes a couple of seconds and then the stream stays idle until it hits the maximum stream idle period, which is set to 3 minutes in Tor. Typically, only one circuit is established during a short-living stream, and it is a single-hop circuit. Long-living streams are for data communications, and typically several 3-hop circuits are established during the lifetime of such streams. Besides, since all Tor communications are encrypted, all Tor streams initiate a TLS handshake just after the TCP three-way handshake.

Fig. 2. Tor stream initial packets generated by Wireshark

Figure 2 shows the initial packets of a Tor stream. The first three packets establish the TCP connection, then a TLS handshake follows. The illustrated stream is a short-living stream whose goal is to download a set of router descriptors. Therefore, it creates a single-hop circuit with the directory server, which is performed using CREATE and CREATED cells as shown in the figure.

7 It is important to note that the streams classified at this stage are the streams on top of which Tor communications are conveyed. These are different from the TCP streams tunneled through Tor. Hence, there are two levels of TCP streams: one level below the Tor protocol and one level on top of the Tor protocol.

5 Data Collection

In order to evaluate the accuracy of website fingerprinting for each web browser, 100 websites have been used. The set of 100 websites is composed of 90 websites randomly chosen from the top 1000 visited websites worldwide according to Alexa [21], and 10 websites censored in some countries of the Middle East. All censored websites are related to anonymizers and proxy services8. Website traffic traces are collected in 24-hour sessions. In each session, websites are fetched several times in a round-robin fashion. For visiting websites, we used two lab machines running Ubuntu Linux. Another Windows 7 machine is used to fetch websites through Internet Explorer. Traffic packets are dumped using tcpdump version 4.1.1 and libpcap version 1.1.1. Only packet headers are dumped to the dump file9. The experiments were performed using Tor version 0.2.2.39. We wrote a Python script to automate the fetching of websites. For each website, the script proceeds as follows: (1) it records the system time, (2) requests the website URL, (3) waits for 50 seconds (the time to load the website10), (4) stops the website connection, (5) records the system time, (6) waits for 10 seconds (to have a time gap between every two visits). Time snapshots are taken just before and after a website visit so that they can be intersected with the dump file in order to isolate the traffic of every website visit.
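The fetching loop just described can be sketched as follows. The paper's script is not published, so the function names and the open_page/close_page browser hooks below are hypothetical stand-ins:

```python
import time

def visit(url, open_page, close_page, log, load_wait=50, gap=10):
    """One automated visit, following steps (1)-(6) above."""
    t0 = time.time()                     # (1) record the system time
    open_page(url)                       # (2) request the website url
    time.sleep(load_wait)                # (3) wait for the page to load
    close_page()                         # (4) stop the website connection
    log.append((url, t0, time.time()))   # (5) record the system time again
    time.sleep(gap)                      # (6) time gap before the next visit

def run_session(urls, open_page, close_page, iterations=1, **kw):
    """Round-robin over the website list; returns (url, start, end)
    snapshots that can later be intersected with the tcpdump file."""
    log = []
    for _ in range(iterations):
        for url in urls:
            visit(url, open_page, close_page, log, **kw)
    return log
```

The returned timestamp pairs are exactly the "time snapshots" used to slice the dump file into per-visit traces.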
Once isolated, the traffic corresponding to each visit is represented as a sequence of packet sizes, where a positive value refers to an inbound packet and a negative value refers to an outbound packet. This representation captures three features of the trace, namely the size and direction of each packet and the order of packets11. The traffic representation is the same as in recent works [14, 15]. For every visit, we keep only the first 500 packets of the traffic so that all obtained sequences have the same length12. Interestingly, parsing the Tor traffic during a website visit shows that packets flow through several TCP streams, not just one. One reason is that Tor needs to update the Tor relay status regularly by fetching fresh data from the directory servers, and also sends dummy cells to keep some circuits open. TCP streams used for fetching directory server data can easily be distinguished from TCP streams used for data communication, since the number of packets exchanged does not exceed 100 packets. Interestingly, after ruling out these short TCP streams, the traffic resulting from visiting some websites is carried through two TCP streams with more than 200 packets each. This shows that Tor does not always multiplex the traffic of a website into a single stream. For those website visits, the corresponding traffic sequence is obtained by merging both streams into one. As with normal visits, only the first 500 packets of the merged stream are kept. In order to avoid the noise introduced by active content (Flash, etc.) and the browser cache, active content and caching are disabled.

8 Examples of these websites include: torproject.org, unblock-proxy.net, etc.
9 tcpdump is launched with options -n -tttt.
10 If a website takes more than 50 seconds to load, the sample sequence will be incomplete.
11 As in most related work, acknowledgement packets are ignored.
12 This is a requirement for the classification algorithm.
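The trace construction described above (signed packet sizes, merging of parallel streams, truncation to 500 packets) can be sketched as follows; the packet tuple layout is an assumption made for illustration:

```python
def trace_from_streams(streams, limit=500):
    """Merge a visit's data streams into one signed packet-size sequence.

    Each stream is a list of (timestamp, size, direction) tuples with
    direction "in" or "out". Packets are interleaved by timestamp so the
    original order is preserved; inbound packets keep a positive size and
    outbound packets get a negative size; only the first `limit` packets
    are kept so all sequences have the same length."""
    merged = sorted(p for s in streams for p in s)   # interleave by time
    seq = [size if d == "in" else -size for _, size, d in merged]
    return seq[:limit]
```

For a visit whose traffic happens to be carried by two data streams, passing both to trace_from_streams yields the single merged sequence used for classification.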
For instance, the Chrome browser is used in "incognito" mode, while Firefox is used in "private" mode. Four web browsers have been used for fetching websites, namely Chrome 18.0, Firefox 15.0.1, Internet Explorer 9.0 and Konqueror 4.8.5. The same experiment is performed four times, each time with a different browser. The experiment consists of 10 iterations. Every iteration consists in visiting all 100 websites once. Hence, every website is visited 10 times using the same web browser, yielding a maximum of 10 samples for every website13.

13 Some website visits resulted in fewer than 500 packets. These sequences are discarded from the data set.

6 Classification Algorithm

The goal of this paper is to compare popular web browsers with respect to their ability to resist website fingerprinting. To this end, we use a classical classifier, namely decision trees. It is important to mention that recent related work showed that better fingerprinting results can be obtained using other classifiers, in particular Support Vector Machine (SVM) based ones. The next paragraphs give an overview of the decision tree classifier and the techniques used to evaluate the accuracy of the classifier for every web browser.

6.1 Decision Tree Classifier

Decision tree learning is a well-known and classic classification technique [26]. It is very popular because it is self-explanatory, easy to understand and easy to use, since it requires few parameter settings. It has been successfully used for classification in several diverse areas. Overall, it is well suited for exploratory knowledge discovery. In this paper we use a decision tree known as C4.5 [27] to classify the traffic sequences of website visits. A decision tree is typically learned using a top-down approach where each node corresponds to one of the input variables, each edge corresponds to possible values of that variable, and each leaf corresponds to a class label. Every data set is split into subsets based on attribute values. This process is repeated recursively and is called recursive partitioning. The recursion is complete when splitting adds nothing to the prediction. Inducing a decision tree using a top-down approach requires dealing with three issues besides selecting the best attribute to use at each node of the tree. First, one has to choose a splitting threshold to form the two children of each node. Second, one needs a criterion to determine when to stop growing the tree. Third, one has to decide which class label to assign to each terminal (leaf) node. Classic strategies for splitting mainly rely on impurity criteria, e.g. information gain, gain ratio and the Gini index. The C4.5 decision tree uses the gain ratio to select the best attribute and to choose the optimal splitting threshold: the attribute value that provides the best gain ratio is chosen as the splitting threshold. To address the second issue, i.e. to avoid the difficulty of choosing a stopping rule, most decision tree induction algorithms grow the tree to its maximum size, where the terminal nodes are pure or almost pure, and then selectively prune the tree. The class label of each terminal node is typically decided by majority voting, i.e. the label of the class whose data instances are in the majority in the respective terminal node.

6.2 Cross-Validation

To obtain a generalized estimate of the performance of the decision tree used in this paper, a cross-validation (CV) scheme is applied. CV is a well-known method to test the performance of a classifier by varying the training and test datasets [28]. CV is a standard test commonly used to assess the ability of a classification system using various combinations of the testing and training data sets [29, 28, 30].
In this method, classification performance is measured by systematically excluding some data instances during the training process and testing the trained model on the excluded instances [31]. The process is repeated so that the whole data set is covered as test data. In this paper, we have chosen a 10-fold CV scheme, where each time the data in one fold are used as test data and the remaining 9 folds are used to train the model.

6.3 Performance Metric

Classification accuracy is one of the most widely used performance metrics to evaluate a classifier. Classification accuracy (ACC) is defined as the ratio of the number of samples that are classified correctly over the total number of samples available (N):

ACC = (TP + TN) / N    (1)

where TP (True Positives) is the total number of data instances from the positive class that are classified as positive by the classifier, and TN (True Negatives) is the total number of data instances from the negative class that are classified as negative.

7 Web Browser Resistance to Fingerprinting

Popular web browsers differ in several aspects; in particular, they use different web browser engines. The engine does most of the work of a web browser: it retrieves the document corresponding to a given URL and handles links, cookies, scripting, plug-in loading, etc. The type of web browser engine has an impact on the shape of the observed (encrypted) traffic. For example, some web browser engines may wait until all data is received before rendering a page, while others may begin rendering before all data is received.

In order to compare the resistance of popular web browsers to website fingerprinting attacks, data is collected using different web browsers and then a C4.5 decision tree classifier is used to evaluate the accuracy of the website fingerprinting.
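The 10-fold cross-validation and accuracy computation described above can be sketched as follows. For self-containment a trivial 1-nearest-neighbour classifier stands in for C4.5, and the data layout is our own assumption; only the CV/accuracy mechanics are the point here.

```python
# Sketch of the evaluation scheme: k-fold cross-validated classification
# accuracy. Each fold serves once as test data while the remaining folds
# train the (stand-in) classifier.

def predict_1nn(train_X, train_y, x):
    """Label of the training sequence closest to x (squared Euclidean distance)."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(zip(train_X, train_y), key=lambda tv: dist(tv[0], x))[1]

def cross_val_accuracy(X, y, k=10):
    """Mean accuracy over k folds, covering the whole data set as test data."""
    folds = [list(range(i, len(X), k)) for i in range(k)]
    accs = []
    for test_idx in folds:
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        tX, ty = [X[i] for i in train_idx], [y[i] for i in train_idx]
        correct = sum(predict_1nn(tX, ty, X[i]) == y[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return sum(accs) / k
```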
More precisely, once the data about website visits is collected, we evaluated the accuracy of website fingerprinting in four scenarios:

– Basic Packets Sequence: the traffic trace is the first 500 packets of the TCP stream with the largest number of packets.
– Merged Streams: the traffic trace is the first 500 packets obtained by merging all TCP streams with more than 200 packets. If only one TCP stream has more than 200 packets, this scenario is the same as the first one.
– Rounded Packet Sizes: the same as the first scenario, but the packet size values are rounded up to multiples of 600. For instance, a packet size of 512 is rounded to 600, while a packet size of 743 is rounded to 1200.
– Merged Streams and Rounded Packet Sizes: the combination of the two previous scenarios.

Fig. 3. Website Fingerprinting Accuracy for Common Web Browsers

Figure 3 shows the accuracy of the C4.5 classifier for each browser and for each scenario. Using Firefox, Internet Explorer and Konqueror, more than 9% of websites have been correctly identified by the classifier. With Chrome, however, only 2% of websites have been successfully identified. The histogram also shows that rounding packet size values improves the accuracy of our classifier for all browsers. Merging TCP streams, on the other hand, improved the accuracy of the classifier only for Firefox and Konqueror; for Chrome and Internet Explorer, merging streams resulted in lower accuracy than the basic scenario. The most important result illustrated by the histogram is that the Chrome browser offers better resistance to website fingerprinting than the other studied browsers. The advantage Chrome has over the other studied browsers is expected to be preserved even if a more efficient classifier (e.g. [15]) is used.
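The "Rounded Packet Sizes" transform can be sketched as follows. That the direction sign is preserved is our assumption; the paper states the rounding only on sizes (512 to 600, 743 to 1200, i.e. rounding up to the next multiple of 600).

```python
# Sketch of the rounding transform applied to a signed packet-size sequence.
import math

def round_size(v, step=600):
    """Round the absolute size up to the next multiple of `step`,
    keeping the packet's direction sign (our assumption)."""
    sign = -1 if v < 0 else 1
    return sign * step * math.ceil(abs(v) / step)

def round_sequence(seq, step=600):
    return [round_size(v, step) for v in seq]
```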
8 Conclusion

Website fingerprinting is a new attack on the Tor anonymity system that tries to reveal the identities of visited websites by recognizing patterns in the Tor traffic. Compared to previous attacks on Tor, in particular confirmation attacks, website fingerprinting does not require an attacker with extended capabilities: only the ability to sniff the Tor client's encrypted traffic is required. In this paper we presented a detailed survey of website fingerprinting techniques, which recently reached high accuracy rates (80%) [15] on the Tor anonymity system. The main contribution of this paper, however, is an empirical analysis of how much resistance popular web browsers provide against website fingerprinting. Four notable web browsers have been considered, namely Firefox, Chrome, Internet Explorer, and Konqueror. The analysis showed that the resistance of Chrome to website fingerprinting is five times better than that of the remaining web browsers. Since most existing fingerprinting techniques have been evaluated using the Firefox web browser [12, 14, 15], we expect their accuracy results to be lower if the Chrome browser is used.

There are two main mitigation approaches for the website fingerprinting attack. The first and most commonly used approach is padding, where the sender appends some random (dummy) bits to the actual data to obtain, for instance, fixed-size packets. It has been shown that padding reduces the accuracy of fingerprinting techniques only slightly [14, 15]. The second mitigation approach is traffic camouflage, which can be implemented in two ways. The first variant is to obfuscate the actual traffic by loading several pages simultaneously; Panchenko et al. [14] load a randomly chosen page whenever an actual website is to be visited. The second variant is to disguise the actual traffic within a typical encrypted cover protocol such as Skype voice over IP.
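A minimal sketch of the padding countermeasure just described. The fixed 512-byte cell size echoes Tor's cells and is our choice, and zero bytes stand in for the random dummy bits; neither detail comes from the cited defenses.

```python
# Sketch of padding to fixed-size cells: dummy bytes are appended so every
# payload fills a whole number of cells, hiding the true payload length.

CELL = 512  # bytes; assumed cell size, echoing Tor's fixed cells

def pad(payload: bytes, cell=CELL) -> bytes:
    """Append filler bytes until the payload fills a whole number of cells."""
    remainder = len(payload) % cell
    return payload if remainder == 0 else payload + b"\x00" * (cell - remainder)
```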
The StegoTorus proxy [32] constructs a database of pre-recorded packet traces from real sessions of a given cover protocol. Then, when a Tor client visits a website, it randomly chooses a pre-recorded trace from the database, which is used as a template to reshape the actual traffic. The packet sizes of the pre-recorded trace are matched exactly and the packet timings are matched "to the nearest millisecond".

References

1. Dingledine, R., Mathewson, N., Syverson, P.: Tor: The second-generation onion router. In: Proceedings of the 13th USENIX Security Symposium. (August 2004)
2. Berthold, O., Federrath, H., Köpsell, S.: Web MIXes: A system for anonymous and unobservable Internet access. In: Proceedings of Designing Privacy Enhancing Technologies, Springer-Verlag, LNCS 2009 (July 2000) 115–129
3. Murdoch, S., Danezis, G.: Low-cost traffic analysis of Tor. In: Proceedings of the 2005 IEEE Symposium on Security and Privacy, IEEE CS (May 2005)
4. Murdoch, S.: Hot or not: Revealing hidden services by their clock skew. In: Proceedings of CCS 2006. (October 2006)
5. Hopper, N., Vasserman, E., Chan-Tin, E.: How much anonymity does network latency leak? ACM Transactions on Information and System Security 13(2) (February 2010)
6. Evans, N., Dingledine, R., Grothoff, C.: A practical congestion attack on Tor using long paths. In: Proceedings of the 18th USENIX Security Symposium. (August 2009)
7. Mittal, P., Khurshid, A., Juen, J., Caesar, M., Borisov, N.: Stealthy traffic analysis of low-latency anonymous communication using throughput fingerprinting. In: Proceedings of the 18th ACM Conference on Computer and Communications Security. CCS '11, New York, NY, USA, ACM (2011) 215–226
8. Hintz, A.: Fingerprinting websites using traffic analysis. In: Privacy Enhancing Technologies (PETS), LNCS. Volume 2482., Springer (2002) 171–178
9. Sun, Q., Simon, D.R., Wang, Y.M., Russell, W., Padmanabhan, V.N., Qiu, L.: Statistical identification of encrypted web browsing traffic.
In: Proceedings of the 2002 IEEE Symposium on Security and Privacy. SP '02, Washington, DC, USA, IEEE Computer Society (2002) 19–
10. Bissias, G.D., Liberatore, M., Jensen, D., Levine, B.N.: Privacy vulnerabilities in encrypted HTTP streams. In: Proceedings of the 5th International Conference on Privacy Enhancing Technologies. PET'05, Berlin, Heidelberg, Springer-Verlag (2006) 1–11
11. Liberatore, M., Levine, B.N.: Inferring the source of encrypted HTTP connections. In: Proceedings of the 13th ACM Conference on Computer and Communications Security. CCS '06, New York, NY, USA, ACM (2006) 255–263
12. Herrmann, D., Wendolsky, R., Federrath, H.: Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naïve-Bayes classifier. In: Proceedings of the 2009 ACM Workshop on Cloud Computing Security. CCSW '09, New York, NY, USA, ACM (2009) 31–42
13. Shi, Y., Matsuura, K.: Fingerprinting attack on the Tor anonymity system. In Qing, S., Mitchell, C., Wang, G., eds.: Information and Communications Security. Volume 5927 of Lecture Notes in Computer Science., Springer Berlin Heidelberg (2009) 425–438
14. Panchenko, A., Niessen, L., Zinnen, A., Engel, T.: Website fingerprinting in onion routing based anonymization networks. In: Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society. WPES '11, New York, NY, USA, ACM (2011) 103–114
15. Cai, X., Zhang, X.C., Joshi, B., Johnson, R.: Touching from a distance: website fingerprinting attacks and defenses. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security. CCS '12, New York, NY, USA, ACM (2012) 605–616
16. Safe Web: SafeWeb proxy. http://www.safeweb.com
17. Rijsbergen, C.J.V.: Information Retrieval. 2nd edn. Butterworth-Heinemann, Newton, MA, USA (1979)
18. Bracewell, R.: Pentagram Notation for Cross Correlation. The Fourier Transform and Its Application. McGraw-Hill, New York, USA (1965)
19.
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Second edn. Morgan Kaufmann (June 2005)
20. Weka Tool: Weka: Data mining software in Java. www.cs.waikato.ac.nz/ml/weka
21. Alexa Website: Alexa: The web information company. www.alexa.com
22. Levenshtein, V.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady (1966)
23. Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1) (2001)
24. Montoro, M.: Cain & Abel. http://www.oxid.it/cain.htm
25. ALor and NaGA: Ettercap. http://ettercap.sourceforge.net
26. Kotsiantis, S.B.: Supervised machine learning: A review of classification techniques. Informatica 31 (2007) 249–268
27. Quinlan, J.R.: C4.5: Programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1993)
28. Barton, J., Lees, A.: An application of neural networks for distinguishing gait patterns on the basis of hip-knee joint angle diagrams. Gait & Posture 5(1) (1997) 28–33
29. Ding, C.H.Q., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349–358
30. Hassan, M.R., Begg, R., Taylor, S., Kumar, D.K.: HMM-fuzzy model for recognition of gait changes due to trip-related falls. In: Proceedings of the IEEE Eng Med Biol Soc., Springer Berlin Heidelberg (2006) 1216–1219
31. Begg, R.K., Palaniswami, M., Owen, B.: Support vector machines for automated gait classification. IEEE Transactions on Biomedical Engineering 52 (2005) 828–838
32. Weinberg, Z., Wang, J., Yegneswaran, V., Briesemeister, L., Cheung, S., Wang, F., Boneh, D.: StegoTorus: a camouflage proxy for the Tor anonymity system. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, New York, NY, USA, ACM (2012) 109–120
Secrecy by Witness-Functions

Mohamed Mejri, Jaouhar Fattahi and Hanane Houmani

[The formal body of this paper did not survive text extraction. It defines a verification context C = ⟨M, ξ, ⊨, K, L⊑, ⌜·⌝⟩, introduces witness-functions for the static analysis of secrecy, and illustrates the approach on the NSL protocol: m1 : A → B : {Na.A}kb, m2 : B → A : {B.Na}ka.{B.Nb}ka, m3 : A → B : A.B.{Nb}kb.]

Formal Enforcement of Security Policies on Choreographed Services

Mahjoub Langar and Karim Dahmani

LIP2 Research Laboratory, Faculté des Sciences de Tunis, Tunisia

Abstract. Web services are software systems that support distributed applications composed of independent processes which communicate by message passing. To realize the full potential of web services, we need to compose existing web services in order to obtain more functionality. However, the composition of web services should be secure.
In this paper we propose an automatic formal approach to monitor the execution of a choreography of web services, and we prove its correctness. We introduce the syntax and semantic rules of a new operator which takes as input a choreography and a security policy and produces as output a secure version of this choreography, which behaves like the original one and does not violate the security policy.

Keywords: choreographed services, web service security, formal verification, runtime verification

1 Introduction

With the explosive growth of the Internet, intranets and electronic commerce, the concept of web services has emerged and is now exploited across most organizational boundaries. Web services provide a suitable technical foundation for making business processes accessible within and across enterprises. They are software systems that support distributed applications composed of independent processes which communicate by message passing. For an appropriate exploitation of web services, however, we need to compose them. Indeed, individual web services often offer limited capabilities: in some situations, a client's request cannot be satisfied by a single web service, yet, when composed with other web services, they can satisfy the client's demand. To realize the full potential of web services, we need to compose existing web services in order to obtain more functionality. Web service composition rules deal with how different services are composed into a coherent global service. In particular, they specify the order in which services are invoked, and the conditions under which a certain service may or may not be invoked. Among the approaches investigated for service composition, we distinguish orchestration and choreography. Orchestration composes available services and adds a central coordinator (the orchestrator) which is responsible for invoking and composing the individual sub-activities.
The second approach, referred to as web service choreography, does not rely on a central coordinator; instead, it defines complex tasks via the definition of the conversation that should be undertaken by each participant. While several proposals exist for orchestration languages (e.g. the Business Process Execution Language (BPEL) [1] and the Web Services Flow Language (WSFL) [2]), choreography languages are still at a preliminary stage of definition. One proposal, the Web Service Choreography Description Language (WS-CDL) [3], was issued by the World Wide Web Consortium (W3C) in December 2004.

The need for secure web service composition has attracted great interest from researchers over the last decade. However, most works focus on how to guarantee desired properties such as correctness, deadlock avoidance, etc. In our work, in addition to these properties, we introduce an automatic approach to monitor a choreography by specifying and enforcing security policies on web services, and we prove its correctness using formal methods. Some works, such as [4,5], have enforced security policies on concurrent systems using process algebra. Formal methods are mathematical techniques that are well suited to address the aforementioned issues, and a variety of proposals exist to formally describe, compose and verify web service compositions using them. In [6], a Petri-net based design and verification framework for web service composition is proposed. In [7], the authors introduce a Petri-net based algebra to compose web services based on control flows. In [8], the authors use Milner's process algebra CCS to specify and compose web services as processes, and the Concurrency Workbench to validate properties such as correct web service composition.
In [9], the authors provide an encoding of BPEL processes into web service timed state transition systems, a formalism that is closely related to timed automata, and discuss a framework in which timed assumptions can be model checked.

In this paper, we propose an enforcement of security policies over web services using formal methods. We present an automated approach for monitoring the behavior of a service in a choreography. More precisely, we define an operator ⊗ that takes as input a process P and a security policy 𝒫 and generates a new process P′ = P ⊗ 𝒫 which respects the following conditions:

• P′ ⊨ 𝒫, i.e., P′ satisfies the security policy 𝒫;
• P′ ⊑ P, i.e., behaviors of P′ are also behaviors of P;
• ∀Q : ((Q ⊨ 𝒫) and (Q ⊑ P)) ⟹ Q ⊑ P′, i.e., all good behaviors of P are also behaviors of P′.

The rest of the paper is structured as follows. In Section 2, we briefly describe the global calculus. Section 3 presents the formalism used to specify security policies. Section 4 illustrates the formalism used to specify choreographed services. In Section 5, we present our security framework for monitoring a choreography. In Section 6, we discuss some related works, while conclusions are in Section 7.

2 Global Calculus

The global calculus [10] is distilled from WS-CDL. It describes the global flow of interactions between the participants. The syntax of the global calculus is given by the following standard BNF.

2.1 Syntax of the Global Calculus

I ::= A → B : ch(ν s̃).I          (Init)
    | A → B : s⟨op, e, y⟩.I       (Comm)
    | x@A := e.I                  (Assign)
    | if e@A then I1 else I2      (IfThenElse)
    | I1 + I2                     (Sum)
    | I1 | I2                     (Par)
    | (νs)I                       (New)
    | X^A                         (RecVar)
    | rec X^A.I                   (Rec)
    | 0                           (Inaction)

The first rule (Init) says that the participant A invokes a service ch located at B and initiates freshly generated session channels s̃. The second rule (Comm) expresses a sending action by A, whose message consists of a selected operator op and an expression e, to the participant B, which stores the value of e in the variable y. (Assign) is a local construct which updates the variable x located at A with e. (Par) and (Sum) are respectively the parallel composition and the non-deterministic choice of interactions. (IfThenElse) is the standard conditional operation, while (RecVar) and (Rec) are used to express recursive behavior of interactions. 0 is the inactive interaction.

2.2 Semantics of the Global Calculus

The formal semantics of the global calculus is defined in terms of configurations (σ, I). The notation (σ, I) → (σ′, I′) says that the global description I at a state σ (which is the collection of all local states of the participants) reduces to I′ at a new state σ′. Samples of the reduction rules are presented here, while the overall operational semantics is detailed in [10].

(Init)    (σ, A → B : ch(ν s̃).I) → (σ, (ν s̃)I)

(Comm)    σ ⊢ e@A ⇓ v  implies  (σ, A → B : s⟨op, e, x⟩.I) → (σ[x@B ↦ v], I)

(Assign)  σ ⊢ e@A ⇓ v  implies  (σ, x@A := e.I) → (σ[x@A ↦ v], I)

(Par)     (σ, I1) → (σ′, I1′)  implies  (σ, I1 | I2) → (σ′, I1′ | I2)

(Sum)     (σ, I1) → (σ′, I1′)  implies  (σ, I1 + I2) → (σ′, I1′)

(Rec)     (σ, I[rec X^A.I / X^A]) → (σ′, I′)  implies  (σ, rec X^A.I) → (σ′, I′)

The first rule is for initiation: A invokes a service located at B and initiates session channels s̃ which will be shared locally by A and B. The second rule is for communication. The notation σ ⊢ e ⇓ v is an evaluation judgment: σ evaluates the expression e to the value v. σ[x@B ↦ v] is the state resulting from updating the local variable x at B with v. The evaluation judgment is also used in the assignment rule. The fourth rule is for parallelism. The (Sum) rule describes the behavior of a non-deterministic choice. The (Rec) rule says that if the unfolding of rec X^A.I under σ reduces to I′, then rec X^A.I under σ will reach (σ′, I′).

Example 1. This example shows a communication between a buyer and a seller. These participants share freshly generated session channels B2Sch and S2Bch. Through S2Bch, the seller offers a quote to the buyer. Through B2Sch, the buyer selects one of the two options offered by the seller, QuoteAccept and QuoteReject. If the first option is selected, the buyer sends the quote "100", which will be stored in x by the seller, and continues with the interaction I1. In the other case, the buyer sends the abort number stored in the variable xAbortNo, which will be stored in x by the seller, and terminates.

Buyer → Seller : ch(ν B2Sch, S2Bch).
Seller → Buyer : S2Bch⟨quote, 100, y⟩.
if y@Buyer ≤ 1000 then
  { Buyer → Seller : B2Sch⟨quoteAccept, 100, x⟩.I1 }
else
  { Buyer → Seller : B2Sch⟨quoteReject, xAbortNo, x⟩.0 }

3 Security Policy Specification

The language that we will use for the specification of security policies is EPC⁻, a subset of EPC (the end-point calculus). The end-point calculus precisely identifies the local behavior of each participant in a web service. It is an applied variant of the π-calculus, augmented with the notion of participants and their local states. In [10], the authors established a projection from the global calculus onto the end-point calculus, so descriptions of participant behaviors in the end-point calculus are not extracted directly from the choreography, but projected from the global calculus. Details of this theory of end-point projection are presented in [10].
Since in the proposed enforcement approach we transform the security policy into a monitor, we reach this goal immediately by choosing EPC⁻, since it is a subset of EPC.

3.1 Syntax of EPC⁻

We show the syntax of EPC⁻, where s denotes session channels and e an expression, which can be an atomic value (such as a natural number or a boolean value), a function (such as arithmetic functions and boolean operators) or a variable. op1, op2, … range over operations. x is a variable. The first construct is a receiving action: the invocation of one of the operators opᵢ and the reception of an expression which will be evaluated and stored in xᵢ. The second construct is a sending action: the invocation of the operator op with expression e. Furthermore, the operator ⊕ represents the alternative composition. Finally, recursion is used for representing unbounded repetition. For representing recursive behaviors, we use term variables X, Y, …. The operator used for recursion is rec X.P. Each occurrence of X in P denotes a recurring point: rec X.P behaves as P until an occurrence of X is found in the execution of P, and then it returns to rec X.P.

P ::= s ▷ Σᵢ opᵢ(xᵢ).Pᵢ | s ◁ op⟨e⟩.P | P1 ⊕ P2 | X | rec X.P | 0

3.2 Semantics of EPC⁻

Note that processes do not evolve alone. A process is located in a participant, which synchronizes with another participant in order to evolve. A participant A with behavior P at a local state σ is called a network, denoted A[P]_σ. The syntax of networks is given by the following grammar:

N ::= A[P]_σ (Participant) | N | M (Parallel-NW) | (νs)N (Res-NW) | ε (Inaction-NW)

where A[P]_σ is a participant with its behavior, as shown in the first construct. Two communicating participants are represented by two networks combined by parallel composition. When two participants initiate a session channel for communication, this session channel must be restricted to these two participants; this is expressed by (Res-NW). ε denotes the empty network.
The reduction rules of networks are given by:

(Struct-NW)  M ≡ M′, M′ → N′, N′ ≡ N  ⟹  M → N

(Par-NW)  M → M′  ⟹  M | N → M′ | N

(Res-NW)  M → M′  ⟹  (νs)M → (νs)M′

The semantics of EPC⁻ is then given by:

(Comm)  σ₁ ⊢ e ⇓ v  ⟹  A[s ◁ op_j⟨e⟩.P]_σ₁ | B[s ▷ Σᵢ opᵢ(xᵢ).Qᵢ]_σ₂ → A[P]_σ₁ | B[Q_j]_σ₂[x_j ↦ v]

(Sum)  A[P1]_σ → A[P1′]_σ′  ⟹  A[P1 ⊕ P2]_σ → A[P1′]_σ′

(Rec)  A[P[rec X.P / X]]_σ → A[P′]_σ′  ⟹  A[rec X.P]_σ → A[P′]_σ′

Security policies are usually specified in logic-based languages. Such languages mainly employ three operators to compose properties: and, or and not. The or operator is given here by ⊕. The not operator is defined here as follows: assume we have two communicating participants A and B and we want to apply the property "A should not send op1 to participant B"; we then write it in EPC⁻ as:

𝒫 = rec X.( ⊕_{i ≠ 1} s ◁ opᵢ⟨eᵢ⟩.X ⊕ s ▷ Σᵢ opᵢ(xᵢ).X )

Example 2. Assume a client wants to check his account details with a bank. The bank should not send him back his account details if he is not yet authenticated. A client is said to be authenticated if he has received from the bank an acceptance of his authentication attempt, through the operation accept. The bank answers the client's account request through the operation resAccount. The security property is then written as follows:

𝒫 = rec X.( s ▷ Σᵢ opᵢ(xᵢ).X
      ⊕ ⊕_{opᵢ ∉ {accept, resAccount}} s ◁ opᵢ⟨e⟩.X
      ⊕ s ◁ accept. rec Y.( ⊕ᵢ s ◁ opᵢ⟨eᵢ⟩.Y ⊕ s ▷ Σᵢ opᵢ(xᵢ).Y ) )

The security property is expressed using the recursion operator. In the recursion block, we have three behaviors combined with the internal choice, which we denote by 𝒫¹, 𝒫² and 𝒫³, where

𝒫¹ = s ▷ Σᵢ opᵢ(xᵢ).X
𝒫² = ⊕_{opᵢ ∉ {accept, resAccount}} s ◁ opᵢ⟨e⟩.X
𝒫³ = s ◁ accept. rec Y.( ⊕ᵢ s ◁ opᵢ⟨eᵢ⟩.Y ⊕ s ▷ Σᵢ opᵢ(xᵢ).Y )

The block 𝒫¹ expresses the fact that the property allows all receiving actions. The block 𝒫² inhibits sending accept or resAccount to the client.
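The effect of this policy can be mirrored by a small trace monitor. The sketch below is an illustration in Python, not part of the calculus; the event representation (a direction tag plus an operation name) is an assumption made for the example.

```python
# Minimal sketch (not the paper's calculus): a trace monitor for the policy
# "the bank must not send resAccount before it has sent accept".
# Event names ("send"/"recv", operation labels) are illustrative assumptions.

def make_monitor():
    authenticated = False  # becomes True once 'accept' has been sent

    def allows(direction, op):
        nonlocal authenticated
        if direction == "recv":          # block P1: all receiving actions pass
            return True
        if not authenticated:            # block P2: inhibit accept-gated sends
            if op == "resAccount":
                return False
            if op == "accept":           # block P3: intercept 'accept' ...
                authenticated = True     # ... then lift all restrictions
            return True
        return True                      # after accept: no restrictions (rec Y)

    return allows

allows = make_monitor()
assert allows("recv", "reqAccount")      # receives always pass
assert not allows("send", "resAccount")  # blocked before authentication
assert allows("send", "accept")          # intercepted, enables the rest
assert allows("send", "resAccount")      # now permitted
```

An intruder replaying "send resAccount" before "send accept" is thus rejected at the first step, which is exactly the behavior the enforcement operator imposes on the bank's process.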
The block 𝒫³ intercepts the accept sending action, after which no more restrictions are imposed, since the authentication of the client has been accepted. So an intruder who wants to check an account's details without authentication cannot achieve his goal, since resAccount is only permitted in the block following the accept sending action.

4 Choreography Specification

The technique used in this paper for describing the composition of web services is choreography. WS-CDL is based on two engineering principles. The Service Channel Principle corresponds to the repeated availability of service channels. The Session Principle, a basic principle in many communication-centred programs, says that a sequence of conversations belonging to a protocol should not be confused with other concurrent runs of this or other protocols by the participants. As a specification of the choreography description language WS-CDL, we introduce a modified version of the end-point calculus, a formalism presented in [10]. The end-point calculus describes the behavior of a participant in a choreography from its end-point view.

4.1 Secured End-Point Calculus

The secured end-point calculus EPC_S is a variant of EPC (the End-Point Calculus) [10]. EPC_S has the particularity of explicitly handling the monitoring concept through its operator @_𝒫. For instance, the process @_𝒫(P) can execute only actions that could be executed by both the controller 𝒫 and the process P. We describe hereafter the formal syntax and dynamic semantics of the secured end-point calculus.

Syntax of EPC_S. Since EPC⁻ is a subset of EPC and EPC_S is an extension of EPC, part of the syntax of EPC_S was already presented in the last section. So in this section we present the overall syntax of EPC_S, but we describe only the constructs that have not been described before.
P ::= !ch(s̃).P | ch(ν s̃).Q | s ▷ Σᵢ opᵢ(xᵢ).Pᵢ | s ◁ op⟨e⟩.Q
    | x := e.P | if e then P else Q | P ⊕ Q | P | Q | (νs)P
    | X | rec X.P | 0 | @_𝒫(P)

The first two constructs represent session initiation: !ch(s̃).P is used for input and ch(ν s̃).Q for output. !ch(s̃) says that the service channel ch, which is publicly available, is ready to receive an unbounded number of invocations, offering communication via its freshly generated session channels s ∈ s̃. ch(ν s̃) is an invocation of a service located at the service channel ch and the initiation of a communication session which will occur through the session channels s̃. After a session has been initiated between two participants and freshly generated session channels have been shared between them, they can communicate using the communication constructs. The assignment operator is x := e.P. if e then P else Q is a choice based on the evaluation of the expression e. Parallel composition is given by P | Q. The restriction construct (νs)P indicates that the session channel s is local to P. 0 denotes the lack of actions. Finally, @_𝒫(P) is the enforcement operator, which controls the execution of P by allowing it to evolve only if 𝒫 is satisfied.

Semantics of EPC_S.

Initiation/Communication simulation. Let P be a process stipulating the local behavior of a participant A in a web service. We observe in the end-point calculus reduction rules that processes always evolve without an external factor, except for initiation and communication. In fact, a participant initiating a communication, or communicating with another participant, needs to synchronize with that participant to realize the interaction. Therefore, to enforce security policies on a participant's behavior, we need a totally local description of its interactions. So we define a simulation relation for the initiation and communication of processes, so that we can simulate the evolution of a process locally, i.e.
without synchronizing with another participant. We first define the normal form of a process, then introduce the simulation relation.

Definition 1 (Normal form of a process). Every process representing the local behavior of a participant in a web service can be written as an internal sum of processes, which we call its normal form:

∀P ∈ Proc, P = ⊕ᵢ aᵢ.Pᵢ

where Proc denotes the set of processes, the aᵢ range over atomic actions and the Pᵢ range over processes in Proc.

Definition 2 (Simulation Relation). We define a simulation relation over networks, denoted A[P]_σ ↝ₐ A[P′]_σ′, which says that the process P in A at the state σ is able to execute the action a and reduce to P′ with a new local state σ′. The simulation relation is defined by the following rule:

P = ⊕ᵢ aᵢ.Pᵢ, ∃i ∈ {1, …, n} : a = aᵢ  ⟹  A[P]_σ ↝ₐ A[Pᵢ]_σ

This rule says that when P is written in its normal form and one of the constituent processes is able to do an action a, then A[P]_σ is also able to do it.

Semantics of EPC_S:

(Init)  A[!ch(s̃).P | P′]_σA | B[ch(ν s̃).Q | Q′]_σB → (ν s̃)(A[!ch(s̃).P | P | P′]_σA | B[Q | Q′]_σB)

(Assign)  σA ⊢ e ⇓ v  ⟹  A[x := e.P | P′]_σA → A[P | P′]_σA[x ↦ v]

(IfTrue)  σA ⊢ e ⇓ tt  ⟹  A[if e then P1 else P2 | P′]_σA → A[P1 | P′]_σA

(IfFalse)  σA ⊢ e ⇓ ff  ⟹  A[if e then P1 else P2 | P′]_σA → A[P2 | P′]_σA

(Par)  A[P1 | R]_σA → A[P1′ | R]_σA′  ⟹  A[P1 | P2 | R]_σA → A[P1′ | P2 | R]_σA′

(Res)  A[P]_σA → A[P′]_σA′  ⟹  A[(νs)P]_σA → A[(νs)P′]_σA′

(Init-In-Sec)  A[@_𝒫(!ch(s̃).P)]_σA | B[ch(ν s̃).Q | R]_σB → (ν s̃)(A[@_𝒫(P) | @_𝒫(!ch(s̃).P)]_σA | B[Q | R]_σB)

(Init-Out-Sec)  A[@_𝒫(ch(ν s̃).P)]_σA | B[!ch(s̃).Q | R]_σB → (ν s̃)(A[@_𝒫(P)]_σA | B[!ch(s̃).Q | Q | R]_σB)

(Comm-In-Sec)  A[P]_σA ↝_{s ▷ op(x)} A[P′]_σA,  A[𝒫]_σA ↝_{s ▷ op(x)} A[𝒫′]_σA,  σB ⊢ e ⇓ v
  ⟹  A[@_𝒫(P)]_σA | B[s ◁ op⟨e⟩.Q | R]_σB → A[@_𝒫′(P′)]_σA[x ↦ v] | B[Q | R]_σB

(Comm-Out-Sec)  A[P]_σA ↝_{s ◁ op_j⟨e⟩} A[P′]_σA,  A[𝒫]_σA ↝_{s ◁ op_j⟨e⟩} A[𝒫′]_σA,  σA ⊢ e ⇓ v
  ⟹  A[@_𝒫(P)]_σA | B[s ▷ Σᵢ opᵢ(xᵢ).Qᵢ | R]_σB → A[@_𝒫′(P′)]_σA | B[Q_j | R]_σB[x_j ↦ v]

(Assign-Sec)  σA ⊢ e ⇓ v  ⟹  A[@_𝒫(x := e.P)]_σA → A[@_𝒫(P)]_σA[x ↦ v]

(Par-Sec)  A[@_𝒫(P1)]_σ → A[@_𝒫′(P1′)]_σ′  ⟹  A[@_𝒫(P1 | P2)]_σ → A[@_𝒫′(P1′ | P2)]_σ′

(IfTrue-Sec)  σ ⊢ e ⇓ tt  ⟹  A[@_𝒫(P | if e then P1 else P2)]_σ → A[@_𝒫(P | P1)]_σ

(IfFalse-Sec)  σ ⊢ e ⇓ ff  ⟹  A[@_𝒫(P | if e then P1 else P2)]_σ → A[@_𝒫(P | P2)]_σ

(Res-Sec)  A[@_𝒫(P)]_σ → A[@_𝒫′(P′)]_σ′  ⟹  A[(ν s̃)(@_𝒫(P))]_σ → A[(ν s̃)(@_𝒫′(P′))]_σ′

The Init rule shows how two participants initiate a session by sharing freshly generated session channels s̃. These session channels are restricted to the participants A and B by the binding operator (ν). Assignment is a local construct: the Assign rule evaluates an expression e and assigns the result of this evaluation to the variable x in A; then A behaves as P. The Res rule restricts the use of the session channels to the process P in A. Init-In-Sec and Init-Out-Sec are the rules for communication initiation: we do not control session initiations, but we control the communication messages between the participants. The communication rules say that if P and 𝒫 are able to send or receive the same operation through the same session channel, becoming respectively P′ and 𝒫′, then @_𝒫(P) does this action and becomes @_𝒫′(P′). The secured assignment rule says that assignments are not constrained by enforcement. The secured parallel composition rule says that the security operator applied to P1 | P2 can evolve into @_𝒫′(P1′ | P2) if P1 can evolve simultaneously with the security policy 𝒫 into, respectively, P1′ and 𝒫′. For the restriction rule, binding session channels does not affect the enforcement. Finally, the conditional rules say that enforcement is not affected by conditionals.

5 Choreography Monitoring

The goal of this research is to enforce security policies over a choreography. One important feature of the enforcement operator is that it lets us enforce the security policies only on the participants they concern. In this in-lined monitoring framework, we do not modify the original behaviors of the participants in a choreography.
However, if the security policy is violated, the evolution of the choreography stops. In this section, we prove the correctness of our theory, first defining some notions such as the partial order over processes and the satisfaction notions.

Definition 3 (Partial Order over Processes). Let A[P1]_σ and A[P2]_σ be two networks. We say A[P1]_σ ⊑ A[P2]_σ if the following condition holds:

A[P1]_σ ↝ₐ A[P1′]_σ′  ⟹  A[P2]_σ ↝ₐ A[P2′]_σ′ and A[P1′]_σ′ ⊑ A[P2′]_σ′.

Definition 4 (Satisfaction Notions). Let 𝒫 be a security policy and ξ a trace. The symbols ⊨ and ⊫ are defined as follows:
- We say that ξ satisfies 𝒫, denoted ξ ⊨ 𝒫, if ξ ∈ ⟦𝒫⟧, where ⟦𝒫⟧ denotes the set of traces of 𝒫.
- We say that ξ could satisfy 𝒫, denoted ξ ⊫ 𝒫, if there exists a trace ξ′ such that ξ.ξ′ ⊨ 𝒫.

Theorem 1. Let P be a process and 𝒫 a security policy. The following properties hold:
1. @_𝒫(P) ⊫ 𝒫;
2. A[@_𝒫(P)]_σ ⊑ A[P]_σ;
3. ∀P′ : ((P′ ⊫ 𝒫) and (A[P′]_σ ⊑ A[P]_σ))  ⟹  A[P′]_σ ⊑ A[@_𝒫(P)]_σ.

Proof. 1. The process @_𝒫(P) is defined so that it can evolve into another process only if the security policy 𝒫 is satisfied. 2. As with the first property, the proof follows directly from the reduction rules of the security operator and the definition of the partial order. 3. Consider a process P′ ∈ Proc such that P′ ⊫ 𝒫 and A[P′]_σ ⊑ A[P]_σ. Suppose A[P′]_σ ↝ₐ A[P1′]_σ′. Since A[P′]_σ ⊑ A[P]_σ, we conclude directly from the definition of ⊑ that A[P]_σ ↝ₐ A[P1]_σ′. Since P′ ⊫ 𝒫, we can also conclude, from the definition of ⊫, that a ⊫ 𝒫. Finally, as A[P]_σ ↝ₐ A[P1]_σ′ and a ⊫ 𝒫, we get A[@_𝒫(P)]_σ ↝ₐ A[@_𝒫′(P1)]_σ′, and thus A[P′]_σ ⊑ A[@_𝒫(P)]_σ.

Example 3 (Buyer-Seller Protocol). The buyer initiates a communication with the seller and requests a quote. The seller sends back the quote. If the buyer rejects it, the protocol terminates. Otherwise, the seller sends the buyer an order confirmation and the buyer confirms his order.
In this case, the seller contacts the shipper and asks for the delivery details, which he transfers to the buyer, and the protocol terminates. The security property that we want to apply to this protocol is: "the seller should communicate with the shipper only if the buyer confirms his order". As a consequence, the seller will not send the buyer the delivery details if the buyer has not confirmed his order. The protocol is depicted in Fig. 1; the critical actions that we have to supervise are shown in dark grey in Fig. 1. The choreography's description in the global calculus is the following:

Buyer → Seller : B2SCh(ν s).
Seller → Buyer : s⟨ackSession⟩.
Buyer → Seller : s⟨reqQuote⟩.
Seller → Buyer : s⟨respQuote, v_quote, x_quote⟩.
if reasonable(x_quote)@Buyer then
  Buyer → Seller : s⟨quoteAccept⟩.
  Seller → Buyer : s⟨orderConfirm⟩.
  Buyer → Seller : s⟨ackConfirm⟩.
  Seller → Shipper : B2SHCh(ν s′).
  Shipper → Seller : s′⟨ackSession⟩.
  Seller → Shipper : s′⟨reqDelivDet⟩.
  Shipper → Seller : s′⟨delivDet, v_delivDet, x_delivDet⟩.
  Seller → Buyer : s⟨delivDet, x_delivDet, x_delivDet⟩. 0
else
  Buyer → Seller : s⟨quoteReject⟩. 0

Fig. 1. Buyer-Seller Protocol (message-sequence diagram: Buyer and Seller exchange ackSession, reqQuote, respQuote, quoteAcc/quoteRej, orderConfirm, ackConfirm and delivDet; Seller and Shipper exchange ackSession, reqDelivDet and delivDet).

The end-point projection gives the end-point behaviors. The buyer's behavior is the following:

Buyer[ B2SCh(ν s). s ▷ ackSession. s ◁ reqQuote. s ▷ respQuote(x_quote).
  if reasonable(x_quote) then
    s ◁ quoteAcc. s ▷ orderConfirm. s ◁ ackConfirm. s ▷ delivDet(x_delivDet). 0
  else s ◁ quoteRej. 0 ]

The seller's behavior is the following:

Seller[ B2SCh(s). s ◁ ackSession. s ▷ reqQuote. s ◁ respQuote⟨e_quote⟩.
  ( s ▷ quoteAcc. s ◁ orderConfirm. s ▷ ackConfirm.
    B2SHCh(ν s′). s′ ▷ ackSession. s′ ◁ reqDelivDet⟨e_buyer⟩. s′ ▷ delivDet(x_delivDet).
    s ◁ delivDet⟨x_delivDet⟩. 0 )
  + ( s ▷ quoteRej. 0 ) ]

The shipper's behavior is the following:

Shipper[ B2SHCh(s′). s′ ◁ ackSession. s′ ▷ reqDelivDet(x_buyer). s′ ◁ delivDet⟨e_delivDet⟩. 0 ]

The security property has the following behavior:

𝒫 = rec X.( ⊕_{op ≠ delivDet} s ◁ opᵢ⟨e⟩.X
      ⊕ s ▷ Σ_{op ≠ ackConfirm} opᵢ(xᵢ).X
      ⊕ s ▷ ackConfirm. rec Y.( ⊕ᵢ s′ ◁ opᵢ⟨e⟩.Y ⊕ s′ ▷ Σᵢ opᵢ(xᵢ).Y ⊕ ⊕ᵢ s ◁ opᵢ⟨e⟩.Y ) )

The security property 𝒫 is written using the recursion operator. In the recursion block, there are three processes combined with the internal choice. The first process is ⊕_{op ≠ delivDet} s ◁ opᵢ⟨e⟩.X, which says that, through the session channel s (shared between the buyer and the seller), the seller can execute any sending action except the delivery details, which should be sent only after the seller has received a confirmation from the buyer. The second process is s ▷ Σ_{op ≠ ackConfirm} opᵢ(xᵢ).X, which says that the seller can receive any operation other than ackConfirm. This action is intercepted in the third process, s ▷ ackConfirm. rec Y.( ⊕ᵢ s′ ◁ opᵢ⟨e⟩.Y ⊕ s′ ▷ Σᵢ opᵢ(xᵢ).Y ⊕ ⊕ᵢ s ◁ opᵢ⟨e⟩.Y ). In this process, after the seller has received a confirmation from the buyer, he can communicate with the shipper through the session channel s′ without any restriction, and he also has no restriction on his sending actions to the buyer, so he can send him the delivery details.

6 Related Works

Web service verification has been the subject of several research efforts. Some of the relevant contributions in this domain are cited in this section.
Most formal approaches introduce a monitor which does not stop the program when a violation is detected. Moreover, these contributions implement the monitor as a web service alongside the other web services. The originality of our work is the introduction of the monitor within the processes of the concerned participants. In [11], a run-time event-based approach to the problem of monitoring the conformance of interaction sequences is presented; when a violation is detected, the errors are shown in dashboards. In [12], the authors introduce an approach to verify the conformance of a web service implementation against a behavioral specification through the application of testing. Stream X-machines are used as an intuitive modeling formalism for constructing the behavioral specification of a stateful web service, together with a method for deriving test cases from that specification in an automated way. The test generation method produces complete sets of test cases that, under certain assumptions, are guaranteed to reveal all non-conformance faults in a service implementation under test. However, this approach only returns non-conformance faults and does not react dynamically against these errors. In [13], the authors propose a monitoring framework for a choreographed service which supports the early detection of faults and decides whether it is still possible to continue the service. The authors of [?] have proposed service automata as a framework for enforcing security policies in distributed systems. They encapsulate the program in a service automaton composed of the monitored program, an interceptor, an enforcer, a coordinator and a local policy. The interceptor intercepts critical actions and passes them to the coordinator, which determines whether the action complies with the security policy and decides upon possible countermeasures; the enforcer then implements these decisions. However, the authors do not specify how to detect critical actions.
7 Conclusion

In this paper, we have introduced a formal approach that automatically enforces a security policy on choreographed services. Indeed, we introduced a new calculus with an enforcement operator @_𝒫. The semantics of the proposed calculus ensures that choreographed services can evolve only if they do not violate the enforced security policy. The originality of our work lies in the fact that we do not add a new web service as a monitor; rather, we wrap the security policy inside the choreographed services. Future work will focus on the definition of a complete mapping between WS-CDL and the global calculus. Moreover, we will seek means to optimize the enforced choreographed services so as to reduce as much as possible the overhead due to the enforcement operator.

References

1. IBM Corporation: Business Process Execution Language for Web Services (BPEL4WS). http://www.ibm.com/developerworks/library/ws-bpel/ (2002)
2. Leymann, F.: Web Services Flow Language (WSFL) Version 1.0. Technical Report, International Business Machines Corporation (IBM) (May 2001)
3. Kavantzas, N., Burdett, D., Ritzinger, G., Fletcher, T., Lafon, Y.: Web Services Choreography Description Language Version 1.0. W3C Working Draft (December 2004)
4. Langar, M., Mejri, M., Adi, K.: Formal enforcement of security policies on concurrent systems. J. Symb. Comput. 46(9) (2011) 997-1016
5. Khoury, R., Tawbi, N.: Corrective enforcement: A new paradigm of security policy enforcement by monitors. ACM Trans. Inf. Syst. Secur. 15(2) (2012) 10
6. Yi, X., Kochut, K.: A CP-nets-based design and verification framework for web services composition. In: ICWS. (2004) 756-760
7. Hamadi, R., Benatallah, B.: A Petri net-based model for web service composition. In: ADC. (2003) 191-200
8. Salaün, G., Bordeaux, L., Schaerf, M.: Describing and reasoning on web services using process algebra. In: ICWS. (2004) 43-
9.
Kazhamiakin, R., Pandya, P.K., Pistore, M.: Timed modelling and analysis in web service compositions. In: ARES. (2006) 840-846
10. Carbone, M., Honda, K., Yoshida, N.: Structured communication-centred programming for web services. In: ESOP. (2007) 2-17
11. Baouab, A., Perrin, O., Godart, C.: An optimized derivation of event queries to monitor choreography violations. In: ICSOC. (2012) 222-236
12. Dranidis, D., Ramollari, E., Kourtesis, D.: Run-time verification of behavioural conformance for conversational web services. In: ECOWS. (2009) 139-147
13. Ardissono, L., Furnari, R., Goy, A., Petrone, G., Segnan, M.: Monitoring choreographed services. In: Innovations and Advanced Techniques in Computer and Information Sciences and Engineering. (2007) 283-288
14. Fu, X., Bultan, T., Su, J.: Analysis of interacting BPEL web services. In: WWW. (2004) 621-630

A Timestamping Scheme with Eternal Security in the Bounded Storage Model

Assia Ben Shil¹, Faculty of Sciences of Bizerte, University of Carthage. Email: essia.benshil@gmail.com
Kaouther Blibech Sinaoui², ISSAT of Mateur, University of Carthage. Email: kaouther.blibech@gmail.com
¹,² LIP2 Laboratory, Tunis, Tunisia

Abstract. Digital timestamping is a cryptographic technique for affixing a reliable date to a digital document. The security of most existing timestamping systems is based on the security of the cryptographic techniques they use, such as hash functions. However, a hash function has a limited lifetime. In this context, we provide a non-interactive timestamping scheme in the bounded storage model (BSM) whose security is not tied to the lifetime of any cryptographic technique. We prove, in fact, that our timestamping scheme is eternally secure, even against an adversary with unlimited computing power.

Keywords: Timestamping, bounded storage model, computing power, eternal security.

1 Introduction

In recent years, digital documents have begun to replace paper documents in several areas.
For this reason, it is important to locate a digital document in time, to prove its existence and integrity since a given date. This task is achieved through a timestamping system that operates in two phases: a timestamping phase, allowing one or more Timestamping Authorities (TSAs) to affix a reliable date to a document and generate the associated timestamp, and a verification phase, allowing any potential verifier to check the correctness of the produced timestamp. The security of most existing timestamping systems [2-10] is based on the security of the cryptographic techniques they use, such as hash functions. These techniques are secure under the assumption that the users' computing power is limited. However, nowadays, the computing power of computers grows exponentially, so the hash functions in use are not secure forever [17], and the existing timestamping systems cannot be considered secure forever either. To propose timestamping systems producing timestamps with eternal validity, we place ourselves in the Bounded Storage Model (BSM) [13, 14]. In this model, Maurer assumes that users' storage capacity is limited, while their computing power can be unlimited. In this model, ciphers are eternally secure [1]. The principle is that a very long random string is transmitted in every round t (the interval between the time t and the time t+1). At the time of the transmission of this string, no participant has sufficient storage capacity to store it fully. Even if, later, his storage capacity becomes sufficient to fully store the string transmitted in round t, the user has already lost access to this string. Thus, the performed encryptions are valid forever. Moran proposed in [15] a non-interactive timestamping scheme in the bounded storage model. Indeed, in the scheme of Moran, each stamper or document owner can timestamp his document locally, without communicating with any third party [15], unlike conventional timestamping systems, which involve at least one TSA.
Thus, non-interactive timestamping ensures total confidentiality and hides even the fact that a timestamping occurred. These characteristics have made Moran's proposal very interesting, but the timestamping system he proposed suffers from a lack of precision and practical details. In this context, we present a non-interactive timestamping scheme in the BSM. This paper is organized as follows. After introducing the BSM in section 2, we present Moran's timestamping system and then our non-interactive timestamping scheme in the BSM in section 3. Then, we describe our timestamping scheme more formally. Finally, we formally prove the eternal security of our timestamping solution.

2 The Bounded Storage Model

Generally, in cryptography, the proposed systems and the functions they use are secure under the assumption that there is a limit on the computational power of any user or adversary. In the bounded storage model, the proposed systems must be secure even against an adversary with unlimited computational power. The BSM was proposed by Maurer in 1992 [13] for the development of encryption keys. It aims to generate, from a short secret key K, a key X of large size [14] that can be used as an encryption key. The system operates as follows. In this model, it is assumed that the storage capacity of any user is limited, while no assumption is made about the computing power. Let s be the assumed limit on a user's storage capacity. Ciphers in the BSM use a very long string R called the randomizer, which may, for example, be a random sequence of bits transmitted by a satellite. If R is a random string of r bits, the space of R is {0, 1}^r. Notice that r ≫ s is required, to ensure that no user can fully store R. Having a secret key K of size k bits in the space {0, 1}^k, we can use a known function f : {0, 1}^k × {0, 1}^r → {0, 1}^x to generate the derived key X = f(R, K) of size x bits. The function f must use only a small part of R, so that we do not need to fully read R.
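As a rough illustration of such an f (a sketch, not Maurer's exact construction), the short key K can be used to select a few block indices of R, so that only those blocks are ever read. All names and parameters below are assumptions made for the example.

```python
# Sketch of BSM-style key derivation: the short key K picks a few blocks
# of the long randomizer R, so only a small part of R is ever read.
# Block size and the index-derivation rule are illustrative assumptions.
import secrets

def derive_key(R_blocks, K, num_picks):
    """X = f(R, K): interpret K as a sequence of block indices into R."""
    n = len(R_blocks)
    X = b""
    seed = int.from_bytes(K, "big")
    for _ in range(num_picks):
        idx = seed % n          # next index derived from the short key
        seed //= n
        X += R_blocks[idx]      # read only the selected block of R
    return X

# Toy randomizer: 1024 blocks of 8 bytes (a real R would be far too large
# for any participant to store in full).
R_blocks = [secrets.token_bytes(8) for _ in range(1024)]
K = secrets.token_bytes(16)                 # short shared secret key
X1 = derive_key(R_blocks, K, num_picks=4)
X2 = derive_key(R_blocks, K, num_picks=4)
assert X1 == X2 and len(X1) == 32           # both parties derive the same 32-byte X
```

Two parties sharing K and listening to the same broadcast of R derive the same long key X, while an adversary who could not store R during the round has lost the selected blocks for good.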
Maurer's system has been the subject of intensive study; indeed, many key generation systems have been proposed in the BSM [11]. The BSM was used for timestamping by Moran in [15]. In [3], we proposed an improvement of Moran's timestamping system. In the following section, we present the timestamping system of Moran and then our proposal.

3 Timestamping Solutions in the BSM

In the BSM, we assume that a long random string R of r bits is transmitted during the round t (between t and t+1). If s is the maximum storage capacity of any entity in round t, then s ≪ r. Notice that the space of R is {0, 1}^r, where r is the size of the string R. Similarly, we consider that a document D of size d has a value in {0, 1}^d. In the timestamping scheme proposed by Moran, to timestamp a document D, its content is used to select a few blocks from R, whose values will be inserted in the timestamp T. Any verifier must randomly save some blocks of R (using a function named Sketch(R)) in order to verify, later, the validity of any timestamp made during the round t. Verifying the validity of a timestamp associated to a document D is performed at a later date by a verifier, who simply has to check that there is no conflict between his sketch and the timestamp of D. However, in this solution, a timestamp includes some additional values of R. Even if the verifier cannot store more than s bits of R during the round t, he may at verification time store s′ bits with s′ > s. Each verification of a timestamp generated in the round t can lead him to discover new blocks of R. After a number of verification processes, he can reconstruct R partially, if not entirely, and backdate any document. To remedy this problem, we propose a timestamping protocol that allows verifying the timestamp of a document D without learning additional values of R. To this aim, instead of inserting the blocks of R in the timestamp, these blocks will be the secret of the stamper.
The verifier must then prove that he has the value of a given block in his sketch, and the stamper has to prove that he used this value to create the timestamp. In the verification process of this scheme, a verifier may accept or reject a timestamp without discovering any additional information about the string R transmitted during the round t. The idea is to timestamp the document D locally using the secret sharing scheme of Shamir [18] in order to divide D into n shares Di. Assuming that the blocks of R are indexed by numbers from 1 to the number of blocks, the values R_Di indexed by the shares Di, for 1 ≤ i ≤ n, are recovered. The polynomial that passes through the points Pi(R_Di, Di) for 1 ≤ i ≤ n represents the timestamp of D. We denote by Share(D) the set of shares Di. Any user of the system saves a random subset of R, named Sketch(R), formed by a number of pairs (index of block, value of the block). To verify the validity of the timestamp T of D for a random string R, the verifier who stored Sketch(R) proceeds as follows: for each index in both Share(D) and Sketch(R), he recovers the associated block Ri of R from his sketch, applies the polynomial given in the timestamp to Ri, and verifies that the result is the associated index. In the following sections, we provide a formal representation of our timestamping system and we prove its eternal security. To this aim, we show that once the string R has been transmitted, the probability of forging a fake timestamp for a document D is close to zero.

4 Definitions and Notations

In this section, we introduce some notations that we will use later in this paper.

- Randomizer: We call "randomizer", and denote by R, the random string transmitted during a round t. This string is divided into n blocks of size b bits, indexed from 1 to n and denoted R1, …, Rn. Knowing that the length of R is r bits and the size of a block is b bits, r = n·b.
For each subset S ⊆ I(R), where I(R) is the set of indexes of the blocks Ri (1 ≤ i ≤ n), we denote by R|S the set of blocks of R indexed by S.

- Hamming Distance: A vector being a set of blocks, we define the Hamming distance between two vectors c1 and c2, denoted DH, as the number of blocks in which the two vectors differ.

- Threshold secret sharing: Threshold secret sharing is a technique for dividing a secret S into l shares such that a coalition of at least λ shares is necessary to reconstruct the secret, while a coalition of at most λ−1 shares does not reveal even partial information about the secret (λ < l). A λ-out-of-l threshold secret sharing scheme is denoted SSS(λ, l).

- Shamir secret sharing scheme: The secret sharing scheme based on Shamir's polynomial interpolation is an SSS(λ, l); we denote it SSS_S(λ, l). The principle is to fix a polynomial P of degree λ−1 and a set of values X = (X1, …, Xl). The secret sharing function, denoted Share, takes as input the secret S and generates the set of shares Share(S) = (S1, …, Sl) such that ∀i, 1 ≤ i ≤ l, Si = P(Xi). The reconstruction function, denoted Share⁻¹, takes as input a subset of X, denoted X_S = [X_S1, …, X_Sλ], such that |X_S| = λ, and the associated shares: Share⁻¹(X_S, S_XS1, …, S_XSλ) = P, with P a polynomial such that ∀Xi ∈ X_S and Si ∈ [S_XS1, …, S_XSλ], P(Xi) = Si. The secret S is computed as S = P(0).

5 Presentation of our Timestamping Scheme

5.1 Timestamping Phase

A stamper is represented by the two following functions:

- Store(D, R) uses Shamir's secret sharing process to compute Share(D) for a given document D; then it computes the vector R|Share(D) and stores it. More formally, Store(D, R) consists in computing Share(D) = (D1, …, Dl), where Di is the i-th index specified by D. To the vector Share(D) is associated the vector (R_D1, …, R_Dl), where R_Di is the block of R indexed by Di.
We denote this vector by R|Share(D), where the notation R|I stands for the values of the blocks of R indexed by I1, ..., In, with I = (I1, ..., In).

Definition 1. Store(D, R) = R|Share(D).

- Stamp(D, Store(D, R)) uses Shamir's reconstruction process to find the unique polynomial passing through the points Pi(x, y), where x is a block of the vector R|Share(D) and y is the associated share in Share(D). This polynomial is the timestamp T.

Definition 2. T = Share⁻¹(R|Share(D), Share(D)).

5.2 Verification Phase

A verifier is represented by the two following functions:
- Sketch(R) randomly chooses a set of indexes denoted H, with H ⊂ I(R), computes R|H and stores this vector.

Definition 3. Sketch(R) = (H, R|H).

- Verify(Sketch(R), D, T) checks that there is no conflict between Sketch(R) and T.

Definition 4. Verify(Sketch(R), D, T) checks the following equality: T(R|H∩Share(D)) = H ∩ Share(D).

In other words:

Definition 5. Verify(Sketch(R), D, T) checks that DH(T(R|H∩Share(D)), H ∩ Share(D)) = 0 for a timestamp T. If this equality holds, the timestamp T is accepted by the verifier and is said to be "valid".

5.3 The Behavior of an Adversary

An adversary consists of the two following functions:
- Store*(R), which saves a subset of R called C. The difference between Sketch(R) and Store*(R) is that Sketch(R) is computed "online" while Store*(R) may not be.
- Stamp*(D, C), which, given a document D and a string C, tries to produce a timestamp T* of D.

Definition 6. Stamp*(D, C) = R*|Share(D).

Definition 7. T* = Share⁻¹(R*|Share(D), Share(D)), where R*|Share(D) is the vector of blocks associated to Share(D) according to T*.

If an adversary A produces, for a document D and a randomizer R, a timestamp T* that is equal to the timestamp T produced by Stamp(D, Store(D, R)), we say that he backdates the document "correctly". More formally:

Definition 8.
An adversary backdates a document D with success probability δ for a given randomizer R if:

Pr[Verify(Sketch(R), D, Stamp*(D, Store*(R)))] ≥ δ.

Definition 9. An adversary backdates a document D correctly for a given randomizer R if DH(V1, V2) = 0, with V1 = R*|Share(D) and V2 = R|Share(D).

Definition 10. An adversary backdates a document D correctly for a given randomizer R with at most err errors if DH(V1, V2) ≤ err, with V1 = R*|Share(D) and V2 = R|Share(D).

6 Security Proofs of our Timestamping Scheme

Given the following parameters:
- s: the storage capacity of the most powerful adversary.
- r: the size of the random string R transmitted during a round t.
- b: the size of a block of the random string R transmitted during a round t.
- l: the number of indexes specified by a given document.
- n: the number of blocks of the random string R transmitted during a round t.

We assume that:
(1) r ≫ s: the size of the random string R transmitted during a round t is greater than the storage capacity of the most powerful user of the system.
(2) ∃ u > 1 such that n = u·|H|, with u much smaller than the number of indexes specified by a document.
(3) 2^b ≫ r/b: the number of possible values for a block of size b bits is greater than the number of blocks of the string R transmitted during a round t.
(4) r/b > l: the number of blocks of the string R transmitted during a round t is greater than the number of shares used in the adopted Shamir secret sharing scheme.
(5) b ≫ 1: the size of a block of R is much greater than 1 bit.

In our security study, we demonstrate two important characteristics of our non-interactive timestamping scheme. First, we prove that backdating a document in our timestamping scheme succeeds only with negligible probability. Second, we prove that the timestamps provided by our scheme have an eternal validity.
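Before turning to the proofs, the timestamping and verification phases above (Definitions 1-5) can be sketched concretely. The following is an illustrative sketch only, not the authors' implementation: it assumes interpolation over the prime field GF(p) with p = 2^61 − 1, that the document directly yields its l share indexes D1, ..., Dl, and that the sampled block values of R are pairwise distinct (plausible under assumption (3), 2^b ≫ r/b). The function names store, stamp, sketch and verify mirror the paper's functions.

```python
import random

P = 2**61 - 1  # prime modulus larger than any block value (assumption)


def _interpolate(points):
    """Return a function evaluating the Lagrange polynomial through
    `points` = [(x1, y1), ...] over GF(P); the x-values must be distinct."""
    def poly(x):
        total = 0
        for i, (xi, yi) in enumerate(points):
            num, den = 1, 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, -1, P)) % P
        return total
    return poly


def store(share_d, R):
    """Store(D, R): keep the blocks of R indexed by Share(D)."""
    return [R[i] for i in share_d]


def stamp(share_d, stored):
    """Stamp: the polynomial through the points (R_Di, Di) (Definition 2)."""
    return _interpolate(list(zip(stored, share_d)))


def sketch(R, m, rng):
    """Sketch(R): a random subset H of indexes with their block values."""
    H = rng.sample(range(len(R)), m)
    return {i: R[i] for i in H}


def verify(sk, share_d, T):
    """Verify: for each index in both Share(D) and the sketch, the
    timestamp polynomial must map the sketched block value back to the
    index, i.e. T(R_i) = i (Definitions 4 and 5)."""
    common = [i for i in share_d if i in sk]
    return all(T(sk[i]) == i for i in common)
```

A correct timestamp passes verification against any sketch, while a timestamp built from even one wrong block value is rejected as soon as the tampered index lands in the verifier's sketch, which is exactly the mechanism Lemma 1 quantifies.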
6.1 Negligible Probability of Backdating Documents

In our timestamping scheme, the probability that an adversary backdates a document D for a string R already transmitted, using his stored blocks of R, is negligible. The proof is established in two steps. In the first step, we show that if an adversary A wants to backdate a document D successfully for a random string R, then he must backdate it "correctly" for this string R with a negligible error err. In the second step, we show that the probability of backdating a document D correctly for a random string R is negligible.

First step. If the adversary produces a timestamp of D such that the vector of blocks associated to Share(D) according to this timestamp is far, in Hamming distance, from the vector of blocks associated to the timestamp produced by Stamp(D, Store(D, R)), then the verifier will, with high probability, reject the adversary's timestamp, because the values indexed by Share(D) according to the timestamp do not match the values indexed by Share(D) according to his sketch. Given a correct timestamp T and a timestamp T* produced by an adversary A for the document D and the random string R, the adversary backdates the document D for R successfully if he produces a timestamp T* such that the vector of blocks associated to Share(D) according to T*, denoted R*|Share(D), is close in Hamming distance to R|Share(D). In this case, we say that the adversary backdates the document D "correctly" for the random string R with err errors, where err is an integer very close to zero.

More formally, let A be an adversary. Denote by Rsuccessful(D) the set of strings R for which A has the necessary storage to try to backdate the document D with a probability of success greater than δ.

Lemma 1. If an adversary backdates a document D for a random string R with a probability of success δ, then he backdates it "correctly" with at most (n/|H|)·ln(1/δ) errors. [16]

Proof.
Let us suppose that the adversary provides a timestamp for the document D and the string R such that the timestamp is made with err* > err incorrect indexes. Denote by INCORRECT(D, R) the set of incorrect indexes for D and R. If H ∩ INCORRECT(D, R) ≠ ∅, the verifier will reject the adversary's timestamp. Let i be an index of H. The probability that i is in INCORRECT(D, R) is:

Pr[i ∈ INCORRECT(D, R)] = err*/n.

The probability that i does not belong to INCORRECT(D, R) is 1 − Pr[i ∈ INCORRECT(D, R)] = 1 − err*/n. The probability that no element of H belongs to INCORRECT(D, R) is:

Pr[∀ i ∈ H, i ∉ INCORRECT(D, R)] = (1 − err*/n)^|H| ≤ e^(−err*·|H|/n).

So, if an adversary backdates a document D for a random string R with a probability of success δ, then δ ≤ e^(−err*·|H|/n), and hence he backdates it correctly with at most err* ≤ (n/|H|)·ln(1/δ) errors.

Denote by Rcorrect(D) the set of strings R for which A can backdate D "correctly" with at most (n/|H|)·ln(1/δ) errors.

Theorem 1. If err ≤ (n/|H|)·ln(1/δ), then Rsuccessful(D) is a subset of Rcorrect(D). [16]

Proof. According to Lemma 1, if an adversary backdates a document D for a random string R with a probability of success δ, then he backdates it correctly with at most err ≤ (n/|H|)·ln(1/δ) errors. So, if a random string R belongs to Rsuccessful(D), then it belongs to Rcorrect(D). Thus, Rsuccessful(D) ⊆ Rcorrect(D). In addition, the closer the probability of success gets to 1, the closer this error gets to 0. So, successfully backdating means correctly backdating with a "negligible" error. We prove, in the second step, that the probability that the random string R chosen by the adversary to backdate a document D lies in Rcorrect(D) is negligible.

Second step. We now prove that, for an adversary A, a document D and a string R, Pr[R ∈ Rcorrect(D)] is negligible.

Theorem 2. If l ≫ 1 and b ≫ 1, then Pr[R ∈ Rcorrect(D)] is negligible, with b the size of a block of R.

Proof.
We proved in the first step that if an adversary backdates a document "successfully", he backdates it "correctly" with at most a negligible error. It remains to bound the probability that the string R, for which the adversary tries to backdate the document D, lies in Rcorrect(D). Knowing that the size of the blocks of a random string R is b and the number of these blocks is n, the number of possible random strings is (2^b)^n. Moreover, to backdate a document D, the adversary has to create a timestamp T* for D such that at least l − err blocks of R indexed by D are used to generate T*. In other words, he can try to backdate D only for random strings for which he knows the values of at least l − err blocks among the l blocks indexed by D. Thus, since the adversary tries to backdate D correctly with at most err errors, the number of random strings he can use is at most (2^b)^(n−l). So, the probability that a random string R belongs to Rcorrect(D) is:

Pr[R ∈ Rcorrect(D)] ≤ (2^b)^(n−l) / (2^b)^n ≤ 1/2^(l·b).

Since l ≫ 1 and b ≫ 1, this probability is negligible.

6.2 The Eternal Security of our Timestamping Scheme

In [16], Moran et al. prove that, in their non-interactive timestamping scheme, an adversary with storage M can easily backdate k = M/T documents by running the timestamping process on some k documents and storing the generated timestamps (each of which has length at most T). However, the probability that an adversary backdates more than k documents is negligible. We show here that, in our timestamping scheme, after the transmission of R, it is very difficult to forge a fake timestamp for a given document. Moreover, we show that an adversary holding a document D and k correct timestamps can forge a fake timestamp for D only with negligible probability. Thus, we prove the following theorem:

Theorem 3.
If 2^b ≫ r/b and r/b > l, then the probability of forging a fake timestamp for a document D and a string R using correct timestamps related to R is negligible.

Proof. The inequality 2^b ≫ r/b means that the number of possible values for a block of size b bits is much greater than the number of blocks of the string R transmitted during a round t. Since l is the number of indexes specified by a given document, this number must always be less than the number of blocks of R, so r/b > l. It follows that 2^b ≫ l, which means that the number of possible values for a block of R is much greater than the number of indexes specified by the document D.

For each timestamp Tj (1 ≤ j ≤ k), if the adversary picks any value v among the 2^b possible values of a block of R, he recovers an index i = Tj(v). However, the fact that i = Tj(v) does not mean that Ri = v: i is the value associated to v by the polynomial Tj, but the pair (i, v) does not necessarily belong to the string R. In other words, the string R may not associate the value v to the block indexed by i. Moreover, there may exist an index i′ such that Ri′ = v, and there is no way to check whether i = i′. The only points of Tj that the adversary knows to belong to R are the points whose indexes are specified by the document associated to Tj. The probability that the adversary picks one of these points is l/2^b ≪ 1. The probability of obtaining all the points required to forge a fake timestamp is negligible, since it is the product of the probabilities of selecting each of the points belonging to a valid timestamp. Thus, the adversary can obtain the points needed to forge a fake timestamp for a document D and a random string R only with negligible probability.

7 Conclusion

In this paper, we have presented a non-interactive timestamping solution in the bounded storage model. Our solution is not interactive and even hides the fact that a timestamping occurred. It also ensures total confidentiality of the provided timestamps.
In addition, our solution provides eternal security for the provided timestamps: neither an increase in the storage capacity of an adversary nor the evolution of his computing power can compromise a provided timestamp. Thus, our solution is more secure than existing systems, whose timestamps can be challenged when the computing power or the storage capacity of users increases. In this context, we studied the security of our solution and formally proved that the possibility of cheating is negligible. In our future work, we plan to adopt new secret sharing schemes for timestamping.

References

1. Y. Aumann, Y. Z. Ding and M. O. Rabin, Everlasting security in the bounded storage model, IEEE Transactions on Information Theory, 2002.
2. D. Bayer, S. Haber and W. S. Stornetta, Improving the efficiency and reliability of digital timestamping, In: Sequences '91: Methods in Communication, Security and Computer Science, 1992.
3. A. Ben Shil, K. Blibech and R. Robbana, A New Timestamping Schema in the Bounded Storage Model, In: Proceedings of the 3rd Conference on Risks and Security of Internet and Systems (CRiSIS), 2008.
4. K. Blibech and A. Gabillon, A New Timestamping Scheme Based on Skip Lists, In: ICCSA (3), pp. 395-405, 2006.
5. K. Blibech and A. Gabillon, A New Totally Ordered Timestamping Scheme, In: 5th Conference on Security and Network Architectures (SAR), 2006.
6. K. Blibech and A. Gabillon, CHRONOS: an authenticated dictionary based on skip lists for timestamping systems, In: SWS, pp. 84-90, 2005.
7. K. Blibech, A. Gabillon and A. Bonnecaze, Etude des systèmes d'horodatage, Technique et Science Informatiques 26(3-4), pp. 249-278, 2007.
8. A. Bonnecaze, P. Liardet, A. Gabillon and K. Blibech, A distributed time stamping scheme, In: 4th Conference on Security and Network Architectures (SAR), 2005.
9. A. Buldas, P. Laud, H. Lipmaa and J. Willemson, Time-stamping with Binary Linking Schemes, In: CRYPTO'98, pp. 486-501, 1998.
10. A.
Buldas, P. Laud and B. Schoenmakers, Optimally efficient accountable timestamping, In: Public Key Cryptography (PKC), 2000.
11. S. Dziembowski and U. Maurer, On Generating the Initial Key in the Bounded-Storage Model, In: Advances in Cryptology - EUROCRYPT, pp. 126-137, 2004.
12. S. Haber and W. S. Stornetta, How to Time-Stamp a Digital Document, Journal of Cryptology 3(2), pp. 99-111, 1991.
13. U. Maurer, Conditionally-perfect secrecy and a provably-secure randomized cipher, Journal of Cryptology, pp. 53-66, 1992.
14. U. Maurer, Secret key agreement by public discussion, IEEE Transactions on Information Theory 39, pp. 733-742, 1993.
15. T. Moran, R. Shaltiel and A. Ta-Shma, Non-interactive Timestamping in the Bounded Storage Model, In: Advances in Cryptology - CRYPTO'04, LNCS 3152, Springer, pp. 460-476, 2004.
16. T. Moran, R. Shaltiel and A. Ta-Shma, Non-interactive Timestamping in the Bounded Storage Model, Journal of Cryptology, pp. 189-226, 2009.
17. National Institute of Standards and Technology (NIST), Announcement of Weakness in the Secure Hash Standard, Technical report, 1994.
18. A. Shamir, How to share a secret, Communications of the ACM 22(11), pp. 612-613, 1979.