=Paper=
{{Paper
|id=Vol-1158/fms_proceedings
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-1158/fms_proceedings.pdf
|volume=Vol-1158
}}
FMS 2014
Formal Methods for Security
Proceedings of the Formal Methods for Security Workshop
A Satellite Event of Petri Nets 2014
Proceedings Editors:
Véronique Cortier, CNRS Researcher at LORIA, Lorraine University, France
Riadh Robbana, Professor in LIP2 & INSAT, University of Carthage, Tunisia
Tunis, Tunisia, June 23rd, 2014
Preface
This volume contains the proceedings of the 5th Workshop on Formal Methods for Security
(FMS), held in Tunis, Tunisia, on June 23, 2014, as a satellite event of Petri Nets 2014.
FMS is dedicated to the advancement of the theory and practice of formal methods for security.
We received 9 submissions, which went through a rigorous review process: each submission was
reviewed by at least 3 Program Committee members. The Program Committee selected 5 papers
for publication. In addition to these five papers, two more were selected for informal presentation
at the workshop.
We also had the pleasure of welcoming an invited talk by Mark Ryan, "Du-Vote: Remote
Electronic Voting with Untrusted Computers" (joint work with Gurchetan Grewal, Michael
Clarkson, and Liqun Chen).
We hope you will enjoy the FMS 2014 edition!
Véronique Cortier and Riadh Robbana
FMS 2014 Program Committee
Program Committee
Myrto Arapinis University of Birmingham
Kamel Barkaoui Cedric-Cnam
Narjes Ben Rajeb LIP2 - INSAT - University of Carthage
Vincent Cheval School of Computer Science, University of Birmingham
Stephen Chong Harvard University
Véronique Cortier CNRS, LORIA
Stéphanie Delaune CNRS, LSV
Susanna Donatelli Dipartimento di Informatica, Università di Torino
Sibylle Froeschle University of Oldenburg
Pierre-Cyrille Heam LSV, ENS Cachan, INRIA-CNRS
Béchir Ktari Laval University
Yassine Lakhnech VERIMAG
Mahjoub Langar LIP2 - ENIT - Elmanar University
Mohamed Mejri Laval University
Riadh Robbana LIP2 - INSAT - University of Carthage
Hassen Saidi SRI International
Lilia Sfaxi INSAT - University of Carthage
Jacques Traore Orange Labs
Additional Reviewers
Berrima, Mouhebeddine University of Sousse
Yu, Jiangshan University of Birmingham
Table of Contents
Du-Vote: Remote Electronic Voting with Untrusted Computers (Invited Talk) . . . . . . . . . . . . . . . . . . . . 4
Mark Ryan
Extraction of Insider Attack Scenarios from a Formal Information System Modeling . . . . . . . . . . . . . . 5
Amira Radhouani, Akram Idani, Yves Ledru and Narjes Ben Rajeb
Traffic Analysis of Web Browsers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Sami Zhioua and Mahjoub Langar
Secrecy by Witness-Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Mohamed Mejri, Jaouhar Fattahi and Hanane Houmani
Formal Enforcement of Security Policies on Choreographed Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Mahjoub Langar and Karim Dahmani
A Timestamping Scheme with Eternal Security in the Bounded Storage Model . . . . . . . . . . . . . . . . . . . 68
Assia Ben Shil and Kaouther Blibech Sinaoui
Du-Vote: Remote Electronic Voting with
Untrusted Computers
(Invited Talk)
Mark Ryan, Gurchetan Grewal, Michael Clarkson, and Liqun Chen
University of Birmingham, UK
m.d.ryan@cs.bham.ac.uk
Abstract. Du-Vote is a new remote electronic voting protocol that eliminates the
often-required assumption that voters trust general-purpose computers. Trust is
distributed in Du-Vote between a simple hardware token issued to the voter, the
voter's computer, and a server run by election authorities. Verifiability is guaranteed
with statistically high probability even if all these machines are controlled by the
adversary, and privacy is guaranteed as long as at least either the voter's computer
or the server is not controlled by the adversary. The design of the Du-Vote protocol
is presented in this paper. A new non-interactive zero-knowledge proof is employed
to verify the server's computations. The security of the protocol is analyzed to
determine the extent to which privacy and verifiability are maintained when
components of the system are malicious.
Keywords: du-vote, electronic voting protocol
Extraction of Insider Attack Scenarios from a
Formal Information System Modeling
Amira Radhouani¹,²,³,⁵, Akram Idani¹,², Yves Ledru¹,², Narjes Ben Rajeb³,⁴
¹ Univ. of Grenoble Alpes, LIG, F-38000 Grenoble, France
² CNRS, LIG, F-38000 Grenoble, France
³ LIP2-LR99ES18, 2092, Tunis, Tunisia
⁴ INSAT - Carthage University, Tunisia
⁵ FST - Tunis-El Manar University, Tunisia
Abstract. The early detection of potential threats during the modelling phase of
a secure Information System is required because it favours the design of a robust
access control policy and the prevention of malicious behaviours during system
execution. This paper deals with internal attacks, which can be carried out by
people inside the organization. Such attacks are difficult to detect because insiders
have authorized system access and may be familiar with system policies and
procedures. We are interested in finding attacks which conform to the access
control policy but lead to unwanted states. These attacks are favoured by policies
involving authorization constraints, which grant or deny access depending on
the evolution of the functional Information System state. In this context, we
propose to model functional requirements and their Role-Based Access Control
(RBAC) policies using B machines and then to formally reason on both models.
In order to extract insider attack scenarios from these B specifications, our
approach first investigates symbolic behaviours. The use of a model-checking tool
then makes it possible to exhibit, from a symbolic behaviour, an observable
concrete sequence of operations that can be followed by an attacker. In this paper,
we show how this combination of symbolic execution and model checking allows
such insider attack scenarios to be found.
Keywords: Information System, B-Method, RBAC, attack scenario, Model
Checking, Symbolic Search.
1 Introduction
Developing secure Information Systems remains an active research area, addressing
a wide range of challenges mostly concerned with how to prevent external attacks
such as intrusion, code injection, denial of service, identity fraud, etc. Insider
attacks are less often addressed, even though they may cause much more damage,
because an insider is above all a trusted entity: intrinsically, insiders are given
the means to violate a security policy, either by using legitimate access or by
obtaining unauthorized access. This paper deals especially with Role-Based Access
Control (RBAC) concerns, with the aim of exhibiting potential insider threats from
a formal modelling of secure Information Systems. We are interested in finding
attacks which conform to the access control policy, but lead to unwanted states.
These attacks are favoured by policies involving authorization constraints, which
grant or deny access depending on the evolution of the functional Information
System state. This reveals, on the one hand, the need to link the security model
to the functional model of the information system, and on the other hand, to
build tools taking into account the dynamic evolution of the IS state.
Tools such as SecureMova [2] and USE [6] are dedicated to validating security
policies related to a functional model, but they do not take the dynamic evolution
of the functional state into account. In [10], we discussed the shortcomings of
existing approaches in this context and showed the advantages of using a formal
specification assisted by animation tools. This paper goes a step further than our
previous work by taking advantage of model-checking and proof tools in order
to automatically find insider attack scenarios composed of a sequence of actions
modifying the functional state and breaking the authorization constraint.
This paper is organized as follows: Section 2 gives an overview of our approach
and its underlying methodology. In Section 3 we present a simple example that
illustrates our contribution. Section 4 defines semantics and technical aspects.
In Section 5 we propose a symbolic search that automates the generation of attack
scenarios and discuss the results of its application on the given example. Finally,
we draw conclusions and perspectives.
2 Overall Approach
Bridging the gap between formal techniques (e.g. Z, B, VDM, …) and graphical
languages such as UML has been a challenge for several years. On the one hand,
formal techniques allow automatic reasoning assisted by proof and model-checking
tools; on the other hand, graphical techniques allow visualization and a better
understanding of the system structure. These complementary aspects are useful
to ensure a software development process based on notations with precise syntax
and semantics, while still allowing the system to be structured graphically.
Most existing research works [1, 5, 7, 13] in this context have focused only
on the modelling and validation of functional aspects, which are initially described
by various kinds of UML diagrams (class, state/transition, sequence, …) and
then translated into a formal specification. These works have shown the interest
of linking formal and graphical paradigms, as well as the feasibility of such
translations.
In our work, we adopt a similar approach in order to graphically model
and formally reason on both functional and security models. We developed the
B4MSecure platform [9] (http://b4msecure.forge.imag.fr/) in order to translate
a UML class diagram associated with a SecureUML model into B specifications.
The resulting B specifications, illustrated in figure 1, follow the separation of
concerns principle in order to be able to validate both models separately and then
validate their interactions.
The functional B model on the left-hand side of figure 1 is derived from a
conceptual class diagram. It integrates all basic operations generated automatically
(constructors, destructors, setters, getters, …) and also additional user-defined
operations, which are integrated into the graphical model and specified using
the B syntax. This functional specification can be further improved by adding
invariants and carrying out correctness proofs with the help of the AtelierB prover.
Fig. 1. Validation of functional and security models
The security model, on the right-hand side of figure 1, is dedicated to controlling
access to functional operations with respect to the access control rules defined
in the SecureUML model. In our approach, we do not deal with administrative
operations because we make the simplifying assumption that access control rules
do not evolve during system execution. The formal security model makes it
possible to validate RBAC well-formedness rules, such as the absence of role
hierarchy cycles, and separation of duty (SoD) properties, such as the
non-assignment of conflicting roles to users.
This paper assumes that the validation of both models in isolation is done:
operations of the functional model do not violate invariant properties, and the
security model is robust. Such validation activities are widely discussed in the
literature [12]. However, currently available validation approaches do not
sufficiently take into account the interactions between both models, which result
from the fact that constraints expressed in the security model also refer to
information of the functional model. Indeed, security policies often depend on
dynamic properties based on the functional system state. For example, a bank
customer may transfer funds from his account, but if the amount is greater than
some limit the transfer must be approved by his account manager. Access control
decisions then depend on the satisfaction of authorization constraints in the
current system state. The dynamic evolution of the functional state impacts these
constraints and may lead to a security vulnerability if it opens an unexpected
access. In this paper we use validation tools (a prover and a model checker) in
order to search for malicious sequences of operations by analysing authorization
constraints.
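The bank-transfer example above can be made concrete with a toy authorization-constraint predicate. This is an illustrative sketch of our own: the field names and the limit are hypothetical and do not come from any model in this paper.

```python
# Toy sketch (ours) of an authorization constraint that depends on the
# functional state: a transfer above the account limit also needs the
# approval of the account manager. All names here are hypothetical.

def may_transfer(state, user, amount):
    """Access control decision evaluated in the current functional state."""
    if user != state["account_owner"]:
        return False                      # only the owner may transfer
    if amount <= state["limit"]:
        return True                       # small transfers always allowed
    return state["manager_approved"]      # large ones need manager approval

state = {"account_owner": "customer", "limit": 1000, "manager_approved": False}
```

Because the decision reads the functional state, any operation that updates `limit` or `manager_approved` can silently change who is granted access: this is exactly the kind of interaction between the two models that the following sections analyse.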
3 A Simple Example
In this section we use a running example, taken from [2], which deals with a
SecureUML model associated with a functional UML class diagram.
3.1 Functional Model
The functional UML class diagram (presented in figure 2) describes a meeting
scheduler dedicated to managing data about two entities: Persons and Meetings.
Fig. 2. Functional model of meeting scheduler system
A meeting has one and only one owner (association MeetingOwner), a list of
participants (association MeetingParticipants), a duration, and a starting date.
A person can be the owner of several meetings and may participate in several
meetings. Operations notify and cancel are user-defined; they allow, respectively,
sending messages to participants and deleting a meeting after notifying its
participants by e-mail. Constructors, setters and getters are implicitly defined
for both classes and both associations.
3.2 Access Control Rules
The access control model is given in figure 3. It features three different roles:
– SystemUser: defines persons who are registered on the system and thus
have permission UserMeetingPerm, which allows them to create and read
meetings. Deletion and modification of meetings (including operation cancel)
are granted to system users by means of permission OwnerMeetingPerm,
featuring an authorization constraint checking that the user who tries to run
these actions is the meeting owner.
– Supervisor: defines system users with more privileges, because they can run
actions notify and cancel on any meeting even if they are not its owners.
– SystemAdministrator: having full access to entity Person, an administrator
manages system users. Full access grants him the right to create a new person,
and to remove or modify an existing one. Furthermore, a system administrator
has only read access to meetings.
3.3 Validation
In [2], this example is validated based on a set of static queries that inspect
a given system state in order to extract useful information such as "which user
can perform an action on a concrete resource in a given state".
The authorization constraint associated with OwnerMeetingPerm requires
information from the functional model because it deals with the MeetingOwner
Fig. 3. Security model of meeting scheduler system
association. In the rest of this article, we consider three users, John, Alice and
Bob, whose role assignments are as defined in figure 4, and a given initial state
in which Alice is the owner of meeting m1 and Bob is a participant of m1. In such
a state, the above static query establishes that only Alice is allowed to modify
or delete m1, because she is the owner of m1.
Fig. 4. Users assignment
In [10, 9], a dynamic analysis approach based on the animation of a formal
specification showed that validation should not only be based on a given static
state, but should search for sequences of actions modifying this state and breaking
the authorization constraint. For example, starting from the above state, a static
query would only report that John, and also Bob, cannot modify m1 because
neither of them satisfies the authorization constraint. A dynamic analysis would
ask whether there exists a sequence of operations enabled by John, or Bob, that
allows them to modify m1. This paper contributes towards automatically finding
these malicious sequences. To perform these analyses, we applied the B4MSecure
tool to the UML and SecureUML diagrams and generated a B specification of
946 lines. This tool automatically generates a specification for all basic functional
operations, which is enriched manually with user-defined operations (i.e. cancel
and notify).
4 Proposed Approach
4.1 Trace Semantics for B Specifications
In order to find malicious behaviours of an operational secure IS model,
we rely on the set of finite observable traces of our B specifications. Indeed, B
specifications can be approached by means of a trace semantics composed of an
initialization substitution init, a set of operations O and a set of state variables
V. We write val for a possible state predicate allowed by the invariant and op
for an operation from O. A functional behaviour is an observable sequence Q

Q ≙ init ; op_1 ; op_2 ; … ; op_m

such that ∀i.(i ∈ 1..m ⇒ op_i ∈ O) and there exists a sequence S of state
predicates which does not violate invariant properties:

S ≙ val_0 ; val_1 ; … ; val_m

in which val_0 is an initial state, and op_i is enabled from state val_{i−1} and
state val_i is reached by op_i, starting from state val_{i−1}.
The security model filters functional behaviours by analysing access control
premises, which are triplets (u, R, c) where u is a user, R is a set of possible
roles assigned to u, and c is an authorization constraint. An observable secure
behaviour is a sequence Q where, for every step i, the premise (u_i, R_i, c_i) is
valid (expressed as (u_i, R_i, c_i) |= true). This means that the roles R_i activated
by user u_i grant him the right to run operation op_i, and if a constraint c_i exists,
it must be satisfied. The following premise sequence P must be valid for Q:

P ≙ (u_1, R_1, c_1) ; (u_2, R_2, c_2) ; … ; (u_m, R_m, c_m)
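The two validity conditions (each op_i enabled in val_{i−1}, each premise (u_i, R_i, c_i) valid in that state) can be phrased as a small executable check. The sketch below is our own illustrative model, not part of the B toolchain: operations are encoded as (guard, effect) pairs over dictionary states, and constraints as predicates over the state, the user and his roles.

```python
# Illustrative model (ours) of an observable secure behaviour:
# an operation is a (guard, effect) pair; a premise is (user, roles, constraint).

def is_secure_behaviour(init, ops, premises, invariant):
    """Check Q = init; op_1; ...; op_m against the trace semantics:
    each op_i is enabled in val_{i-1}, each val_i satisfies the invariant,
    and each premise (u_i, R_i, c_i) evaluates to true in val_{i-1}."""
    state = dict(init)                          # val_0
    if not invariant(state):
        return False
    for (guard, effect), (user, roles, constraint) in zip(ops, premises):
        if not guard(state):                    # op_i not enabled in val_{i-1}
            return False
        if not constraint(state, user, roles):  # (u_i, R_i, c_i) |= false
            return False
        state = effect(state)                   # reach val_i
        if not invariant(state):
            return False
    return True
```

A behaviour whose premise sequence fails at some step is a functional behaviour but not an observable secure one, which is exactly the filtering role the security model plays here.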
4.2 Tools to Exhibit Behaviours from B Specifications
Model checking and symbolic proof techniques are both of interest for exhibiting
a relevant behaviour from an operational B specification. Proof techniques deal
with infinite systems and can prove constraint satisfiability, or establish that
some operation can be enabled from an abstract state predicate. Model checking
is based on the exploration of finite systems, and can be used to find a sequence
of actions leading to a given state or property. In our approach, we combine both
techniques in order to overcome their shortcomings: the complexity of proofs for
the former, and state explosion for the latter. In this subsection, we illustrate
both tools.
Model checking and animation (the ProB tool). ProB [11] is an animator and
model checker for B specifications that explores the concrete state space of
the specification and generates accessibility graphs. Every predicate val_i
(where i ∈ {0, 1, …, m}) of sequence S is then a valuation of the variables of V.
For example, starting from an initial state val_0 where:
V = {person, meeting, meetingOwner, meetingParticipants} and:

val_0 ≙ person = ∅
  ∧ meeting = ∅
  ∧ meetingOwner = ∅
  ∧ meetingParticipant = ∅

and having O = {personNew, meetingNew, meetingAddParticipants, …}, the
scenario of table 1 is successfully animated using the ProB tool. Column "Reached
states" gives only the B variables modified since the previous step.
step | Sequence Q             | Reached states S                          | RBAC premises P
1    | personNew              | person={Alice}                            | John, SystemAdministrator, no constraint
2    | personNew              | person={Alice, Bob}                       | John, SystemAdministrator, no constraint
3    | meetingNew             | meeting={m1}, meetingOwner={(Alice, m1)}  | Alice, SystemUser, no constraint
4    | meetingAddParticipants | meetingParticipants={(m1, Bob)}           | Alice, SystemUser, constraint: Alice is the owner of m1

Table 1. Animation of a normal scenario with ProB
In step 1, the tool animates operation personNew, which modifies variable
person (initially empty); this action is performed by user John using role
SystemAdministrator, without requiring an authorization constraint. In step 4,
the tool adds participant Bob to meeting m1 by animating operation
meetingAddParticipants, after checking that the authorization constraint holds
for Alice using role SystemUser. Indeed, Alice is the owner of m1.
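This animation can be mimicked by a small hand-written state machine. The Python sketch below is our own simplification (ProB animates the B sources directly; operation signatures and the explicit caller argument are ours); preconditions and the OwnerMeetingPerm constraint are encoded as assertions.

```python
# Hand-written sketch (ours) of the meeting-scheduler animation of Table 1.
# ProB itself works on the B sources; this is only an executable reading.

def init_state():
    return {"person": set(), "meeting": set(),
            "meetingOwner": {}, "meetingParticipants": set()}

def person_new(s, p):
    assert p not in s["person"]                      # PRE of personNew
    s["person"].add(p)

def meeting_new(s, m, owner):
    assert m not in s["meeting"] and owner in s["person"]
    s["meeting"].add(m)
    s["meetingOwner"][m] = owner                     # one and only one owner

def meeting_add_participant(s, m, p, caller):
    # OwnerMeetingPerm authorization constraint: the caller owns m
    assert s["meetingOwner"].get(m) == caller
    s["meetingParticipants"].add((m, p))

s = init_state()
person_new(s, "Alice")                               # step 1, run by John
person_new(s, "Bob")                                 # step 2, run by John
meeting_new(s, "m1", "Alice")                        # step 3, run by Alice
meeting_add_participant(s, "m1", "Bob", "Alice")     # step 4, constraint holds
```

Replaying step 4 with a caller other than Alice trips the assertion, mirroring the premise (u, R, c) |= false case.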
Symbolic proof (the GeneSyst tool). ProB is useful to animate scenarios identified
during requirements analysis, or to exhaustively explore a finite subset of the
state space. As we are interested in finding malicious scenarios that exhibit a
potential internal attack, the ProB technique is useful only if it explores the
right subset of the state space in the right direction, which is not obvious for
infinite systems. Symbolic proof techniques, such as that of GeneSyst [4], are
more suitable because they produce symbolic transition systems that represent
a potentially infinite set of values. Such tools reason on the reachability
properties of a symbolic state F by some operation op from a symbolic state E.
In [4], three reachability properties are defined in terms of the following proof
obligations, where E and F are two disjoint state predicates:

(1) possibly reached: E ∧ Pre(op) ⇒ ¬[Action(op)]¬F
(2) not reachable: E ∧ Pre(op) ⇒ [Action(op)]¬F
(3) always reached: E ∧ Pre(op) ⇒ [Action(op)]F
In the generalized substitution theory, formula [S]R means that substitution
S always establishes predicate R, and ¬[S]¬R means that substitution S may
establish predicate R. Hence, proof (1) means that state F can be reached by the
actions of operation op when the operation precondition is true in state E. Proof
(2) means that F is never reached by the actions of op from state E. Finally,
proof (3) means that F is always reached by op from E. Let us consider, for
example, the functional operation meetingNew:
meetingNew(m, p) ≙
  PRE m ∉ meeting ∧ p ∈ person THEN
    meeting := meeting ∪ {m}
    || meetingOwner := meetingOwner ∪ {(m ↦ p)}
  END
This operation adds a new meeting m and links it to an owner p. If we define
states E and F such that:

E ≙ meetingOwner[{m1}] = ∅
F ≙ meetingOwner[{m1}] ≠ ∅

then the proof obligation produced by GeneSyst for property (1) is successfully
proved, showing that operation meetingNew, when enabled from a state where
m1 does not exist and there exists at least one person in the system, may lead
to a state where m1 is created and has an owner.
Our work focuses on proof (1), which states the reachability of a target
state from an initial one, as illustrated above, because it is sufficient to decide
whether an operation is potentially useful for a malicious behaviour. Proofs (2)
and (3) can be used if one wants to establish that a state can never be reached,
or is always reached, by an operation.
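Over a finite state space, the three obligations can be approximated by brute-force enumeration. The sketch below is our own stand-in, with Action(op) encoded as a successor function returning a set of post-states; GeneSyst establishes these properties by proof, not by enumeration.

```python
# Brute-force approximation (ours) of the three GeneSyst proof obligations
# over an explicit finite state space. `action` maps a state to the set of
# post-states of Action(op); `pre`, `E` and `F` are predicates on states.

def possibly_reached(states, pre, action, E, F):
    """(1) E ∧ Pre(op) ⇒ ¬[Action(op)]¬F: from every E-state enabling op,
    at least one successor satisfies F."""
    return all(any(F(t) for t in action(s))
               for s in states if E(s) and pre(s))

def not_reachable(states, pre, action, E, F):
    """(2) E ∧ Pre(op) ⇒ [Action(op)]¬F: no successor satisfies F."""
    return all(not F(t)
               for s in states if E(s) and pre(s)
               for t in action(s))

def always_reached(states, pre, action, E, F):
    """(3) E ∧ Pre(op) ⇒ [Action(op)]F: every successor satisfies F."""
    return all(F(t)
               for s in states if E(s) and pre(s)
               for t in action(s))
```

For instance, abstracting the state of m1 to its owner set, meetingNew(m1, Alice) possibly (and here, being deterministic, always) reaches F ≙ meetingOwner[{m1}] ≠ ∅ from E ≙ meetingOwner[{m1}] = ∅.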
4.3 Malicious Behaviour
Based on the security requirements, several operations are identified as critical.
In our example, the integrity of meeting information is a security requirement;
therefore, operations which could perform unauthorized modifications of meetings
are critical.
A malicious behaviour executed by a user u, regarding authorization constraints,
is an observable secure behaviour Q with m steps such that:
– op_m is a critical operation with which an authorization constraint c_m is
associated.
– user u is malicious and would like to run op_m by misusing his roles R_u.
– val_0 is an initial state where (u, R_u, c_m) |= false
– for every step i (i ∈ 1..m), premise (u, R_u, c_i) |= true
In other words, the malicious user u is not initially allowed to execute a critical
operation, but he is able to run a sequence of operations leading to a state from
which he can execute it. In our investigation we suppose that user u executes
this malicious sequence without collusion; this problem will be tackled in future
work.
Section 3.3 gave an example where neither Bob nor John is allowed to run a
modification operation, such as meetingSetStart, which modifies attributes of
class Meeting, from the initial state, due to the authorization constraint. This
initial state is:

val_0 ≙ person = {Alice, Bob}
  ∧ meeting = {m1}
  ∧ meetingOwner = {(Alice ↦ m1)}
  ∧ meetingParticipant = {(m1 ↦ Bob)}
In the following, we denote by init_0 the sequence of operations leading to
val_0, such as that presented in table 1. We used the model-checking facility of
ProB in order to exhaustively explore the state space and automatically find a
path starting from val_0 and leading to a state where operation meetingSetStart
becomes permitted to John. We asked ProB to find a sequence after which John
becomes the owner of m1:

meetingOwner(m1) = John
After exploring more than 1000 states, ProB found a scenario in which John
sequentially executes operations personNew, personAddMeetingOwner and
meetingSetStart. Indeed, this dynamic analysis showed that John, as a system
administrator, has full access to entity Person. This permission allows him to
create, modify, read and delete any instance of class Person. First, he creates
an instance John of class Person that corresponds to himself by running operation
personNew(John). Then he adds meeting m1 to the set of meetings owned by
John, by running operation personAddMeetingOwner(John, m1), which is a
basic modification operation of class Person. These two actions allow him to
become the owner of m1, after which he is able to modify Alice's meeting.
Like all model-checking techniques, when ProB exhaustively explores the state
space it faces the combinatorial explosion problem, which depends on the number
of operations provided to the tool and on the state space size. In order to address
this problem, our approach proposes a symbolic search which finds a sequence
of potentially useful operations on which the model checker should be focused.
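Restricting the exploration to a few suspicious operations can be sketched as a breadth-first search. The fragment below is our own Python stand-in for ProB's exploration, reduced to John's two preparatory operations; the one-owner multiplicity of MeetingOwner is our modelling choice, handled by replacing the owner pair.

```python
from collections import deque

# Our own stand-in for ProB's exploration, restricted to the operations of
# the symbolic scenario, all invoked by John. A state is a pair of
# frozensets: (persons, ownership pairs (meeting, owner)).

def successors(state):
    person, owner = state
    if "John" not in person:                            # personNew(John)
        yield "personNew", (person | {"John"}, owner)
    if "John" in person and ("m1", "John") not in owner:
        # personAddMeetingOwner(John, m1): modify the association end;
        # one-owner multiplicity is kept by replacing the old pair
        new_owner = frozenset(p for p in owner if p[0] != "m1") | {("m1", "John")}
        yield "personAddMeetingOwner", (person, new_owner)

def find_attack(init, goal):
    queue, seen = deque([(init, [])]), {init}
    while queue:
        state, path = queue.popleft()
        if goal(state):
            return path + ["meetingSetStart"]           # critical op now permitted
        for op, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [op]))
    return None

val0 = (frozenset({"Alice", "Bob"}), frozenset({("m1", "Alice")}))
attack = find_attack(val0, lambda s: ("m1", "John") in s[1])
```

With only these operations in scope, the search visits a handful of states instead of the more than 1000 explored by the unrestricted run.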
5 Symbolic Search
The proposed symbolic search is performed by an algorithm that looks for an
observable sequence Q ≙ init_0 ; op_1 ; … ; op_m executed by a user u, where
(u, R_u, c_m) is not valid for a critical operation op_m in the initial state val_0
but becomes valid in state val_{m−1}, from which op_m can be enabled. It is a
backward search algorithm, starting from the goal state val_{m−1} from which the
critical operation op_m can be enabled, i.e. val_{m−1} ≙ c_m ∧ Pre(op_m), and
working backwards until the initial state val_0 is encountered. The algorithm
ends when sequence Q is found or when all operations have been examined without
encountering the initial state. We consider that val_0 is a completely valuated
state, such as the one where Alice is the owner of m1 and Bob is a participant of
m1. This prevents the initial state from being included in both states val_{m−1}
and val_{m−2}, which would never satisfy the condition of the while loop. Note
that each operation occurs at most once in a computed sequence, which ensures
the termination of our algorithm.
1. Q ≙ op_m ;
2. val_{m−1} ≙ c_m ∧ Pre(op_m);
3. val_{m−2} ≙ ¬val_{m−1};
4. while val_0 ⇏ val_{m−1} do
5.   choose any o_i ∈ O where
6.     (u, R_u, c_i) |= true ∧
7.     val_{m−2} ∧ Pre(o_i) ⇒ ¬[Action(o_i)]¬val_{m−1}
8.   do
9.     Q ≙ o_i ; Q ;
10.    val_{m−1} ≙ val_{m−2} ∧ Pre(o_i);
11.    val_{m−2} ≙ val_{m−2} ∧ ¬Pre(o_i);
12.  else
13.    raise exception: no sequence found
14.  enddo
15. endwhile
16. Q ≙ init ; Q ;
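With predicates represented extensionally as sets of states from a finite universe, the backward loop can be sketched as follows. This is an executable reading of our own: the paper discharges the condition at line 7 with the AtelierB prover rather than by enumeration, and the effects here are deterministic, for which obligations (1) and (3) coincide.

```python
def backward_search(universe, ops, usable, goal, val0):
    """Backward symbolic search over a finite universe of states (ours).
    ops: {name: (pre, step)} with deterministic effects; goal encodes
    c_m ∧ Pre(op_m); usable(name) stands for (u, R_u, c_i) |= true.
    Returns the operation names op_n ... op_{m-1} leading into goal, or None."""
    seq = []
    val_target = {s for s in universe if goal(s)}      # val_{m-1}
    frontier = set(universe) - val_target              # val_{m-2}
    while val0 not in val_target:
        for name, (pre, step) in ops.items():
            if name in seq or not usable(name):
                continue                               # each op used at most once
            src = {s for s in frontier if pre(s)}
            # obligation (1): Action(o_i) reaches val_{m-1} from src states
            if src and all(step(s) in val_target for s in src):
                seq.insert(0, name)                    # Q := o_i ; Q
                val_target = src                       # val_{m-2} ∧ Pre(o_i)
                frontier -= src                        # val_{m-2} ∧ ¬Pre(o_i)
                break
        else:
            return None                                # no sequence found
    return seq

# Tiny abstraction of the running example:
# state = (John ∈ person, John owns m1); goal encodes c_m ∧ Pre(meetingSetStart)
universe = [(p, o) for p in (False, True) for o in (False, True)]
ops = {
    "personNew": (lambda s: not s[0], lambda s: (True, s[1])),
    "personAddMeetingOwner": (lambda s: s[0] and not s[1],
                              lambda s: (s[0], True)),
}
```

On this abstraction the search returns personNew followed by personAddMeetingOwner, matching the partition refinement traced in the next subsection.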
5.1 Step by Step Illustration
We take advantage of abstraction and refine the symbolic state val_{m−2} step
by step:

1. At the first step of the algorithm, the state space is represented by two
symbolic states: the first one, val_{m−1}, includes all states where the
authorization constraint c_m is true and op_m is enabled; the second one,
val_{m−2}, is the negation of val_{m−1}, i.e. ¬c_m ∨ ¬Pre(op_m). As these are
two disjoint state predicates, we conduct proof (1) in order to find an operation
o_i that belongs to O, possibly reaches the first state val_{m−1} from the
second one val_{m−2}, and whose premise (u, R_u, c_i) is valid. If no such o_i
exists, then no sequence can be found for the expected attack, and we
can try proof (2) for each operation, attesting that no operation ever reaches
val_{m−1} from val_{m−2}.
2. At the second step of the algorithm, if proof (1) succeeds for some operation
op_{m−1}, then there may exist an observable sequence leading to the critical
operation where the access control premise (u, R_u, c_m) is valid, and hence a
potential symbolic attack scenario can be found. The algorithm looks inside
state val_{m−2} in order to find the previous operations that can be invoked
in the attack scenario. State val_{m−2} is partitioned into two sub-states:

val_{m−2} ∧ Pre(op_{m−1}) ≡ ¬(c_m ∧ Pre(op_m)) ∧ Pre(op_{m−1})
val_{m−2} ∧ ¬Pre(op_{m−1}) ≡ ¬(c_m ∧ Pre(op_m)) ∧ ¬Pre(op_{m−1})

Then, we look for operations that reach the first sub-state from the second
one.
3. The algorithm proceeds iteratively by partitioning the second state into two
sub-states until it finds a state that includes the initial state. In the best
case, our algorithm yields a symbolic attack scenario, which consists of the
sequence (init_0 ; op_n ; op_{n+1} ; … ; op_m) invoked by the same user u,
where:

val_{n−1} ≙ ¬(c_m ∧ Pre(op_m)) ∧ ¬Pre(op_{m−1}) ∧ ¬Pre(op_{m−2}) ∧ … ∧ Pre(op_n)

and such that val_0 ⇒ val_{n−1} ∧ ∀i.(i ∈ (n..m) ⇒ (u, R_u, c_i) |= true)
5.2 Application
We apply our algorithm to the meeting scheduler example, starting from the
following initial state val_0:

val_0 ≙ person = {Alice, Bob}
  ∧ meeting = {m1}
  ∧ meetingOwner = {(Alice ↦ m1)}
  ∧ meetingParticipant = {(m1 ↦ Bob)}

In this state, user John is not allowed to modify meeting m1 because the
authorization constraint allows modification only by the owner of m1. A malicious
scenario would lead to a state where John becomes able to execute a modification
operation, such as meetingSetStart, on meeting m1. In this state we have to
verify:

Pre(meetingSetStart(m1, start)) ≙ m1 ∈ meeting ∧ start ∈ NAT
and (John, SystemUser, MeetingOwner(m1) = John) |= true
1. First iteration: considering the following symbolic states

val_{m−1} = (MeetingOwner(m1) = John) ∧ m1 ∈ meeting ∧ start ∈ NAT
val_{m−2} = ¬val_{m−1}

we have:
– val_0 ⇏ val_{m−1} because, in state val_0, MeetingOwner(m1) = Alice, and
– proof (1) succeeds for operations meetingNew and personAddMeetingOwner.
We may then proceed to the second iteration of the algorithm for each of these
operations.
2. Second iteration: we partition state val_{m−2} into two sub-states:

val_{m−2} = ¬val_{m−1} ∧ Pre(op_{m−1})
val_{m−3} = ¬val_{m−1} ∧ ¬Pre(op_{m−1})

– case 1: we choose op_{m−1} = meetingNew, and then we have:
  – Pre(meetingNew) ≙ m1 ∉ meeting ∧ John ∈ person, and
  – val_0 ⇏ val_{m−2} because John ∉ person
In this case, the algorithm does not find an operation leading to a state
where operation meetingNew becomes enabled: no operation satisfies proof
obligation (1). Our algorithm concludes that no attack scenario invoking
meetingNew exists at this step.

Fig. 5. No state enabling operation meetingNew is found (attempted sequence:
init_0 ; meetingNew ; meetingSetStart)

– case 2: we choose op_{m−1} = personAddMeetingOwner, and then we have:
  – Pre(personAddMeetingOwner) =
    m1 ∈ meeting ∧ John ∈ person ∧ (John, m1) ∉ MeetingOwner
  – val_0 ⇏ val_{m−2} because John ∉ person
In this case, proof (1) succeeds for operation personNew, which means that
if personNew is executed, it may lead to a state where personAddMeetingOwner
can be enabled.
3. Third iteration: we partition state val_{m−3} into two sub-states:

val_{m−3} = ¬val_{m−1} ∧ ¬Pre(personAddMeetingOwner) ∧ Pre(personNew)
val_{m−4} = ¬val_{m−1} ∧ ¬Pre(personAddMeetingOwner) ∧ ¬Pre(personNew)

The algorithm then terminates normally because, in this case, val_0 ⇒ val_{m−3}.
Indeed, Pre(personNew) ≙ John ∉ person, and in the initial state John does not
belong to set person. Figure 6 presents the full symbolic scenario that allows
John to modify Alice's meeting.
Fig. 6. Symbolic malicious behaviour for user John: init_0 ; personNew ;
personAddMeetingOwner ; meetingSetStart
5.3 Discussion
Technically, our approach applies the GeneSyst tool in order to produce proof
obligations and then asks the AtelierB prover to discharge them automatically.
As the resulting scenarios are symbolic and based on "possibly reached proofs",
the analyst can conclude that attacks may exist but cannot attest their feasibility
for the concrete system. An interesting contribution of our proof-based symbolic
sequences, besides drawing the analyst's attention to potential flaws, is that
they give useful inputs to the model-checker. Indeed, a model-checking tool can
be used to exhibit, from a symbolic behaviour, an observable concrete sequence
of operations that can be followed by an attacker. To reduce the state space
significantly, we can ask ProB to explore only operations found in the symbolic
malicious scenarios. For our example, when trying only operations personNew,
personAddMeetingOwner and meetingSetStart, ProB exhibits a concrete attack
scenario after visiting a dozen states, a significant speed-up with respect to our
initial ProB attempts (which involved more than 1000 states).
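This guided exploration can be sketched as a bounded breadth-first search restricted to the operations of the symbolic scenario. This is a minimal sketch, not the actual ProB interface: the state representation, the step function and the attack predicate are hypothetical placeholders.

```python
from collections import deque

def guided_search(init_state, step, allowed_ops, is_attack, max_depth=10):
    """Breadth-first exploration restricted to a whitelist of operations.

    step(state, op) returns the successor state, or None when op is not
    enabled in state.  Trying only the operations named by the symbolic
    scenario prunes the concrete state space dramatically."""
    frontier = deque([(init_state, [])])
    seen = {init_state}
    while frontier:
        state, trace = frontier.popleft()
        if is_attack(state):
            return trace          # concrete operation sequence found
        if len(trace) >= max_depth:
            continue
        for op in allowed_ops:
            nxt = step(state, op)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, trace + [op]))
    return None                   # no concrete attack within the bound
```

Restricting `allowed_ops` to the three operations of the symbolic scenario mirrors how we constrain ProB's exploration.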
Our technique was able to extract another scenario (Figure 7) which can be
executed by user Bob from the same initial state, in order to steal the ownership
of m1. In this scenario, Bob first cancels the meeting and then recreates it
before applying the critical operation.
[Figure omitted: symbolic sequence init_0 → meetingCancel → meetingNew → meetingSetStart]
Fig. 7. Symbolic malicious behaviour for user Bob.
The first scenario, carried out by user John, is made possible by the full access
permission on class Person, associated with role SystemAdministrator, which
includes the right to modify association ends. This attack affects meeting
integrity. One solution can be to add an SSD constraint between roles
SystemAdministrator and SystemUser. John will then still be able to become
owner of the meeting, but will not be able to log in as SystemUser in order to
modify it.
The second scenario, carried out by Bob, was possible due to role Supervisor,
which gives him the right to cancel a meeting; then, as a SystemUser, he can
recreate it in order to become its owner. This scenario does not point out a flaw,
since whenever a meeting is cancelled it is legitimate that a user can start a new
meeting with the same identifier as the cancelled one.
6 Conclusion
We described in this paper a symbolic search approach that can extract insider
malicious behaviours from a formal Information System model. The meeting
scheduler example was discussed in several articles [3, 2]. However, they do not
report the attack scenarios presented in this paper, because the dynamic
evolution of the functional state is not taken into account. The contributions of
this paper showed how dynamic analysis, assisted by proofs and model-checking,
is useful to find potential threats. In addition, thanks to our algorithm, proof
and model-checking tools, our method can be fully automated in order to extract
attack scenarios breaking authorization constraints. In [10], a dynamic analysis
is done interactively with the help of a Z animator, but it is tedious and may
miss many possible flaws. We also applied our approach to the case study treated
in [8] and we were able to find the discussed threat automatically. Currently we
are looking at applying it to a real case study, issued from the ANR-Selkis
project^7, which deals with a medical information system involving various
authorization constraints.
Our approach is automated by exploiting the tools B4MSecure^8, GeneSyst^9,
AtelierB^10 and ProB^11. B4MSecure translates functional and security graphical
models into a B specification, from which we automatically produce proof
obligations on reachability properties by taking advantage of the GeneSyst tool.
Then, these proof obligations are discharged automatically using the AtelierB
prover. When a symbolic scenario is found, ProB is used to explore the concrete
state space, focusing on the operations issued from the symbolic scenario. The
main limitation of our work is that sometimes, when proof obligations are
complex, AtelierB fails to prove them automatically. Interactive proofs are then
required, but they may be quite difficult for the analyst. One naive solution is to
keep the operations whose proofs do not succeed automatically, to be exploited
further using the model-checker. A more interesting solution is to focus on other
kinds of proof obligations. For example, one can try to prove that an operation
o is never enabled from a state E and/or that o never reaches a state F. Applying
7 http://lacl.univ-paris12.fr/selkis
8 http://b4msecure.forge.imag.fr
9 http://perso.citi.insa-lyon.fr/nstouls/?ZoomSur=GeneSyst
10 http://www.atelierb.eu/
11 http://www.stups.uni-duesseldorf.de/ProB
these proofs to the meeting scheduler example, we were able to eliminate half of
the operations after proving automatically that they cannot be involved in the
attack scenario.
We believe that reachability properties can be expressed by means of LTL
formulas. We have started exploring this direction based on the LTL formula
checking facilities of ProB, and the first results are promising.
References
1. K. Anastasakis, B. Bordbar, G. Georg, and I. Ray. Uml2alloy: A challenging model
transformation. Model Driven Engineering Languages and Systems, 2007.
2. D. Basin, M. Clavel, J. Doser, and M. Egea. Automated analysis of security-design
models. Information Software Technology, 51, 2009.
3. D. Basin, J. Doser, and T. Lodderstedt. Model driven security: From uml models
to access control infrastructures. ACM Trans. Softw. Eng. Methodol., 15(1), 2006.
4. D. Bert, M-L. Potet, and N. Stouls. GeneSyst: a Tool to Reason about Behavioral
Aspects of B Event Specifications. Application to Security Properties. In H. Tre-
harne, S. King, M. C. Henson, and S. A. Schneider, editors, ZB 2005: Formal
Specification and Development in Z and B, 4th International Conference of B and
Z Users, volume 3455 of LNCS. Springer-Verlag, 2005.
5. A. Idani, J. Boulanger, and L. Philippe. Linking paradigms in safety critical sys-
tems. International Journal of Computers and their Applications (IJCA), Special
Issue on the Application of Computer Technology to Public Safety and Law En-
forcement, 16(2), 2009.
6. M. Kuhlmann, K. Sohr, and M. Gogolla. Employing uml and ocl for designing and
analysing role-based access control. Mathematical Structures in Computer Science,
23(4), 2013.
7. K. Lano, D. Clark, and K. Androutsopoulos. UML to B: Formal Verification of
Object-Oriented Models. In Integrated Formal Methods, volume 2999 of LNCS.
Springer, 2004.
8. Y. Ledru, A. Idani, J. Milhau, N. Qamar, R. Laleau, J-L. Richier, and M-A. Labi-
adh. Taking into account functional models in the validation of is security policies.
In Camille Salinesi and Oscar Pastor, editors, CAiSE Workshops, volume 83 of
Lecture Notes in Business Information Processing. Springer, 2011.
9. Y. Ledru, A. Idani, J. Milhau, N. Qamar, Régine Laleau, J-L. Richier, and M-A.
Labiadh. Validation of IS security policies featuring authorisation constraints. In-
ternational Journal of Information System Modeling and Design (IJISMD), 2014.
10. Y. Ledru, N. Qamar, A. Idani, J-L. Richier, and M-A. Labiadh. Validation of
security policies by the animation of Z specifications. In Proceedings of the 16th
ACM Symposium on Access Control Models and Technologies, SACMAT ’11, New
York, NY, USA, 2011. ACM.
11. M. Leuschel and M. Butler. ProB: A Model Checker for B. In FME 2003: Formal
Methods Europe, volume 2805 of LNCS. Springer-Verlag, 2003.
12. N. Qamar, Y. Ledru, and A. Idani. Evaluating RBAC Supported Techniques and
their Validation and Verification. In 6th International Conference on Availabil-
ity, Reliability and Security (ARES 2011), Vienna, Austria, August 2011. IEEE
Computer Society.
13. C. Snook and M. Butler. UML-B: Formal modeling and design aided by UML.
ACM Transactions on Software Engineering and Methodology, 15(1), 2006.
Traffic Analysis of Web Browsers
Sami Zhioua∗ and Mahjoub Langar∗∗
∗ Information and Computer Sciences Department, KFUPM
P.O. Box 958, Dhahran 31261, KSA
zhioua@kfupm.edu.sa
∗∗ Ecole Nationale des Ingenieurs de Tunis
B.P. 37, Belveder, Tunis
mahjoub.langar@gmail.com
Abstract. The Tor network is currently the most commonly used anonymity
system, with more than 300,000 users and almost 3000 relays. Attacks
against Tor are typically confirmation attacks, where the adversary injects
easily discernible traffic patterns and observes which clients and/or relays
exhibit such patterns. The main limitation of these attacks is that they
require a "powerful" adversary. Website fingerprinting is a new breed of
attack that identifies which websites are visited by a Tor client by learning
the traffic pattern of each suspected website. Recent works showed that
some classifiers can successfully identify 80% of visited websites. In this
paper we use a classic classifier, namely decision trees (the C4.5 algorithm),
and we study to what extent popular web browsers can resist website
fingerprinting attacks. Among the four studied web browsers, Google
Chrome offers the best resistance to website fingerprinting (5 times better
than the other web browsers). Since most existing fingerprinting techniques
have been evaluated using the Firefox web browser, we expect the accuracy
results of existing works to be reduced when the Chrome browser is used.
1 Introduction
Anonymity systems, such as Tor [1] and Jap [2] are designed primarily to pro-
vide privacy and anonymity to Internet users living in oppressive regimes giving
them the opportunity to evade censorship. These systems achieve anonymity by
embedding user data inside several layers of encryption and by forwarding the
traffic through a set of relay nodes/proxies. This makes the job of an eavesdrop-
ping adversary much more challenging since by just observing the traffic she
cannot deduce who is communicating with whom and what is the type of traffic
exchanged.
Tor [1] represents the current state of the art in low-latency anonymity
systems. The Tor network is currently the largest anonymity network ever
deployed, consisting of almost 3000 relays and an estimated 300,000 users.
Most attacks against the Tor anonymity system have been traffic confirmation
attacks, where the idea is to inject an easily discernible traffic pattern and
observe which potential clients exhibit such patterns [3–7]. Most of these attacks
require a "powerful" adversary, who is assumed to observe the traffic of a
significant number of Tor relays and, in some attacks, to inject malicious
relays into the Tor network. These assumptions are relatively strong and
beyond the capabilities of most attackers, including totalitarian regimes. A
more practical attack on Tor, which does not require strong assumptions, is
passive traffic analysis. Traffic analysis consists in intercepting and analyzing
the traffic messages (usually encrypted) in order to reveal information about
the communication (e.g. the identities of the communicating entities, the type
of data exchanged, etc.). To carry out the attack, the adversary is assumed to
observe the traffic of only one side of the communication (usually the Tor
client). This threat model is very common and holds particularly in the
presence of censorship.
Website fingerprinting [8] is a variant of passive traffic analysis that can be
carried out by a local eavesdropper or by any entity observing Tor client traffic. In
this attack the adversary analyzes the traffic to extract patterns that can reveal
the identity of the website accessed by the client. Patterns are constructed from
certain features in the traffic such as the size of transferred data, the timing, the
order of packets, etc.
Website fingerprinting was first used to analyze encrypted HTTP traffic [8–
11]. Most of these attacks were based on tracking the sizes of the objects
fetched by the main web page. With the migration to HTTP/1.1, which makes
use of persistent connections and pipelining, it is no longer possible to easily
distinguish individual object fetches. Only a few works focused on
implementing website fingerprinting on anonymity systems [12–15]. It turned
out that website fingerprinting is much more challenging when applied to
anonymity systems, in particular Tor. The reason is that the Tor protocol
performs some structural modifications on the traffic: restructuring the traffic
into fixed-size cells, merging small packets together, multiplexing TCP
streams, etc. However, despite these challenges, recent works showed that the
precision of website fingerprinting can be as high as 80% when applied to
Tor [15].
The Tor protocol can be used with different web browsers^1. Since web browsers
use different user agents and process data packets differently, the choice of web
browser should have an impact on the efficiency of website fingerprinting. In this
paper we study the impact of the choice of web browser on the anonymity of Tor
clients with respect to website fingerprinting attacks. We consider a representative
set of four popular web browsers, namely Firefox, Chrome, Konqueror, and
Internet Explorer, and we empirically analyze to what extent they resist website
fingerprinting. This is the first work in the literature that studies the efficiency
of website fingerprinting across different web browsers. All existing works focused
on a single web browser, mainly Firefox.
The contributions of this paper are two-fold:
1. A detailed and complete survey of existing website fingerprinting approaches,
in particular those targeting anonymity systems.
2. A comparative analysis of the most popular web browsers according to their
resistance to website fingerprinting.
1 Provided that the web browser allows configuring the SOCKS proxy.
2 Related Work
Early website fingerprinting techniques focused on analyzing simple
encrypted HTTP traffic. Hintz [8], who was the first to use the term
"fingerprinting" for this type of attack, implemented a simple website
fingerprinting attack targeting the SafeWeb encrypting web proxy [16]. The
attack was based on tracking the size of objects fetched by a visited website.
This was possible because the author made the strong assumption that every
web object (image, ads, etc.) is fetched through a separate TCP connection
using a different port. His experiment was a simple proof of concept
distinguishing only 5 websites. He achieved a detection rate between 45% and
75%.
Similarly, Sun et al. [9] based their approach on the sizes of fetched objects.
Objects are isolated in the encrypted traffic by counting the number of packets
between blocks of requests. The fingerprint is expressed as a multiset of object
lengths. An unknown traffic sequence is then evaluated against website
fingerprints using a measure of similarity (Jaccard's Similarity [17]). A similarity
value greater than a threshold c indicates a match. In their empirical analysis,
Sun et al. constructed a database of 2000 website fingerprints and then tried to
distinguish these same 2000 websites out of a set of 100,000 websites. The optimal
accuracy was obtained with the threshold c set to 0.7, where 75% of the 2000
websites were correctly identified with a false positive rate of 1.5%.
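This multiset-based matching can be sketched as follows (a minimal sketch; the function names and the fingerprint dictionary layout are illustrative, not from [9]):

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity of two multisets of object lengths."""
    ca, cb = Counter(a), Counter(b)
    inter = sum((ca & cb).values())   # multiset intersection size
    union = sum((ca | cb).values())   # multiset union size
    return inter / union if union else 0.0

def match(trace_lengths, fingerprints, c=0.7):
    """Return every fingerprinted website whose similarity exceeds c."""
    return [name for name, fp in fingerprints.items()
            if jaccard(trace_lengths, fp) > c]
```

With `c = 0.7`, a trace matches a fingerprint only when their object-length multisets overlap substantially, which is the threshold Sun et al. found optimal.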
The strong assumption that web objects can be distinguished by observing the
different TCP connections no longer holds: with the migration to HTTP/1.1, a
separate TCP connection is no longer opened for each object, as was the case in
HTTP/1.0.
Bissias et al. [10] were the first to use IP packet sizes and Inter-Packet Times
(IPT) instead of the sizes of fetched objects to fingerprint websites. From every
website visit they extract two traces: one size trace and one time trace. The
size trace is the sequence of packet sizes while the time trace is the sequence
of IPT values. Then all traces corresponding to a given website are merged into
a website profile^2 by computing the arithmetic mean at every time step. Once
a bank of profiles is constructed, an unknown network traffic is matched with
each one of the constructed profiles using Cross Correlation [18]. The empirical
analysis was based on the 100 websites most visited in the authors' department
(University of Massachusetts) and showed that size profiles are much more
efficient in identifying visited websites than time profiles. With size profiles,
20% of the analyzed traces are correctly identified after one guess and 65% after
10 guesses, while with time profiles 8% of websites were correctly identified after
one guess and 27% after 10 guesses. The analysis also showed that the time gap
between the training phase (constructing the profiles) and the testing phase has
only a small impact on the accuracy: a one-hour gap is only 5% better than a
168-hour gap.
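The profile construction and matching can be sketched as follows. This is a minimal sketch assuming traces are equal-length lists of packet sizes; only the lag-zero normalised correlation (Pearson r) is shown, whereas cross correlation in [18] is defined over all lags.

```python
def mean_profile(traces):
    """Average several equal-length size traces into one website profile."""
    n = len(traces)
    return [sum(t[i] for t in traces) / n for i in range(len(traces[0]))]

def correlation(x, y):
    """Normalised correlation at lag zero (Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def best_guess(trace, profiles):
    """Rank candidate websites by correlation with the observed trace."""
    return max(profiles, key=lambda name: correlation(trace, profiles[name]))
```

The "10 guesses" figure in the text corresponds to taking the 10 highest-ranked profiles instead of only the maximum.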
2 Actually two profiles: a size profile and a time profile.
Liberatore et al. [11] obtained much better results by focusing only on packet
sizes. In their work, they represented a traffic trace as a vector of packet size
frequencies: each visit results in a histogram of packet size frequencies.
They tried two classification techniques: Jaccard's Similarity [17]^3 and Naive
Bayes [19]. The empirical analysis was also performed on the University of
Massachusetts's typical traffic, filtering the top 2000 visited websites. Jaccard-
based classification was slightly better than the Naive Bayes based one, with
73% of website visits correctly identified. Experiments also showed that the
training set need not be very large, since a training set of size 4 resulted in
almost the same accuracy as a training set of size 12.
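The histogram representation, and the trick of normalising Jaccard similarities over the training set so they can be read as class-membership probabilities, can be sketched as follows (function names are illustrative):

```python
from collections import Counter

def size_histogram(trace):
    """One visit becomes a histogram of packet-size frequencies."""
    return Counter(trace)

def class_probabilities(hist, fingerprints):
    """Turn Jaccard similarities into class-membership probabilities by
    dividing each similarity by the sum over the whole training set."""
    def sim(h1, h2):
        inter = sum((h1 & h2).values())   # multiset intersection
        union = sum((h1 | h2).values())   # multiset union
        return inter / union if union else 0.0
    sims = {name: sim(hist, fp) for name, fp in fingerprints.items()}
    total = sum(sims.values())
    return {name: (s / total if total else 0.0) for name, s in sims.items()}
```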
All the above works focused on fingerprinting typical encrypted HTTP traffic.
As anonymity systems became popular, recent website fingerprinting
contributions focused on attacking those systems, in particular Tor [12–15].
Shi et al. [13] detailed a website fingerprinting attack on Tor. They adapted
the techniques of Hintz [8] and Sun et al. [9] to Tor: instead of tracking the
size of fetched objects (which is not possible in Tor), they tracked the number
of packets sent or received in every interval^4. A traffic trace is then
represented by a vector specifying the number of intervals with 2 packets, the
number of intervals with 3 packets, etc. Once a profile is built for a website
(after several visits), the similarity between the profile and a traffic trace is
computed using cosine similarity. The technique was evaluated using the traffic
of the top 20 websites in Japan. They could successfully identify 50% of the
visited websites.
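The interval-based representation can be sketched as follows. This assumes a trace is a list of signed values whose sign encodes direction; the cap on the interval size is an assumption made here to keep the vectors fixed-length.

```python
def interval_vector(trace, max_burst=50):
    """Histogram of interval sizes: vec[k] is the number of
    direction-constant intervals containing k packets."""
    vec = [0] * (max_burst + 1)
    run = 1
    for prev, cur in zip(trace, trace[1:]):
        if (prev > 0) == (cur > 0):
            run += 1                       # same direction: interval grows
        else:
            vec[min(run, max_burst)] += 1  # direction change: close interval
            run = 1
    vec[min(run, max_burst)] += 1          # close the final interval
    return vec

def cosine(u, v):
    """Cosine similarity between two interval vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return num / den if den else 0.0
```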
Federrath et al. [12] used a Multinomial Naive Bayes (MNB) classifier to
fingerprint 6 anonymity systems: 4 single-hop proxy VPNs and 2 multi-hop
systems, Tor and JonDonym [2]. As in Liberatore et al. [11], a traffic trace is
represented as a histogram of the packet size frequency distribution, without
taking into consideration packet ordering or packet timing. They improved the
efficiency of the MNB classifier by using text mining optimizations [19] such as
Term Frequency Transformation, Inverse Document Frequency, etc. The
evaluation was based on the top 2000 websites extracted from the log files of a
medium-range proxy server used by 50 schools. These 2000 websites were
filtered down to 775 websites. The accuracy of their technique was very good
for single-hop anonymity systems, where 94% of website visits were correctly
identified, while it was relatively poor for multi-hop anonymity systems: 20%
for JonDonym and only 3% for Tor. This shows once again that website
fingerprinting is much more challenging with anonymity systems than with
typical encrypted HTTP traffic.
Panchenko et al. [14] focused only on Tor and JonDonym and used SVMs
(Support Vector Machines) for classification. They represented a traffic trace as
a sequence of packet lengths where input and output packets are distinguished
by negative and positive values. In addition, they injected some features into
these sequences to help the classification, such as size markers (whenever the
flow direction changes, insert the size of the packets in the interval), number markers
3 To use Jaccard's Similarity as a classifier, they turned the metric value into a class
membership probability by dividing it by the sum of all metric values in the training set.
4 An interval refers to a time period without packet flow change. Moving from one
interval to the next happens when the direction of the flow changes.
(the number of packets in every interval), total transmitted bytes, etc. They used
the Weka tool [20] to fine-tune the SVM parameters. The proposed technique was
evaluated using two experiments: closed-world and open-world. In the closed-world
experiment, the same set of 775 websites as Federrath et al. [12] as well as
ten-fold cross validation were used to estimate the accuracy. As for the open-world
experiment, 5000 websites were randomly chosen among the top one million
websites according to Alexa [21], in addition to 5 censored websites. The
closed-world experiment showed that the SVM technique achieved an accuracy of
30% for the basic variant and 54% when all features are used^5. The open-world
experiment showed that censored websites were successfully identified with a true
positive rate between 56% and 73%, while the false positive rate was less than 1%.
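The marker injection can be sketched like this. It is a simplified sketch: markers are encoded here as tuples, and only the size and total-byte markers are shown, whereas Panchenko et al. use several more feature types.

```python
def with_markers(trace):
    """Insert a size marker (summed bytes of the closing interval) at every
    flow-direction change, plus a total-bytes marker at the end."""
    out, acc, direction = [], 0, None
    for p in trace:
        d = p > 0
        if direction is not None and d != direction:
            out.append(('SIZE', acc))   # interval closed by direction change
            acc = 0
        out.append(p)
        acc += abs(p)
        direction = d
    out.append(('SIZE', acc))           # close the final interval
    out.append(('TOTAL', sum(abs(p) for p in trace)))
    return out
```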
The most recent contribution is by Cai et al. [15]. As in Panchenko et al. [14],
they represented a traffic trace as a sequence of (negative and positive) packet
lengths. Training and testing are based on an SVM with a distance-based
kernel. They tried several variants of parameters and distances and obtained
the best accuracy with the Damerau-Levenshtein edit distance [22, 23]. The use
of this distance is motivated by the fact that it is the length of the shortest
sequence of character insertions, deletions, substitutions, and transpositions
required to transform a trace t into t′. These operations correspond to the
discarding and reordering of packets in a stream. In order to compute the
distance between two traces of different lengths, the Damerau-Levenshtein
distance is normalized with respect to the length of the shorter of the two
traces. For the evaluation they used the top 1000 websites according to Alexa,
filtered down to 800 websites. Using the basic version of Tor, they could
successfully identify 80% of visited websites. However, when random cover
packets are added to the traffic, the accuracy falls to 50%. Decreasing the size
of the training set from 36 to 4 samples decreased the accuracy by 20%.
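The normalised edit distance can be sketched as follows. This sketch uses the optimal-string-alignment variant of Damerau-Levenshtein (adjacent transpositions only); whether [15] uses this restricted variant or the unrestricted distance is an assumption here.

```python
def osa_distance(s, t):
    """Damerau-Levenshtein distance, optimal-string-alignment variant:
    minimum insertions, deletions, substitutions and adjacent
    transpositions turning s into t."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

def normalised_distance(s, t):
    """Normalise by the shorter trace so traces of different lengths
    become comparable, as described in the text."""
    return osa_distance(s, t) / min(len(s), len(t))
```

A transposition costing 1 instead of 2 is exactly what makes reordered packet pairs cheap, matching the motivation given above.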
In all the aforementioned works, without exception, the website fingerprinting
approaches were evaluated using only one web browser, mainly Firefox. Since
web browsers use different browser engines and user agents, and consequently
fetch pages differently, we strongly believe that the chosen web browser has an
impact on the estimated accuracy. In this paper, we consider a representative
set of popular web browsers and repeat the data collection and experiments for
every one of them. Our aim is to compare the resistance of web browsers to
website fingerprinting attacks.
3 Threat Model
The typical threat model for anonymity systems is a global passive adversary
that can observe all the traffic of the network. However, since Tor is a low-
latency anonymity system, it has been designed to protect against a weaker
form of adversary. Indeed, it is assumed that the adversary can observe only
5 The feature with the highest contribution was the total number of packets in the traffic trace.
some fraction of the network traffic; can generate, modify, delete, or delay
traffic; can operate onion routers of his own; and can compromise some
fraction of the onion routers [1]. In this paper we assume a weaker threat
model where the attacker can only access the encrypted traffic between the
client and the first Tor relay. The attacker does not generate, modify, delete
or delay any traffic, which makes the attack completely stealthy. Except for
the attacker's own Tor node, no other Tor relay or server is compromised,
which makes the attack easily deployable in practice.
4 Tor Traffic Capture
Fig. 1. Tor Traffic Capture
Traffic analysis typically starts by intercepting the traffic packets. In a typical
scenario, only the victim and the gateway can capture the data flowing
between them. In practice, however, several entities may have access to the
traffic packets. The administrator of the LAN has access to the traffic of all
endpoints in the network. The ISP (Internet Service Provider) can monitor the
traffic of any of its subscribers. A Law Enforcement Agency, after approval
from the ISP, can observe and record the traffic of any internet user, which is
referred to as lawful intercept. A censoring entity can observe the traffic of
any user in its "jurisdiction". In addition to these entities, a malicious user in
the LAN can carry out a MITM (Man-In-The-Middle) attack between the
victim and the gateway and make all the traffic pass through her. The MITM
attack can easily be performed using ARP spoofing/poisoning. Cain & Abel [24]
and Ettercap [25] are two popular tools for ARP spoofing/poisoning.
Assuming that the attacker can intercept the Tor traffic packets using one of
the above scenarios, the Tor traffic capture goes through three stages, as shown
in Figure 1. First the traffic is dumped to a file using a simple tool like tcpdump^6.
6 Alternative tools like Wireshark/tshark or Omnipeek can be used as well.
Then the raw traffic is filtered to keep only Tor-related traffic. Finally, the
traffic is classified into streams.
Since the list of Tor relays is public, we use it to filter Tor traffic from the
rest of the traffic. The list of Tor relays can be extracted from the Tor status
files, in particular the cached-consensus file. These files can be downloaded
manually from one of the authorities, or they can be accessed directly in the
local Tor status folder. Tor automatically updates those files once they are no
longer fresh. In our setting, the attacker, who is also a Tor client, uses her own
Tor status files to extract the list of Tor relays.
The next step in the Tor traffic capture is to classify the packets into
streams at the TCP protocol level^7. Every stream is tracked using IP
addresses, ports, TCP flag bits, and Sequence and Acknowledgment numbers.
A stream is closed after a TCP connection termination (FIN, FIN-ACK and
ACK).
Typically, Tor creates two types of streams: short-living streams and
long-living streams. Short-living streams download either router descriptors or
directory consensus documents. They last around 3 minutes because the
download takes a couple of seconds and then the stream stays idle until it hits
the maximum stream idle period, which is set to 3 minutes in Tor. Typically,
only one circuit, a single-hop circuit, is established during a short-living
stream. Long-living streams carry data communications, and typically several
3-hop circuits are established during the lifetime of such streams. Besides,
since all Tor communications are encrypted, all Tor streams initiate a TLS
handshake just after the TCP three-way handshake.
Fig. 2. Tor stream initial packets generated by Wireshark
7 It is important to note that the streams classified at this stage are the streams
on top of which Tor communications are conveyed. These are different from the
TCP streams tunneled through Tor. Hence, there are two levels of TCP streams:
one level below the Tor protocol and one level on top of the Tor protocol.
Figure 2 shows the initial packets of a Tor stream. The first three packets
establish the TCP connection, then a TLS handshake follows. The illustrated
stream is a short-living stream whose goal is to download a set of router
descriptors. Therefore, it creates a single-hop circuit with the directory server,
which is performed using CREATE and CREATED cells as shown in the
Figure.
5 Data Collection
In order to evaluate the accuracy of website fingerprinting for each web
browser, 100 websites were used. The set of 100 websites is composed of 90
websites randomly chosen from the top 1000 visited websites worldwide
according to Alexa [21] and 10 websites censored in some countries of the
Middle East. All censored websites are related to anonymizers and proxy
services^8. Website traffic traces are collected in 24-hour sessions. In each
session, websites are fetched several times in a round-robin fashion. For
visiting websites, we used two lab machines running Ubuntu Linux. Another
Windows 7 machine is used to fetch websites through Internet Explorer.
Traffic packets are dumped using tcpdump version 4.1.1 and libpcap version
1.1.1. Only packet headers are dumped to the dump file^9. The experiments
were performed using Tor version 0.2.2.39.
We wrote a Python script to automate the fetching of websites. For each
website, the script proceeds as follows: (1) it records the system time, (2)
requests the website url, (3) waits for 50 seconds (the time to load the
website^10), (4) stops the website connection, (5) records the system time,
(6) waits for 10 seconds (to leave a time gap between every two visits). Time
snapshots are taken just before and after a website visit so that they can be
intersected with the dump file in order to isolate the traffic of every website
visit.
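The visit loop can be sketched as follows. This is a minimal sketch: the browser start/stop actions are injected as placeholder callbacks, since the actual script drives a concrete browser process.

```python
import time

def visit(url, open_page, close_page, load_time=50, gap=10):
    """One visit following steps (1)-(6); open_page/close_page are
    hypothetical callbacks standing in for the real browser control."""
    start = time.time()       # (1) record the system time
    open_page(url)            # (2) request the website url
    time.sleep(load_time)     # (3) wait for the page to load
    close_page()              # (4) stop the website connection
    end = time.time()         # (5) record the system time
    time.sleep(gap)           # (6) time gap before the next visit
    return (url, start, end)

def isolate(dump, start, end):
    """Intersect timestamped dump records with a visit's time window."""
    return [pkt for ts, pkt in dump if start <= ts <= end]
```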
Once isolated, the traffic corresponding to each visit is represented as a
sequence of packet sizes where a positive value refers to an inbound packet
and a negative value refers to an outbound packet. This representation
captures three features of the trace, namely the size and direction of each
packet and the order of packets^11. The traffic representation is the same as
in recent works [14, 15]. For every visit, we keep only the first 500 packets of
the traffic so that all obtained sequences have the same length^12.
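The trace representation can be sketched as follows, assuming captured packets are available as (source, destination, size) records; the names are illustrative.

```python
def to_sequence(packets, client_ip, max_len=500):
    """Signed packet sizes: positive for inbound, negative for outbound;
    keep only the first max_len packets so all sequences share one
    length (a requirement of the classifier)."""
    seq = []
    for src, dst, size in packets:
        seq.append(-size if src == client_ip else size)
        if len(seq) == max_len:
            break
    return seq

def usable(seq, max_len=500):
    """Visits producing fewer than max_len packets are discarded."""
    return len(seq) == max_len
```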
Interestingly, parsing the Tor traffic during a website visit shows that packets
flow through several TCP streams, not just one. One reason is that Tor needs to
update the Tor relay status regularly by fetching fresh data from the directory
servers, and also sends dummy cells to keep some circuits open. TCP streams
8 Examples of these websites include: torproject.org, unblock-proxy.net, etc.
9 tcpdump is launched with options -n -tttt.
10 If a website takes more than 50 seconds to load, the sample sequence will be incomplete.
11 As in most related work, acknowledgement packets are ignored.
12 This is a requirement for the classification algorithm.
used to fetch directory server data can easily be distinguished from TCP
streams used for data communication, since the number of packets exchanged
does not exceed 100. Interestingly, after ruling out these short TCP streams,
the traffic resulting from visiting some websites is carried through two TCP
streams with more than 200 packets each. This shows that Tor does not always
multiplex the traffic of a website into a single stream. For those website visits,
the corresponding traffic sequence is obtained by merging both streams into
one. As for normal visits, only the first 500 packets of the merged stream are
kept.
In order to avoid the noise introduced by active content (Flash, etc.) and the
browser cache, active content and caching are disabled. For instance, the
Chrome browser is used in "incognito" mode while Firefox is used in "private"
mode.
Four web browsers were used for fetching websites, namely Chrome 18.0,
Firefox 15.0.1, Internet Explorer 9.0 and Konqueror 4.8.5.
The same experiment is performed four times, each time with a different
browser. The experiment consists of 10 iterations. Every iteration consists in
visiting all 100 websites once. Hence, every website is visited 10 times using the
same web browser, yielding a maximum of 10 samples per website^13.
6 Classification Algorithm
The goal of this paper is to compare popular web browsers with respect to
their ability to resist website fingerprinting. To this end, we use a classical
classifier, namely decision trees. It is important to mention that recent related
work showed that better fingerprinting results can be obtained using other
classifiers, in particular those based on Support Vector Machines (SVM). The
next paragraphs give an overview of the decision tree classifier and the
techniques used to evaluate the accuracy of the classifier for every web browser.
6.1 Decision Tree Classifier
Decision tree learning is a well-known and classic classification technique [26]. It
is very popular because it is self-explanatory and easy to understand and use, since
it requires few parameter settings. It has been used successfully for classification
in several diverse areas and is, overall, well suited for exploratory knowledge
discovery. In this paper we use a decision tree known as C4.5 [27] to classify the
traffic sequences of website visits.
A decision tree is typically learned using a top-down approach where
each node corresponds to one of the input variables, each edge corresponds to
a possible value of that variable, and each leaf corresponds to a class label. Every
data set is split into subsets based on attribute values. This process is repeated
recursively and is called recursive partitioning.
(Some website visits resulted in fewer than 500 packets; these sequences are
discarded from the data set.)
The recursion is completed when
splitting adds nothing to the prediction. Inducing a decision tree using a top-down
approach requires dealing with three further issues besides selecting the
best attribute to use at each node of the tree. First, one has to choose a splitting
threshold to form the two children of each node. Second, one needs a criterion
to determine when to stop growing the tree. Third, one has to decide which
class label to assign to each terminal (leaf) node.
Classic splitting strategies mainly rely on impurity criteria,
e.g., information gain, gain ratio and the Gini index. The C4.5 decision tree uses the gain
ratio to select the best attribute and choose the optimal splitting threshold: the
attribute value that provides the best gain ratio is chosen as the splitting threshold.
To address the second issue discussed above, i.e., to avoid the difficulty of
choosing a stopping rule, most decision tree induction algorithms
grow the tree to its maximum size, where the terminal nodes are pure or almost
pure, and then selectively prune it. The class label of each terminal
node is typically decided by majority voting, i.e., the node is assigned the class
whose data instances outnumber those of the other classes contained in that node.
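As an illustration of the gain-ratio criterion mentioned above, the following sketch computes the gain ratio of one candidate threshold split on a numeric attribute. It is a minimal pedagogical version of the criterion, not the C4.5 implementation used in the experiments.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {l: labels.count(l) for l in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting the data at value <= threshold, the
    criterion C4.5 uses to rank attributes and thresholds."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    # information gain: parent entropy minus weighted child entropies
    gain = entropy(labels) - sum(len(part) / n * entropy(part)
                                 for part in (left, right) if part)
    # split information: entropy of the left/right partition itself
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info if split_info else 0.0
```

A threshold that separates the classes perfectly yields gain ratio 1.0; a threshold carrying no information yields 0.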
6.2 Cross-Validation
To obtain a generalized performance estimate for the decision tree used in this paper,
a cross-validation (CV) scheme is applied. CV is a well-known method to test
the performance of a classifier by varying the training and test datasets [28], and is
commonly used to assess a classification system over various combinations of
testing and training sets [29, 28, 30]. In
this method, classification performance is measured by systematically excluding some data
instances during the training process and testing the trained model on the
excluded instances [31]. The process is repeated until the whole dataset has served as
test data. In this paper, we have chosen a 10-fold CV scheme where, each time, the data
in 1 fold are used as test data and the remaining 9 folds are used to train the model.
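The 10-fold scheme can be sketched with a minimal index-splitting helper of our own; the experiments themselves would presumably rely on an off-the-shelf implementation such as Weka's [20].

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation:
    each fold serves exactly once as the test set, while the remaining
    k-1 folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Every sample appears in exactly one test fold, so the k accuracy scores average over the whole dataset.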
6.3 Performance Metric
Classification accuracy is one of the most widely used performance metrics for evaluating
a classifier. Classification accuracy (ACC) is defined as the ratio of the number
of samples that are classified correctly to the total number of samples
available (N):

ACC = (TP + TN) / N    (1)

where TP (True Positives) is the number of data instances from the positive class that
are classified as positive by the classifier, and TN (True Negatives) is the number of data
instances from the negative class that are classified as negative by the classifier.
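Equation (1) amounts to counting matching labels; a small sketch (the function name is ours):

```python
def accuracy(y_true, y_pred, positive):
    """ACC = (TP + TN) / N, as in Eq. (1): the fraction of samples whose
    predicted label matches the true label, counting both classes."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return (tp + tn) / len(y_true)
```

In the multi-class website setting this simply reduces to the fraction of visits assigned to the correct website.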
7 Web Browser Resistance to Fingerprinting
Popular web browsers differ in several aspects; in particular, they use different
web browser engines. The engine does most of the work of a web browser: it
retrieves the document corresponding to a given URL and handles links, cookies,
scripting, plug-in loading, etc. The type of web browser engine has an impact
on the shape of the observed (encrypted) traffic. For example, some web browser
engines may wait until all data is received before rendering a page, while others
may begin rendering before all data is received.
In order to compare the resistance of popular web browsers to website fingerprinting
attacks, data is collected using the different web browsers and a
C4.5 decision tree classifier is then used to evaluate the accuracy of website
fingerprinting. More precisely, once the data about website visits is collected, we
evaluate the accuracy of website fingerprinting in four scenarios:
– Basic Packets Sequence: The traffic trace is the first 500 packets of the
TCP stream with the largest number of packets.
– Merged Streams: The traffic trace is the first 500 packets obtained by
merging all TCP streams with more than 200 packets. If only one TCP
stream has more than 200 packets, this scenario is the same as the first one.
– Rounded Packet Sizes: The same as the first scenario, but the packet size
values are rounded up to multiples of 600. For instance, a packet size of 512 is
rounded to 600 while a packet size of 743 is rounded to 1200.
– Merged Streams and Rounded Packet Sizes: This scenario is the com-
bination of the two previous scenarios.
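The rounding used in the last two scenarios corresponds to rounding a size up to the next multiple of 600, which can be sketched as (treating sizes as positive magnitudes):

```python
import math

def round_up_to(size, multiple=600):
    """Round a packet size up to the next multiple of `multiple`,
    matching the examples in the text (512 -> 600, 743 -> 1200)."""
    return math.ceil(size / multiple) * multiple
```

Coarsening packet sizes this way collapses many distinct sizes into a few buckets, which is why it changes the classifier's accuracy.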
[Figure: grouped bar chart of classification accuracy (y-axis from 0 to 14, in %) for Chrome, Firefox, IExplorer and Konqueror under the four scenarios: Basic Packets Sequence, Merged Streams, Rounded Packet Sizes, and Merged Streams and Rounded Packet Sizes.]
Fig. 3. Website Fingerprinting Accuracy for Common Web Browsers
Figure 3 shows the accuracy of the C4.5 classifier for each browser and each
scenario. Using Firefox, Internet Explorer and Konqueror, more than 9% of
websites have been correctly identified by the classifier. With Chrome, however,
only 2% of websites have been successfully identified. The histogram also shows
that rounding packet-size values improves the efficiency of our classifier for all
browsers. Merging TCP streams, on the other hand, improved the efficiency of
the classifier only for Firefox and Konqueror; for Chrome and Internet Explorer it
resulted in lower accuracy than the basic scenario. The most
important result illustrated by the histogram is that the Chrome browser offers
better resistance to website fingerprinting than the other studied browsers. The
advantage Chrome has over the other studied browsers is expected to be
preserved if a more efficient classifier (e.g. [15]) is used.
8 Conclusion
Website fingerprinting is a new attack on the Tor anonymity system that tries to
reveal the identities of visited websites by recognizing patterns in the Tor traffic.
Compared to previous attacks on Tor, in particular confirmation attacks,
website fingerprinting does not require an attacker with extended capabilities:
only the ability to sniff the Tor client's encrypted traffic is required. In this paper
we presented a detailed survey of website fingerprinting techniques, which
recently reached high accuracy rates (80%) [15] on the Tor anonymity system. The
main contribution of this paper, however, is an empirical analysis of how much
resistance popular web browsers provide against website fingerprinting. Four
notable web browsers have been considered, namely Firefox, Chrome, Internet
Explorer, and Konqueror. The analysis showed that the resistance of Chrome
to website fingerprinting is five times better than that of the remaining web browsers.
Since most existing fingerprinting techniques have been evaluated using the Firefox
web browser [12, 14, 15], we expect their accuracy results to drop if the
Chrome browser is used.
There are two main mitigation approaches for the website fingerprinting attack.
The first and most commonly used approach is padding, where the
sender appends some random (dummy) bits to the actual data to obtain, for
instance, fixed-size packets. It has been shown that padding reduces the accuracy
of fingerprinting techniques only slightly [14, 15]. The second mitigation
approach is traffic camouflage, which can be implemented in two ways. The first
variant is to obfuscate the actual traffic by loading several pages simultaneously;
Panchenko et al. [14] load a randomly chosen page whenever an actual website is
to be visited. The second variant is to disguise the actual traffic within a typical
encrypted cover protocol such as Skype voice over IP. The StegoTorus proxy [32]
builds a database of pre-recorded packet traces from real sessions of a given
cover protocol. Then, when a Tor client visits a website, it randomly chooses a
pre-recorded trace from the database, which is used as a template to reshape the
actual traffic: the packet sizes of the pre-recorded trace are matched exactly
and the packet timings are matched “to the nearest millisecond”.
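The fixed-size padding countermeasure described above can be sketched as follows; the 1500-byte target (a common MTU) is our illustrative assumption, and the function also reports the bandwidth cost that makes padding unattractive in practice.

```python
def pad_to_fixed(sizes, target=1500):
    """Padding countermeasure sketch: every packet is padded with dummy
    bytes up to a fixed target size, hiding per-packet size information.

    Returns the padded sizes and the total padding overhead in bytes.
    Real payloads larger than `target` are left untouched (never truncated)."""
    padded = [max(s, target) for s in sizes]
    overhead = sum(padded) - sum(sizes)
    return padded, overhead
```

Note that packet counts and timings survive this transformation, which is one reason padding alone only slightly reduces fingerprinting accuracy [14, 15].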
References
1. Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion
router. In: Proceedings of the 13th USENIX Security Symposium. (August 2004)
2. Berthold, O., Federrath, H., Köpsell, S.: Web MIXes: A system for anonymous
and unobservable Internet access. In: Proceedings of Designing Privacy Enhancing
Technologies, Springer-Verlag, LNCS 2009 (July 2000) 115–129
3. Murdoch, S., Danezis, G.: Low-cost traffic analysis of Tor. In: Proceedings of the
2005 IEEE Symposium on Security and Privacy, IEEE CS (May 2005)
4. Murdoch, S.: Hot or not: Revealing hidden services by their clock skew. In:
Proceedings of CCS 2006. (October 2006)
5. Hopper, N., Vasserman, E., Chan-Tin, E.: How much anonymity does network la-
tency leak? ACM Transactions on Information and System Security 13(2) (Febru-
ary 2010)
6. Evans, N., Dingledine, R., Grothoff, C.: A practical congestion attack on tor using
long paths. In: Proceedings of the 18th USENIX Security Symposium. (August
2009)
7. Mittal, P., Khurshid, A., Juen, J., Caesar, M., Borisov, N.: Stealthy traffic analy-
sis of low-latency anonymous communication using throughput fingerprinting. In:
Proceedings of the 18th ACM conference on Computer and communications secu-
rity. CCS ’11, New York, NY, USA, ACM (2011) 215–226
8. Hintz, A.: Fingerprinting websites using traffic analysis. In: Privacy Enhancing
Technologies (PETS),LNCS. Volume 2482., Springer (2002) 171–178
9. Sun, Q., Simon, D.R., Wang, Y.M., Russell, W., Padmanabhan, V.N., Qiu, L.:
Statistical identification of encrypted web browsing traffic. In: Proceedings of the
2002 IEEE Symposium on Security and Privacy. SP ’02, Washington, DC, USA,
IEEE Computer Society (2002) 19–
10. Bissias, G.D., Liberatore, M., Jensen, D., Levine, B.N.: Privacy vulnerabilities
in encrypted http streams. In: Proceedings of the 5th international conference
on Privacy Enhancing Technologies. PET’05, Berlin, Heidelberg, Springer-Verlag
(2006) 1–11
11. Liberatore, M., Levine, B.N.: Inferring the source of encrypted http connections.
In: Proceedings of the 13th ACM conference on Computer and communications
security. CCS ’06, New York, NY, USA, ACM (2006) 255–263
12. Herrmann, D., Wendolsky, R., Federrath, H.: Website fingerprinting: attacking
popular privacy enhancing technologies with the multinomial naïve-bayes
classifier. In: Proceedings of the 2009 ACM workshop on Cloud computing security.
CCSW ’09, New York, NY, USA, ACM (2009) 31–42
13. Shi, Y., Matsuura, K.: Fingerprinting attack on the tor anonymity system. In
Qing, S., Mitchell, C., Wang, G., eds.: Information and Communications Security.
Volume 5927 of Lecture Notes in Computer Science., Springer Berlin Heidelberg
(2009) 425–438
14. Panchenko, A., Niessen, L., Zinnen, A., Engel, T.: Website fingerprinting in onion
routing based anonymization networks. In: Proceedings of the 10th annual ACM
workshop on Privacy in the electronic society. WPES ’11, New York, NY, USA,
ACM (2011) 103–114
15. Cai, X., Zhang, X.C., Joshi, B., Johnson, R.: Touching from a distance: website
fingerprinting attacks and defenses. In: Proceedings of the 2012 ACM conference
on Computer and communications security. CCS ’12, New York, NY, USA, ACM
(2012) 605–616
16. Safe Web: Safeweb proxy. http://www.safeweb.com
17. Rijsbergen, C.J.V.: Information Retrieval. 2nd edn. Butterworth-Heinemann, Newton,
MA, USA (1979)
18. Bracewell, R.: Pentagram Notation for Cross Correlation. The Fourier Transform
and Its Application. McGraw-Hill, New York, USA (1965)
19. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Tech-
niques. Second edn. Morgan Kaufmann (June 2005)
20. Weka Tool: Weka: Data mining software in Java. www.cs.waikato.ac.nz/ml/weka
21. Alexa Website: Alexa: The web information company. www.alexa.com
22. Levenshtein, V.: Binary codes capable of correcting deletions, insertions, and
reversals. Soviet Physics Doklady (1966)
23. Navarro, G.: A guided tour to approximate string matching. ACM Computing
Surveys 33(1) (2001) 31–88
24. Montoro, M.: Cain & Abel. http://www.oxid.it/cain.htm
25. ALor and NaGA: Ettercap. http://ettercap.sourceforge.net
26. Kotsiantis, S.B.: Supervised machine learning: A review of classification techniques.
Informatica 31 (2007) 249–268
27. Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann Publishers
Inc., San Francisco, CA, USA (1993)
28. Barton, J., Lees, A.: An application of neural networks for distinguishing gait
patterns on the basis of hip-knee joint angle diagrams. Gait & Posture 5(1) (1997)
28 – 33
29. Ding, C.H.Q., Dubchak, I.: Multi-class protein fold recognition using support
vector machines and neural networks. Bioinformatics 17 (2001) 349–358
30. Hassan, M.R., Begg, R., Taylor, S., Kumar, D.K.: Hmm-fuzzy model for recognition
of gait changes due to trip-related falls. In: Proceeding of the IEEE Eng Med Biol
Soc., Springer Berlin Heidelberg (2006) 1216–1219
31. Begg, R.K., Palaniswami, M., Owen, B.: Support vector machines for automated
gait classification. IEEE Transactions on Biomedical Engineering 52 (2005) 828–
838
32. Weinberg, Z., Wang, J., Yegneswaran, V., Briesemeister, L., Cheung, S., Wang,
F., Boneh, D.: Stegotorus: a camouflage proxy for the tor anonymity system.
In: Proceedings of the 2012 ACM conference on Computer and communications
security, New York, NY, USA, ACM (2012) 109–120
[Pages 34–52 of this paper survive in the source only as fragments of mathematical notation; the accompanying prose was lost in extraction. The recoverable elements are: a verification context C = ⟨M, ∼, ⊨, K, L, ⌜·⌝⟩ over a message algebra with constructors such as enc, dec and pair, together with a security lattice (L, ⊑, ⊓, ⊔, ⊥, ⊤) instantiated as (2^I, ⊆, ∩, ∪, I, ∅); the role decomposition R_G(p) of a protocol p, illustrated on the NSL protocol
    m1 : A → B : {Na.A}kb
    m2 : B → A : {B.Na}ka.{B.Nb}ka
    m3 : A → B : A.B.{Nb}kb ;
the functions Keys and Access, which collect the sets of keys protecting an atomic message α within a message m; the predicate Clear(m); families of selection functions S (S_MAX, S_EK, S_N) and the interpretation functions F (F_MAX, F_EK, F_N) built from them; a derivative operator ∂ on messages with variables; the witness function W_{p,F}; and a step-by-step verification of the roles of A and B in NSL, concluding that the secrecy levels {A, B} of the nonces Na^i and Nb^i are preserved.]
Formal Enforcement of Security Policies on
Choreographed Services
Mahjoub Langar and Karim Dahmani
LIP2 Research Laboratory, Faculté des Sciences de Tunis, Tunisia
Abstract. Web services are software systems that support distributed
applications composed of independent processes which communicate by
message passing. To realize the full potential of web services, we need
to compose existent web services in order to get more functionalities.
However, the composition of web services should be secure. In this paper
we propose an automatic formal approach to monitor the execution of a
choreography of web services, and we prove its correctness. We introduce
the syntax and semantic rules of a new operator which takes as input
a choreography and a security policy and produces as output a secure
version of this choreography which behaves like the original one and does
not violate the security policy.
Keywords: Choreographed services, web service security, formal verification,
runtime verification
1 Introduction
With the explosive growth of the Internet, intranets and electronic commerce,
the concept of web services has emerged and is now exploited across most
organizational boundaries. In fact, web services provide a suitable technical
foundation for making business processes accessible within and across
enterprises. Web services are software systems that support distributed
applications composed of independent processes which communicate by message
passing. For an appropriate exploitation of web services, however, we need to compose
them. Indeed, individual web services often offer limited capabilities: in
some situations, a client's request cannot be satisfied by a single web service,
but can be satisfied when that service is composed with others. To realize the
full potential of web services, we need to compose existing
web services in order to obtain more functionality. Web service composition
rules deal with how different services are composed into a coherent global service.
In particular, they specify the order in which services are invoked, and the
conditions under which a certain service may or may not be invoked. Among
the approaches investigated in service composition, we distinguish orchestration
and choreography. Orchestration composes available services and adds
a central coordinator (the orchestrator) which is responsible for invoking and
composing the individual sub-activities. The second approach, referred to as web
service choreography, does not assume a central coordinator
but defines complex tasks via the definition of the conversation that should be
undertaken by each participant. While several proposals exist for orchestration
languages (e.g. the Business Process Execution Language (BPEL) [1] and the Web
Services Flow Language (WSFL) [2]), choreography languages are still at a preliminary
stage of definition. One proposal, the Web Service Choreography Description Language
(WS-CDL) [3], was issued by the World Wide Web Consortium (W3C) in December 2004.
The need for secure web service composition has attracted great interest from
researchers over the last decade. However, most works focus on how to guarantee
desired properties such as correctness, deadlock avoidance, etc. In our
work, in addition to these properties, we introduce an automatic approach to
monitor a choreography by specifying and enforcing security policies on web
services, and we prove its correctness using formal methods. Some works, such as
[4,5], have enforced security policies on concurrent systems using process algebra.
Formal methods are mathematical techniques that are well suited to address
the aforementioned issues, and a variety of proposals exist to formally describe,
compose and verify web service compositions. In [6], a
Petri-net based design and verification framework for web service composition
is proposed. In [7], the authors introduce a Petri-net based algebra to compose
web services based on control flows. In [8], the authors use Milner's process algebra
CCS to specify and compose web services as processes, and the Concurrency
Workbench to validate properties such as correct web service composition. In
[9], the authors provide an encoding of BPEL processes into web service timed
state transition systems, a formalism that is closely related to timed automata,
and discuss a framework in which timed assumptions can be model checked.
In this paper, we propose an enforcement of security policies over web services
using formal methods. We present an automated approach for monitoring
the behavior of a service in a choreography. More precisely, we define an operator
⊗ that takes as input a process P and a security policy 𝒫 and generates a new
process P′ = P ⊗ 𝒫 which respects the following conditions:
– P′ ⊨ 𝒫, i.e. P′ satisfies the security policy 𝒫;
– P′ ⊑ P, i.e. the behaviors of P′ are also behaviors of P;
– ∀Q : ((Q ⊨ 𝒫) and (Q ⊑ P)) ⟹ Q ⊑ P′, i.e. all good behaviors of P
are also behaviors of P′.
[Figure: the monitoring operator ⊗ takes the process P and the security policy 𝒫 as inputs and produces the secured process P′ = P ⊗ 𝒫.]
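A truncation-style sketch of what such an operator guarantees, with the policy given as an automaton transition function. This is our own minimal rendering of run-time enforcement, not the ⊗ operator defined later in the paper.

```python
def enforce(trace, delta, q0):
    """Replay `trace` through a policy automaton (transition function
    `delta`, start state `q0`). Actions are emitted only while the policy
    keeps accepting, so the monitored run is a prefix of the original one
    (the P' ⊑ P condition); a violating action truncates the run."""
    state, emitted = q0, []
    for action in trace:
        nxt = delta(state, action)
        if nxt is None:          # the next action would violate the policy
            break
        state, emitted = nxt, emitted + [action]
    return emitted
```

For instance, a policy forbidding any send after a receive stops the run at the first offending send, keeping every compliant prefix intact.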
The rest of the paper is structured as follows. In Section 2, we briefly describe
the global calculus. Section 3 presents the formalism used to specify security policies.
Section 4 illustrates the formalism used to specify choreographed services.
In Section 5, we present our security framework for monitoring a choreography.
In Section 6, we discuss some related works, while conclusions are given in Section 7.
2 Global Calculus
The global calculus [10] is distilled from WS-CDL. It describes the global flow of
interactions between the participants. The syntax of the global calculus is given
by the following BNF grammar.
2.1 Syntax of the Global Calculus
I ::= A → B : ch(ν s̃).I        (Init)
    | A → B : s⟨op, e, y⟩.I      (Comm)
    | x@A := e.I                 (Assign)
    | if e@A then I1 else I2     (IfThenElse)
    | I1 + I2                    (Sum)
    | I1 | I2                    (Par)
    | (ν s) I                    (New)
    | X^A                        (RecVar)
    | rec X^A.I                  (Rec)
    | 0                          (Inaction)
The first rule (Init) says that the participant A invokes a service ch located
at B and initiates freshly generated session channels s̃. The second rule
(Comm) expresses a sending action by A, whose message consists of a selected
operator op and an expression e, to the participant B, which stores the value
of e in the variable y. (Assign) is a local construct which updates the variable x
located at A with e. (Par) and (Sum) are respectively the parallel composition
and the non-deterministic choice of interactions. (IfThenElse) is the standard
conditional operation, while (RecVar) and (Rec) are used to express recursive
behavior of interactions. 0 is the inactive interaction.
2.2 Semantics of the Global Calculus
The formal semantics of the global calculus is de ned in terms of con gurations
( ; I): The notation ( ; I) ! ( 0 ; I 0 ) says that the global description I at a state
(which is the collection of all local states of the participants) will reduce to I 0
at a new state 0 : Samples of reduction rules are presented here while the overall
operational semantic is detailed in [10].
(Init)
(σ, A → B : ch(ν s̃).I) → (σ, (ν s̃)I)

(Comm)
σ ⊢ e@A ⇓ v
─────────────────────────────────────────────
(σ, A → B : s⟨op, e, x⟩.I) → (σ[x@B ↦ v], I)

(Assign)
σ ⊢ e@A ⇓ v
────────────────────────────────────
(σ, x@A := e.I) → (σ[x@A ↦ v], I)

(Par)
(σ, I1) → (σ′, I1′)
───────────────────────────
(σ, I1 | I2) → (σ′, I1′ | I2)

(Sum)
(σ, I1) → (σ′, I1′)
─────────────────────────
(σ, I1 + I2) → (σ′, I1′)

(Rec)
(σ, I[rec X^A.I / X^A]) → (σ′, I′)
──────────────────────────────────
(σ, rec X^A.I) → (σ′, I′)
The first rule is for initiation: A invokes a service located at B and initiates
session channels s̃ which will be shared locally by A and B. The second rule is for
communication. The notation σ ⊢ e ⇓ v is an evaluation judgment: σ evaluates
the expression e to the value v. σ[x@B ↦ v] is the state resulting from updating
the local variable x at B with v. The evaluation judgment is also used in the
assignment rule. The fourth rule is for parallelism. The (Sum) rule describes the
behavior of a non-deterministic choice. The (Rec) rule says that if the unfolding
of rec X^A.I under σ reduces to I′, then rec X^A.I under σ will reach (σ′, I′).
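The (Assign) and (Comm) rules can be read operationally as updates to a store of located variables. The following Python sketch is our own illustration; the tuple encoding of actions and the `x@A` key convention are assumptions made for the example, and expressions are kept as closed literals.

```python
# A configuration is (sigma, I): sigma maps "x@A" locations to values,
# I is a list of pending actions, each a tuple tagged "assign" or "comm".

def step(sigma, I):
    """Apply one reduction rule to the configuration (sigma, I)."""
    head, rest = I[0], I[1:]
    tag = head[0]
    if tag == "assign":                 # (Assign): x@A := e.I
        _, var, at, expr = head
        v = eval(expr, {}, dict(sigma))           # sigma |- e ⇓ v
        sigma = {**sigma, f"{var}@{at}": v}       # sigma[x@A ↦ v]
        return sigma, rest
    if tag == "comm":                   # (Comm): A -> B : s<op, e, x>.I
        _, a, b, s, op, expr, var = head
        v = eval(expr, {}, dict(sigma))           # sigma |- e@A ⇓ v
        sigma = {**sigma, f"{var}@{b}": v}        # receiver B stores v in x
        return sigma, rest
    raise ValueError(f"no rule for {tag}")

# Seller -> Buyer : S2Bch<quote, 100, y>, followed by a local assignment.
sigma0 = {}
I0 = [("comm", "Seller", "Buyer", "S2Bch", "quote", "100", "y"),
      ("assign", "z", "Buyer", "7")]
sigma1, I1 = step(sigma0, I0)
sigma2, I2 = step(sigma1, I1)
```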
Example 1. This example shows a communication between a buyer and a seller.
These participants share freshly generated session channels B2Sch and
S2Bch. Through S2Bch, the seller offers a quote to the buyer. Through B2Sch,
the buyer selects one of the two options offered by the seller, QuoteAccept and
QuoteReject. If the first option is selected, the buyer sends the value 100,
which will be stored in x by the seller, and continues with the interaction I1. In the
other case, the buyer sends the abort number stored in the variable x_abortNo,
which will be stored in x by the seller, and terminates.
Buyer → Seller : ch(ν B2Sch, S2Bch).
Seller → Buyer : S2Bch⟨quote, 100, y⟩.
if y@Buyer ≤ 1000 then
  { Buyer → Seller : B2Sch⟨quoteAccept, 100, x⟩.I1 }
else
  { Buyer → Seller : B2Sch⟨quoteReject, x_abortNo, x⟩.0 }
3 Security Policy Specification
The language that we will use for the specification of security policies is EPC⁻,
which is a subset of EPC (the end-point calculus). The end-point calculus precisely
identifies the local behavior of each participant in a web service. The end-point
calculus is inspired by the π-calculus: it is an applied variant of the π-calculus
augmented with the notion of participants and their local states. In [10], the
authors have established a projection from the global calculus onto the end-point
calculus. So descriptions of participants' behaviors in the end-point calculus are not
extracted directly from the choreography, but projected from the global calculus.
Details of this theory of end-point projection are presented in [10]. Since, in the
proposed enforcement approach, we transform the security policy into a monitor,
we immediately reach this goal by choosing EPC⁻, since it is a subset of EPC.
3.1 Syntax of EPC⁻
We show the syntax of EPC⁻, where s denotes session channels and e an expression,
which can be an atomic value (such as a natural number or a boolean value),
a function (such as arithmetic functions and boolean operators) or a variable.
op1, op2, … range over operations, and x is a variable. The first construct is a receiving
action: the invocation of one of the operators opᵢ and the reception of an expression
which will be evaluated and stored in xᵢ. The second construct is a sending
action: the invocation of the operator op with the expression e. Furthermore, the
operator ⊕ represents the alternative composition. Finally, recursion is used
to represent unbounded repetition. For representing recursive behaviors, we
use term variables X, Y, … The operator used for recursion is rec X.P. Each
occurrence of X in P denotes a recurring point: rec X.P behaves as P until an
occurrence of X is found in the execution of P, then it returns to rec X.P.
P ::= s ▷ Σᵢ opᵢ(xᵢ).Pᵢ | s ◁ op⟨e⟩.P | P1 ⊕ P2 | X | rec X.P | 0
3.2 Semantics of EPC⁻
Note that processes do not evolve alone: a process is located in a participant,
which synchronizes with another one to evolve. A participant A with its behavior
P at a local state σ is called a network and is denoted by A[P]_σ. The syntax of
networks is given by the following grammar:
N ::= A[P]_σ       (Participant)
    | N | M        (Parallel-NW)
    | (νs)N        (Res-NW)
    | ε            (Inaction-NW)
where A[P]_σ is a participant with its behavior, as shown in the first construct.
Two communicating participants are represented by two networks combined by
parallel composition. When two participants initiate a session channel for
communication, this session channel must be restricted to these two participants;
this is given by (Res-NW). ε denotes the lack of networks. The reduction rules of
networks are given by:
(Struct-NW)
M ≡ M′    M′ → N′    N′ ≡ N
───────────────────────────
M → N

(Par-NW)
M → M′
──────────────
M | N → M′ | N

(Res-NW)
M → M′
────────────────
(νs)M → (νs)M′
The semantics of EPC⁻ is then given by:

σ₁ ⊢ e ⇓ v
────────────────────────────────────────────────────────────────────────
A[s ◁ opⱼ⟨e⟩.P]_{σ₁} | B[s ▷ Σᵢ opᵢ(xᵢ).Qᵢ]_{σ₂} → A[P]_{σ₁} | B[Qⱼ]_{σ₂[xⱼ↦v]}

A[P1]_σ → A[P1′]_{σ′}
──────────────────────────
A[P1 ⊕ P2]_σ → A[P1′]_{σ′}

A[P[rec X.P / X]]_σ → A[P′]_{σ′}
────────────────────────────────
A[rec X.P]_σ → A[P′]_{σ′}
Security policies are usually specified in logic-based languages. Such languages
mainly employ three operators to compose properties: and, or and not. The
or operator is given here by ⊕. The not operator is expressed here as follows:
assume we have two participants A and B communicating and we want to apply
the property "A should not send op1 to participant B"; then we write it using
EPC⁻ as follows:

𝒫 = rec X.( ⊕_{i≠1} s_B ◁ opᵢ⟨eᵢ⟩.X  ⊕  s_B ▷ Σᵢ opᵢ(xᵢ).X )
Example 2. Assume we have a client wanting to check his account details with
a bank. The bank should not send him back his account details if he is not
yet authenticated. A client is said to be authenticated if he has received from
the bank an acceptance of his authentication attempt through the operation
accept. The bank answers the client's account request through the operation
resAccount. The security property is then written as follows:
𝒫 = rec X.( s ▷ Σᵢ opᵢ(xᵢ).X
    ⊕ ⊕_{opᵢ ∉ {accept, resAccount}} s ◁ opᵢ⟨e⟩.X
    ⊕ s ◁ accept.rec Y.( ⊕ᵢ s ◁ opᵢ⟨eᵢ⟩.Y ⊕ s ▷ Σᵢ opᵢ(xᵢ).Y ) )
The security property is expressed using the recursion operator. In the recursion
block, we have 3 behaviors combined with the internal choice, which we denote
by P¹, P² and P³, where

P¹ = s ▷ Σᵢ opᵢ(xᵢ).X
P² = ⊕_{opᵢ ∉ {accept, resAccount}} s ◁ opᵢ⟨e⟩.X
P³ = s ◁ accept.rec Y.( ⊕ᵢ s ◁ opᵢ⟨eᵢ⟩.Y ⊕ s ▷ Σᵢ opᵢ(xᵢ).Y )
The block P¹ expresses the fact that the property allows all receiving actions.
The block P² inhibits sending accept or resAccount to the client. The block P³
intercepts the accept sending action, after which no more restrictions are imposed,
since the authentication of the client has been accepted.
Thus, an intruder who wants to check an account's details without authentication
cannot achieve his goal, since resAccount is only permitted in the block
following the accept sending action.
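The effect of this policy can be simulated on finite traces. The sketch below is our own illustration (the event encoding is an assumption; action names follow Example 2): receiving is always allowed (block P¹), sending resAccount before accept is rejected (block P²), and an accepted authentication lifts all restrictions (block P³).

```python
def monitor(trace):
    """Return True iff the trace complies with the policy of Example 2.

    Events are pairs (direction, op): direction "recv" models a receiving
    action (always allowed), direction "send" models a sending action.
    Sending resAccount is forbidden until accept has been sent, which
    corresponds to taking the P3 branch of the policy.
    """
    authenticated = False            # becomes True once branch P3 is taken
    for direction, op in trace:
        if direction == "recv":      # P1: all receiving actions are allowed
            continue
        if not authenticated:
            if op == "accept":       # P3: intercept the accept send
                authenticated = True
            elif op == "resAccount": # P2 forbids this send
                return False
        # after authentication, no restriction applies
    return True

# An intruder asking for account details before authenticating is rejected:
bad  = [("recv", "reqAccount"), ("send", "resAccount")]
good = [("recv", "authReq"), ("send", "accept"),
        ("recv", "reqAccount"), ("send", "resAccount")]
```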
4 Choreography Specification
The technique used in this paper for describing the composition of web services
is choreography. WS-CDL is based on two engineering principles:
Service Channel Principle: corresponds to the repeated availability of service
channels.
Session Principle: a basic principle in many communication-centred programs,
which says that a sequence of conversations belonging to a protocol should not be
confused with other concurrent runs of this or other protocols by the participants.
As a specification of the choreography description language WS-CDL, we
introduce a modified version of the end-point calculus, a formalism
presented in [10]. The end-point calculus describes the behavior of a participant
in a choreography from its end-point view.
4.1 Secured End-Point Calculus
The secured end-point calculus EPC_S is a variant of EPC (the end-point calculus) [10].
EPC_S has the particularity of explicitly handling the monitoring concept through
its operator @𝒫. For instance, the process @𝒫(P) can execute only the actions that
could be executed by both the controller 𝒫 and the process P. We describe
hereafter the formal syntax and dynamic semantics of the secured end-point
calculus.

Syntax of EPC_S  Since EPC⁻ is a subset of EPC and EPC_S is an extension
of EPC, part of the syntax of EPC_S was presented in the last section. So in
this section we present the overall syntax of EPC_S but describe only the
constructs that have not been described before.
P ::= !ch(s̃).P
    | ch(ν s̃).Q
    | s ▷ Σᵢ opᵢ(xᵢ).Pᵢ
    | s ◁ op⟨e⟩.Q
    | x := e.P
    | if e then P else Q
    | P ⊕ Q
    | P | Q
    | (νs)P
    | X
    | rec X.P
    | 0
    | @𝒫(P)
The first two constructs represent session initiation: !ch(s̃).P is used for input
and ch(ν s̃).Q for output. !ch(s̃) says that the service channel ch, which is
publicly available, is ready to receive an unbounded number of invocations, offering
a communication via its freshly generated session channels s ∈ s̃. ch(ν s̃) is an
invocation of a service located at the service channel ch and an initiation of a
communication session which will occur through the session channels s̃. After a
session has been initiated between two participants and freshly generated session
channels have been shared between them, they can communicate using the
communication constructs. The assignment operator is x := e.P. if e then P else Q
is a choice based on the evaluation of the expression e. The parallel composition
is given by P | Q. The restriction construct (νs)P indicates that the session
channel s is local to P. 0 denotes the lack of actions. Finally, @𝒫(P) is the
enforcement operator, which controls the execution of P by allowing it to evolve
only if 𝒫 is satisfied.
Semantics of EPC_S

Initiation/communication simulation  Let P be a process stipulating the local
behavior of a participant A in a web service. We observe in the end-point
calculus reduction rules that processes always evolve without an external factor,
except for initiation and communication. In fact, a participant initiating a
communication or communicating with another participant needs to synchronize
with it to realize the interaction. Therefore, for enforcing security policies
on a participant's behavior, we need a totally local description of its
interactions. So we need to define a simulation relation for the initiation and the
communication of processes, so that we can simulate the evolution of a process
locally, i.e. without synchronizing with another participant. We will first define
the normal form of a process and then introduce the simulation relation.
Definition 1 (Normal form of a process). Every process representing the
local behavior of a participant in a web service can be written as an internal sum
of processes, which we call the normal form of the process:

∀P ∈ ℙ, P = ⊕ᵢ aᵢ.Pᵢ

where ℙ denotes the set of processes, the aᵢ range over atomic actions and the Pᵢ
range over processes in ℙ.
Definition 2 (Simulation Relation). We define a simulation relation over
networks, denoted A[P]_σ −a→ A[P′]_{σ′}, which says that the process P in A at the
state σ is able to execute the action a and reduce to P′ with a new local state σ′.
The simulation relation is defined by the following rule:

P = ⊕ᵢ aᵢ.Pᵢ    ∃i ∈ {1, …, n} : a = aᵢ
──────────────────────────────────────
A[P]_σ −a→ A[Pᵢ]_σ

This rule says that when P is written in its normal form and one of the
constituting processes is able to do an action a, then A[P]_σ is also able to do it.
The semantics of EPC_S is given by:

(Init)
A[!ch(s̃).P | P′]_{σA} | B[ch(ν s̃).Q | Q′]_{σB} → (ν s̃)(A[!ch(s̃).P | P | P′]_{σA} | B[Q | Q′]_{σB})

(Assign)
σA ⊢ e ⇓ v
──────────────────────────────────────────
A[x := e.P | P′]_{σA} → A[P | P′]_{σA[x↦v]}

(IfTrue)
σA ⊢ e ⇓ tt
────────────────────────────────────────────────────
A[if e then P1 else P2 | P′]_{σA} → A[P1 | P′]_{σA}

(IfFalse)
σA ⊢ e ⇓ ff
────────────────────────────────────────────────────
A[if e then P1 else P2 | P′]_{σA} → A[P2 | P′]_{σA}

(Par)
A[P1 | R]_{σA} → A[P1′ | R]_{σA′}
──────────────────────────────────────────
A[P1 | P2 | R]_{σA} → A[P1′ | P2 | R]_{σA′}

(Res)
A[P]_{σA} → A[P′]_{σA′}
──────────────────────────────────
A[(νs)P]_{σA} → A[(νs)P′]_{σA′}

(Init-In-Sec)
A[@𝒫(!ch(s̃).P)]_{σA} | B[ch(ν s̃).Q | R]_{σB} → (ν s̃)(A[@𝒫(P) | @𝒫(!ch(s̃).P)]_{σA} | B[Q | R]_{σB})

(Init-Out-Sec)
A[@𝒫(ch(ν s̃).P)]_{σA} | B[!ch(s̃).Q | R]_{σB} → (ν s̃)(A[@𝒫(P)]_{σA} | B[!ch(s̃).Q | Q | R]_{σB})

(Comm-In-Sec)
A[P]_{σA} −s▷op(x)→ A[P′]_{σA}    A[𝒫]_{σA} −s▷op(x)→ A[𝒫′]_{σA}    σB ⊢ e ⇓ v
──────────────────────────────────────────────────────────────────────────────
A[@𝒫(P)]_{σA} | B[s ◁ op⟨e⟩.Q | R]_{σB} → A[@𝒫′(P′)]_{σA[x↦v]} | B[Q | R]_{σB}

(Comm-Out-Sec)
A[P]_{σA} −s◁opⱼ⟨e⟩→ A[P′]_{σA}    A[𝒫]_{σA} −s◁opⱼ⟨e⟩→ A[𝒫′]_{σA}    σA ⊢ e ⇓ v
────────────────────────────────────────────────────────────────────────────────────
A[@𝒫(P)]_{σA} | B[s ▷ Σᵢ opᵢ(xᵢ).Qᵢ | R]_{σB} → A[@𝒫′(P′)]_{σA} | B[Qⱼ | R]_{σB[xⱼ↦v]}

(Assign-Sec)
σA ⊢ e ⇓ v
───────────────────────────────────────────
A[@𝒫(x := e.P)]_{σA} → A[@𝒫(P)]_{σA[x↦v]}

(Par-Sec)
A[@𝒫(P1)]_σ → A[@𝒫′(P1′)]_{σ′}
────────────────────────────────────────
A[@𝒫(P1 | P2)]_σ → A[@𝒫′(P1′ | P2)]_{σ′}

(IfTrue-Sec)
σ ⊢ e ⇓ tt
──────────────────────────────────────────────────────
A[@𝒫(P | if e then P1 else P2)]_σ → A[@𝒫(P | P1)]_σ

(IfFalse-Sec)
σ ⊢ e ⇓ ff
──────────────────────────────────────────────────────
A[@𝒫(P | if e then P1 else P2)]_σ → A[@𝒫(P | P2)]_σ

(Res-Sec)
A[@𝒫(P)]_σ → A[@𝒫′(P′)]_{σ′}
──────────────────────────────────────────────
A[(ν s̃)(@𝒫(P))]_σ → A[(ν s̃)(@𝒫′(P′))]_{σ′}
The Init rule shows how two participants initiate a session by sharing freshly
generated session channels s̃. These session channels are restricted to the
participants A and B by the binding operator (ν). Assignment is a local construct:
the Assign rule evaluates an expression e and assigns the result of this evaluation
to the variable x in A; then A behaves as P. The Res rule restricts the use of the
session channel s to the process P in A. Init-In-Sec and Init-Out-Sec are the
rules for communication initiation: we do not control session initiations, but we
control the communication messages between the participants. The communication rules
say that if P and 𝒫 are able to send or receive the same operation through the
same session channel and become respectively P′ and 𝒫′, then @𝒫(P) does this
action and becomes @𝒫′(P′). The secured assignment rule says that assignment is not
considered by the enforcement. The secured parallel composition rule says that the
security operator applied to P1 | P2 can evolve into @𝒫′(P1′ | P2) if P1 can evolve
simultaneously with the security policy 𝒫 into respectively P1′ and 𝒫′. For the
restriction rule, binding session channels does not affect the enforcement. Finally,
the conditional rules say that the enforcement is not affected by conditionals.
5 Choreography Monitoring
The goal of this research is to enforce security policies over a choreography.
An important feature of the enforcement operator is that it allows us
to apply the security policies only to the concerned participants. In this in-lined
monitoring framework, we do not modify the original behaviors of the participants
in a choreography, but if the security policy is violated, the evolution of the
choreography stops. In this section, we prove the correctness of our theory,
after first defining some notions such as the partial order over processes and the
satisfaction notions.
Definition 3 (Partial Order over Processes). Let A[P1]_σ, A[P2]_σ be two
networks. We say A[P1]_σ ⊑ A[P2]_σ if the following condition holds:

A[P1]_σ −a→ A[P1′]_{σ′} ⟹ A[P2]_σ −a→ A[P2′]_{σ′} and A[P1′]_{σ′} ⊑ A[P2′]_{σ′}.
Definition 4 (Satisfaction Notions). Let 𝒫 be a security policy and ξ a
trace. The symbols ⊨ and |≈ are defined as follows:
– We say that ξ satisfies 𝒫, denoted by ξ ⊨ 𝒫, if ξ ∈ ⟦𝒫⟧, where ⟦𝒫⟧ denotes
the set of traces of 𝒫.
– We say that ξ could satisfy 𝒫, denoted by ξ |≈ 𝒫, if there exists a trace ξ′
such that ξ.ξ′ ⊨ 𝒫.
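When the set of traces of a policy is finite and explicit, the two satisfaction notions of Definition 4 can be checked directly. The following Python sketch is our own illustration, with traces encoded as tuples of action names and the policy given by its set of traces.

```python
def satisfies(xi, denot):
    """The trace xi satisfies the policy iff xi is one of its traces."""
    return xi in denot

def could_satisfy(xi, denot):
    """xi could satisfy the policy iff some extension xi.xi' is a trace,
    i.e. xi is a prefix of at least one trace in the denotation."""
    return any(t[:len(xi)] == xi for t in denot)

# Denotation of a toy policy: a finite set of traces.
denot = {("auth", "accept", "reqAccount", "resAccount"),
         ("auth", "reject")}

prefix = ("auth", "accept")     # not yet a full trace, but extensible
```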
Theorem 1. Let P be a process and 𝒫 a security policy. The following properties hold:

1. @𝒫(P) |≈ 𝒫;
2. A[@𝒫(P)]_σ ⊑ A[P]_σ;
3. ∀P′ : ((P′ |≈ 𝒫) and (A[P′]_σ ⊑ A[P]_σ)) ⟹ A[P′]_σ ⊑ A[@𝒫(P)]_σ.

Proof. 1. The process @𝒫(P) is defined so that it can evolve into another process
only if the security policy 𝒫 is satisfied.
2. As for the first property, the proof is obtained directly from the reduction
rules of the security operator and the definition of the partial order.
3. Consider a process P′ ∈ ℙ such that P′ |≈ 𝒫 and A[P′]_σ ⊑ A[P]_σ. Suppose
A[P′]_σ −a→ A[P1′]_{σ′}. Since A[P′]_σ ⊑ A[P]_σ, we conclude directly from the
definition of ⊑ that A[P]_σ −a→ A[P1]_{σ′}. Since P′ |≈ 𝒫, we can also conclude
from the definition of |≈ that a |≈ 𝒫. Finally, as A[P]_σ −a→ A[P1]_{σ′} and
a |≈ 𝒫, we have A[@𝒫(P)]_σ −a→ A[@𝒫′(P1)]_{σ′}, and then A[P′]_σ ⊑ A[@𝒫(P)]_σ.
Example 3 (Buyer-Seller Protocol). The buyer initiates a communication with
the seller and requests a quote. The seller sends back the quote. If the buyer
rejects it, then the protocol terminates. Otherwise, the seller sends the buyer
an order confirmation and the buyer confirms his order. In this case, the
seller contacts the shipper and asks for delivery details, which he transfers to the
buyer, and the protocol terminates. The security property that we want to apply
to this protocol is: "the seller should communicate with the shipper only if
the buyer confirms his order". As a consequence, the seller won't send the
buyer the delivery details if he has not confirmed his order. The protocol is
depicted in Fig. 1. The critical actions that we have to supervise are in dark grey in
Fig. 1.
The choreography's description in the global calculus is the following:

Buyer → Seller : B2SCh(ν s).
Seller → Buyer : s⟨ackSession⟩.
Buyer → Seller : s⟨reqQuote⟩.
Seller → Buyer : s⟨respQuote, v_quote, x_quote⟩.
if reasonable(x_quote)@Buyer then
  Buyer → Seller : s⟨quoteAccept⟩.
  Seller → Buyer : s⟨orderConfirm⟩.
  Buyer → Seller : s⟨ackConfirm⟩.
  Seller → Shipper : B2SHCh(ν s′).
  Shipper → Seller : s′⟨ackSession⟩.
  Seller → Shipper : s′⟨reqDelivDet⟩.
  Shipper → Seller : s′⟨delivDet, v_delivDet, x_delivDet⟩.
  Seller → Buyer : s⟨delivDet, x_delivDet, x_delivDet⟩.0
else
  Buyer → Seller : s⟨quoteReject⟩.0
The end-point projection gives the end-point behaviors. The buyer's behavior is
the following:

Buyer[B2SCh(ν s).s ▷ ackSession.s ◁ reqQuote.s ▷ respQuote(x_quote)
.if reasonable(x_quote) then s ◁ quoteAcc.s ▷ orderConfirm.s ◁ ackConfirm
.s ▷ delivDet(x_delivDet).0 else s ◁ quoteRej.0]

[Fig. 1. Buyer-Seller Protocol: a sequence diagram showing the exchanges ackSession,
reqQuote, respQuote, quoteAcc/quoteRej, orderConfirm, ackConfirm and delivDet
between Buyer and Seller, and ackSession, reqDelivDet and delivDet between Seller
and Shipper.]
The seller's behavior is the following:

Seller[B2SCh(s).s ◁ ackSession.s ▷ reqQuote.s ◁ respQuote⟨e_quote⟩.
((s ▷ quoteAcc.s ◁ orderConfirm.s ▷ ackConfirm.
B2SHCh(ν s′).s′ ▷ ackSession.s′ ◁ reqDelivDet⟨e_buyer⟩.s′ ▷ delivDet(x_delivDet)
.s ◁ delivDet⟨x_delivDet⟩.0) + (s ▷ quoteRej.0))]

The shipper's behavior is the following:

Shipper[B2SHCh(s′).s′ ◁ ackSession.s′ ▷ reqDelivDet(x_buyer)
.s′ ◁ delivDet⟨e_delivDet⟩.0]
The security property has the following behavior:

𝒫 = rec X.( ⊕_{opᵢ≠delivDet} s ◁ opᵢ⟨e⟩.X
    ⊕ s ▷ Σ_{opᵢ≠ackConfirm} opᵢ(xᵢ).X
    ⊕ s ▷ ackConfirm.rec Y.( ⊕ᵢ s′ ◁ opᵢ⟨e⟩.Y ⊕ s′ ▷ Σᵢ opᵢ(xᵢ).Y ⊕ ⊕ᵢ s ◁ opᵢ⟨e⟩.Y ) )
The security property 𝒫 is written using the recursion operator. In the recursion
block, there are 3 processes combined with the internal choice. The first process
is ⊕_{opᵢ≠delivDet} s ◁ opᵢ⟨e⟩.X, which says that, through the session channel s (shared
between the buyer and the seller), the seller can execute any sending action except
the delivery details, which should be sent only after the seller has received a
confirmation from the buyer. The second process is s ▷ Σ_{opᵢ≠ackConfirm} opᵢ(xᵢ).X, which
says that the seller can receive any operation other than ackConfirm. This
action is intercepted in the third process s ▷ ackConfirm.rec Y.( ⊕ᵢ s′ ◁ opᵢ⟨e⟩.Y
⊕ s′ ▷ Σᵢ opᵢ(xᵢ).Y ⊕ ⊕ᵢ s ◁ opᵢ⟨e⟩.Y ). In this process, after the seller has received a
confirmation from the buyer, he can communicate with the shipper through the
session channel s′ without any restriction, and he also has no restriction on his
sending actions to the buyer, so he can send him the delivery details.
6 Related Works
Web services verification has been a subject of interest for several research
efforts; some of the relevant contributions in this domain are cited in this section.
Most formal approaches introduce a monitor which does not stop the program
when a violation is detected. Moreover, these contributions implement the
monitor as a web service in addition to the other web services. The originality of our
work is the introduction of the monitor within the concerned participants' processes.
In [11], a run-time event-based approach to deal with the problem of monitoring
conformance of interaction sequences is presented. When a violation is detected,
the program shows errors in dashboards. In [12], the authors introduce an approach
to verify the conformance of a web service implementation against a behavioral
specification, through the application of testing. Stream X-machines are
used as an intuitive modeling formalism for constructing the behavioral
specification of a stateful web service, together with a method for deriving test cases
from that specification in an automated way. The test generation method produces
complete sets of test cases that, under certain assumptions, are guaranteed to reveal
all non-conformance faults in a service implementation under test. However, this
approach only returns non-conformance faults and does not react dynamically
against these errors. In [13], the authors propose a monitoring framework for
a choreographed service which supports the early detection of faults and decides
whether it is still possible to continue the service. The authors of [?] have proposed
service automata as a framework for enforcing security policies in distributed
systems. They encapsulate the program in a service automaton composed of
the monitored program, an interceptor, an enforcer, a coordinator and a local
policy. The interceptor intercepts critical actions and passes them to the
coordinator, which determines whether the action complies with the security policy
and decides upon possible countermeasures; then the enforcer implements these
decisions. However, the authors do not specify how to detect critical actions.
7 Conclusion
In this paper, we have introduced a formal approach that automatically
enforces a security policy on choreographed services. Indeed, we introduced a
new calculus with an enforcement operator @𝒫. The semantics of the proposed
calculus ensures that choreographed services can evolve only if they do not violate
the enforced security policy. The originality of our work lies in the fact that
we do not add a new web service as a monitor, but rather wrap the security
policy inside the choreographed services.
Future work will focus on the definition of a complete mapping between WS-CDL
and the global calculus. Moreover, we will seek means to optimize the enforced
choreographed services so as to reduce as much as possible the overhead due
to the enforcement operator.
References
1. IBM Corporation: Business Process Execution Language for Web Services (BPEL4WS).
http://www.ibm.com/developerworks/library/ws-bpel/ (2002)
2. Leymann, F.: Web Services Flow Language (WSFL) version 1.0. Technical Report,
International Business Machines Corporation (IBM) (May 2001)
3. Kavantzas, N., Burdett, D., Ritzinger, G., Fletcher, T., Lafon, Y.: Web services
choreography description language version 1.0. W3C Working Draft (December 2004)
4. Langar, M., Mejri, M., Adi, K.: Formal enforcement of security policies on
concurrent systems. J. Symb. Comput. 46(9) (2011) 997–1016
5. Khoury, R., Tawbi, N.: Corrective enforcement: A new paradigm of security policy
enforcement by monitors. ACM Trans. Inf. Syst. Secur. 15(2) (2012) 10
6. Yi, X., Kochut, K.: A CP-nets-based design and verification framework for web
services composition. In: ICWS. (2004) 756–760
7. Hamadi, R., Benatallah, B.: A Petri net-based model for web service composition.
In: ADC. (2003) 191–200
8. Salaün, G., Bordeaux, L., Schaerf, M.: Describing and reasoning on web services
using process algebra. In: ICWS. (2004) 43–
9. Kazhamiakin, R., Pandya, P.K., Pistore, M.: Timed modelling and analysis in web
service compositions. In: ARES. (2006) 840–846
10. Carbone, M., Honda, K., Yoshida, N.: Structured communication-centred
programming for web services. In: ESOP. (2007) 2–17
11. Baouab, A., Perrin, O., Godart, C.: An optimized derivation of event queries to
monitor choreography violations. In: ICSOC. (2012) 222–236
12. Dranidis, D., Ramollari, E., Kourtesis, D.: Run-time verification of behavioural
conformance for conversational web services. In: ECOWS. (2009) 139–147
13. Ardissono, L., Furnari, R., Goy, A., Petrone, G., Segnan, M.: Monitoring
choreographed services. In: Innovations and Advanced Techniques in Computer and
Information Sciences and Engineering. (2007) 283–288
14. Fu, X., Bultan, T., Su, J.: Analysis of interacting BPEL web services. In: WWW.
(2004) 621–630
A Timestamping Scheme with Eternal Security
in the Bounded Storage Model
Assia Ben Shil1
Faculty of Sciences of Bizerte
University of Carthage
Email: essia.benshil@gmail.com
and
Kaouther Blibech Sinaoui2
ISSAT of Mateur
University of Carthage
Email: kaouther.blibech@gmail.com
1,2
LIP2 Laboratory, Tunis, Tunisia
Abstract. Digital timestamping is a cryptographic technique that allows
affixing a reliable date to a digital document. The security of most
existing timestamping systems is based on the security of the cryptographic
techniques they use, such as hash functions. However, a hash function has a
limited lifetime. In this context, we provide a non-interactive timestamping
scheme in the bounded storage model (BSM) whose security is not tied
to the lifetime of any cryptographic technique. We prove, in fact, that
our timestamping scheme is eternally secure, even against an adversary
with unlimited computing power.
Keywords: Timestamping, bounded storage model, computing power, eternal
security.
1 Introduction
In recent years, digital documents have begun to replace paper documents
in several areas. For this reason, it is important to locate a digital document in
time, to prove its existence and integrity since a given date. This task is achieved
through a timestamping system that operates in two phases: a timestamping
phase allowing one or more Timestamping Authorities (TSAs) to affix a reliable
date to a document and generate the associated timestamp, and a verification
phase allowing any potential verifier to verify the correctness of the produced
timestamp.
The security of most existing timestamping systems [2-10] is based on the
security of the cryptographic techniques they use, such as hash functions. These
techniques are secure under the assumption that the users' computing power
is limited. However, nowadays, the computing power of computers grows
exponentially. Therefore, the hash functions in use are not secure forever [17],
and the existing timestamping systems cannot be considered secure forever either. To
propose timestamping systems producing timestamps with eternal validity, we
place ourselves in the Bounded Storage Model (BSM) [13, 14]. In this model, Maurer
assumes that users' storage capacity is limited, while their computing power can
be unlimited. In this model, the ciphers are eternally secure [1]. The principle is
that a very long random string is transmitted in every round t (the interval
between the time t and the time t+1). At the time of the transmission of this string,
no participant has sufficient storage capacity to store it fully. Even if, later, his
storage capacity becomes sufficient to fully store the string transmitted in t, the
user has already lost access to this string. Thus, the performed encryptions
are valid forever. Moran proposed in [15] a non-interactive timestamping scheme
in the bounded storage model. Indeed, in the scheme of Moran, each stamper
or document owner can timestamp his document locally without communicating
with any third party [15], unlike conventional timestamping systems that
involve at least one TSA. Thus, non-interactive timestamping ensures total
confidentiality and hides even the fact that a timestamping occurred. These
characteristics have made the proposition of Moran very interesting, but the
timestamping system he proposed suffers from a lack of precision and practical
details. In this context, we present a non-interactive timestamping scheme in the
BSM. This paper is organized as follows: after introducing the BSM in Section 2,
we first present Moran's timestamping system, and then our non-interactive
timestamping scheme in the BSM, in Section 3. Then, we detail our timestamping
scheme more formally. Finally, we formally prove the eternal security of our
timestamping solution.
2 The Bounded Storage Model
Generally, in cryptography, the proposed systems and the functions they use are
secure under the assumption that there is a limit on the computational power of
any user or adversary. In the bounded storage model, the proposed systems must
be secure even against an adversary with unlimited computational power. The
BSM was proposed by Maurer in 1992 [13] for the development of encryption
keys. It aims to generate, from a short secret key K, a key X with a large size
[14] that can be used as an encryption key. The system operates as follows: in this
model, it is assumed that the storage capacity is limited, while no assumption about
the computing power is made. Let s be the assumed limit on a user's storage
capacity. Ciphers in the BSM use a very long string R called the randomizer. The
latter may, for example, be a random sequence of bits transmitted by a satellite.
If R is a random string of r bits, the space of R is {0, 1}^r. Notice that r ≫ s is
required to ensure that no user can fully store R. Having a secret key K of size k
in the space {0, 1}^k, we can use a known function f : {0, 1}^r × {0, 1}^k → {0, 1}^x
to generate the derived key X = f(R, K) of size x bits. The function f must use
only a small part of R so that we do not need to fully read R. Maurer's system
has been the subject of intensive studies. Indeed, many key generation systems
have been proposed in the BSM [11]. The BSM was used for timestamping by
Moran in [15]. In [3], we proposed an improvement of Moran's timestamping
system. In the following section, we present the timestamping system of Moran
and then our proposition.
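As a toy illustration of this construction (our own sketch, not the exact function f of [13, 14]): the short key K selects a few block positions of the long randomizer R, and the derived key X is the concatenation of the selected blocks, so only a small part of R is ever read. The block-selection rule and all parameters below are assumptions made for the example.

```python
import secrets

def derive_key(R, K, blocks, block_size):
    """X = f(R, K): use K to pick `blocks` positions in R and concatenate
    the selected blocks.  Only these few blocks of R are ever read."""
    n = len(R) // block_size                  # number of blocks in R
    k = int.from_bytes(K, "big")
    parts = []
    for i in range(blocks):
        idx = (k >> (16 * i)) % n             # block position derived from K
        parts.append(R[idx * block_size:(idx + 1) * block_size])
    return b"".join(parts)

# r >> s in the real model; here R is just 1 MiB of public randomness.
R = secrets.token_bytes(1 << 20)
K = secrets.token_bytes(8)                    # short shared secret key
X = derive_key(R, K, blocks=4, block_size=32) # derived 128-byte key

# Both parties holding K derive the same X from the broadcast R:
assert X == derive_key(R, K, blocks=4, block_size=32)
```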
3 Timestamping Solutions in the BSM
In the BSM, we assume that a long random string R of r bits is transmitted
during the round t (between t and t + 1). If s is the maximum storage capacity
of any entity in t, then s ≪ r. Notice that the space of R is {0, 1}^r, where r is the
size of the string R. Similarly, we consider that a document D of size d has a
value in {0, 1}^d. In the timestamping scheme proposed by Moran, to timestamp
a document D, its content is used to select a few blocks from R whose values will
be inserted in the timestamp T. Any verifier must randomly save some blocks
of R (using a function named Sketch(R)) in order to verify, later, the validity of
any timestamp made during the round t. Verifying the validity of a timestamp
associated to a document D is performed at a later date by a verifier who
simply has to verify that there are no conflicts between his sketch and the timestamp
of D. However, in this solution, a timestamp includes some additional values of
R. Even if the verifier cannot store more than s bits of R during the round t, he may
at verification time store s′ bits with s′ > s. Each verification of a timestamp
generated in the round t can then lead him to discover new blocks of R. After a
number of verification processes, he can reconstruct partially, if not entirely, R
and backdate any document.
To remedy this problem, we propose a timestamping protocol allowing veri-
fying the timestamp of a document D without learning additional values of R.
To this aim, instead of inserting the blocks of R in the timestamp, these blocks
will be the secret of the stamper. The verifier must then prove that he has the
value of a given block in his sketch and the stamper has to prove that he has used
this value to create the timestamp. In the verification process of this scheme,
a verifier may accept or reject a timestamp without discovering any additional
information about the string R transmitted during the round t. The idea is to
timestamp the document D locally using Shamir's secret sharing scheme [18] in order to divide D into n shares Di. Assuming that the blocks of R are indexed from 1 up to the number of blocks, the values RDi, indexed by the shares Di for 1 ≤ i ≤ n, are recovered. The polynomial that passes through the points Pi(RDi, Di) for 1 ≤ i ≤ n represents the timestamp of D. Denote by Share(D) the set of shares Di. Any user of the system saves a random subset of R named Sketch(R), formed by a number of pairs (index of block, value of the block). To verify the validity of the timestamp T of D for a random string R, the verifier who stored Sketch(R) proceeds as follows: for each index in both Share(D) and Sketch(R), he recovers the associated block Ri of R, computes its associated index by the polynomial given in the timestamp, and verifies that the associated value in Sketch(R) is Ri. In the following sections,
we provide a formal representation of our timestamping system and we prove its
eternal security. To this aim, we will show that once the string R is transmitted,
the probability of forging a fake timestamp for a document D is close to zero.
4 Definitions and Notations
In this section, we introduce some notations that we will use later in this paper.
- Randomizer: We call "randomizer", and denote R, the random string transmitted during a round t. This string is divided into n blocks of size b bits, indexed from 1 to n and denoted R1, . . . , Rn. Since the length of R is r bits and the size of a block is b bits, r = n·b. For each subset S ⊆ I(R), where I(R) is the set of indexes of the blocks Ri (1 ≤ i ≤ n), we denote by R|S the set of blocks Ri of R for all i ∈ S.
- Hamming Distance: A vector being a set of blocks, we define the Hamming Distance between two vectors c1 and c2, denoted DH, as the number of blocks at which the two vectors differ.
- Threshold secret sharing: Threshold secret sharing is a technique for dividing a secret S into l shares such that a coalition of at least k shares is necessary to reconstruct the secret, while a coalition of at most k − 1 shares does not reveal the secret, even partially (k < l). A k-out-of-l threshold secret sharing scheme is denoted SSS(k, l).
- Shamir secret sharing scheme: The secret sharing scheme based on Shamir's polynomial interpolation is an SSS(k, l). We denote it SSSS(k, l). The principle is to fix a polynomial P of degree k − 1 and a set of values X = (X1, . . . , Xl). The secret sharing function, denoted Share, takes as input the secret S and generates the set of shares Share(S) = (S1, . . . , Sl) such that ∀i, 1 ≤ i ≤ l, Si = P(Xi). The reconstruction function, denoted Share⁻¹, takes as input a subset of X denoted XS = [XS1, . . . , XSk] such that |XS| = k, together with the associated shares: Share⁻¹(XS, SXS1, . . . , SXSk) = P, with P the polynomial such that ∀Xi ∈ XS and Si ∈ [SXS1, . . . , SXSk], P(Xi) = Si. The secret S is computed as follows: S = P(0).
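For concreteness, a minimal Python sketch of SSSS(k, l) over a prime field follows. The prime p, the choice X = (1, . . . , l) and all function names are our own illustrative assumptions, not part of the scheme as defined above.

```python
# Illustrative sketch of Shamir's SSSS(k, l): any k of the l shares
# reconstruct the secret S = P(0); fewer shares reveal nothing.
import random

p = 2**61 - 1  # a prime large enough for this demo (our choice)

def share(secret, k, l):
    """Split `secret` into l shares (Xi, P(Xi)) with Xi = 1..l."""
    coeffs = [secret] + [random.randrange(p) for _ in range(k - 1)]
    def P(x):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod p
            y = (y * x + c) % p
        return y
    return [(x, P(x)) for x in range(1, l + 1)]

def reconstruct(points):
    """Share⁻¹: Lagrange interpolation at x = 0 from exactly k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret
```

Any k of the l shares suffice; which k are used does not matter.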
5 Presentation of our Timestamping Scheme
5.1 Timestamping Phase
A stamper is represented by the two following functions:
- Store(D, R) uses Shamir's secret sharing process to compute Share(D) for a given document D. Then, it computes the vector R|Share(D) and stores it. More formally, Store(D, R) consists in computing Share(D) = (D1, . . . , Dl), where Di is the i-th index specified by D. To the vector Share(D) is associated the vector (RD1, . . . , RDl), where RDi is the block of R indexed by Di. We denote this vector R|Share(D), where the notation R|I means the values of the blocks of R indexed by I1, . . . , In, with I = I1, . . . , In.
Definition 1. Store(D, R) = R|Share(D) .
- Stamp(D, Store(D, R)) uses Shamir's reconstruction process to find the unique polynomial passing through the points Pi(x, y), where x is a block of the vector R|Share(D) and y is the associated share in Share(D). This polynomial is the timestamp T.
Definition 2. T = Share⁻¹(R|Share(D), Share(D)).
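The two stamper functions can be sketched as follows (our illustrative code): we assume Share(D) is already given as a list of distinct indices into R (0-based here for the demo) and that the blocks RDi are pairwise distinct, so the interpolating polynomial exists.

```python
# Store(D, R) keeps the blocks of R indexed by Share(D); Stamp builds the
# unique polynomial T through the points Pi(RDi, Di), i.e. T(RDi) = Di.
from fractions import Fraction

def store(share_d, R):
    """Store(D, R) = R|Share(D)."""
    return [R[d] for d in share_d]

def stamp(share_d, stored):
    """Return the timestamp T as a callable Lagrange polynomial."""
    points = list(zip(stored, share_d))  # (RDi, Di) pairs
    def T(x):
        acc = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= Fraction(x - xj, xi - xj)
            acc += term
        return acc
    return T
```

Exact rational arithmetic is used only to keep the demo simple; a field-based variant as in Shamir's scheme would work equally well.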
5.2 Verification Phase
A verifier is represented by the two following functions:
- Sketch(R) chooses, at random, a set of indexes denoted H with H ⊂ I(R), computes R|H and stores this vector.
Definition 3. Sketch(R) = (H, R|H ).
- Verify(Sketch(R), D, T) verifies that there are no conflicts between Sketch(R) and T.
Definition 4. Verify(Sketch(R), D, T) consists in verifying the following equality: T(R|H∩Share(D)) = H ∩ Share(D).
In other words:
Definition 5. Verify(Sketch(R), D, T) consists in verifying that DH(T(R|H∩Share(D)), H ∩ Share(D)) = 0 for a timestamp T. If this equality is verified, the timestamp T is accepted by the verifier and is said to be "valid".
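The verifier side can be sketched likewise (our code, with the same 0-based demo indexing; Sketch(R) is modeled as the pair (H, R|H) and the timestamp T as a callable polynomial):

```python
# Sketch(R) stores a random subset of R; Verify accepts T iff T maps each
# stored block back to its index on H ∩ Share(D) (zero Hamming distance).
import random

def sketch(R, m):
    """Sketch(R) = (H, R|H) for a random set H of m indices, H ⊂ I(R)."""
    H = random.sample(range(len(R)), m)
    return (H, [R[i] for i in H])

def verify(sk, share_d, T):
    """Check T(Ri) = i for every index i in H ∩ Share(D)."""
    H, R_H = sk
    blocks = dict(zip(H, R_H))  # index -> stored block
    return all(T(blocks[i]) == i for i in set(H) & set(share_d))
```

Note that the check only touches indices the verifier already holds, so he learns no additional blocks of R.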
5.3 The behavior of an adversary
An adversary consists in the two following functions:
- Store*(R), which saves a subset of R called C. The difference between Sketch(R) and Store*(R) is that Sketch(R) is computed "online" while Store*(R) may not be.
- Stamp*(D, C), which, given a document D and a string C, tries to produce a timestamp T* of D.
Definition 6. Stamp*(D, C) = R*|Share(D).
Definition 7. T* = Share⁻¹(R*|Share(D), Share(D)),
where R*|Share(D) is the vector of blocks associated to Share(D) according to T*.
If an adversary A produces, for a document D and a randomizer R, a timestamp T* that is equal to the timestamp T produced by Stamp(D, Store(D, R)), we say that he backdates the document "correctly". More formally:
Definition 8. An adversary backdates a document D with a success probability ε for a given randomizer R if: Pr[Verify(Sketch(R), D, Stamp*(D, Store*(R)))] ≥ ε.
Definition 9. An adversary backdates correctly a document D for a given randomizer R if DH(V1, V2) = 0, with V1 = R*|Share(D) and V2 = R|Share(D).
Definition 10. An adversary backdates correctly a document D for a given randomizer R with at most err errors if DH(V1, V2) ≤ err, with V1 = R*|Share(D) and V2 = R|Share(D).
6 Security Proofs of our Timestamping Scheme
Given the following parameters:
- s: the storage capacity of the most powerful adversary.
- r: the size of the random string R transmitted during a round t.
- b: the size of a block of the random string R transmitted during a round t.
- l: the number of indexes specified by a given document.
- n: the number of blocks of the random string R transmitted during a round t.
We assume that:
(1) r > s: the size of the random string R transmitted during a round t is greater than the storage capacity of the most powerful user of the system.
(2) ∃ u ≥ 1 such that n = u·|H|, with u much smaller than the number of indexes specified by a document.
(3) 2^b ≫ r/b: the number of possible values for a block of size b bits is much greater than the number of blocks of the string R transmitted during a round t.
(4) r/b > l: the number of blocks of the string R transmitted during a round t is greater than the number of shares used in the adopted Shamir secret sharing scheme.
(5) b ≫ 1: the size of a block of R is much greater than 1 bit.
In our security study we demonstrate mainly two important characteristics
of our non-interactive timestamping scheme. First, we prove that backdating
documents in our timestamping scheme has a negligible probability. Second, we
prove that the timestamps provided by our timestamping scheme have an eternal
validity.
6.1 Negligible Probability of Backdating Documents
In our timestamping scheme, the probability that an adversary backdates a document D for a string R already transmitted, using his stored blocks of R, is negligible. This proof is established in two steps. In the first step, we show that if an adversary A wants to successfully backdate a document D for a random string R, then he must backdate it "correctly" for this string R with a negligible error err. In the second step, we show that the probability of correctly backdating a document D for a random string R is negligible.
First step If the adversary produces a timestamp of D such that the vector of
blocks associated to Share(D) according to this timestamp is far in Hamming
distance from the vector of blocks associated to the timestamp produced by
Stamp(D, Store(D, R)) then the verifier may with a high probability reject the
timestamp of the adversary because the values indexed by Share(D) according
to the timestamp do not match the values indexed by Share(D) according to his
sketch. Given a correct timestamp T and a timestamp T* produced by an adversary A for the document D and the random string R, the adversary backdates the document D for R successfully if he produces T* such that the vector of blocks associated to Share(D) according to T*, denoted R*|Share(D), is close in Hamming distance to R|Share(D). In this case, we say that the adversary backdates "correctly" the document D for the random string R with err errors, where err is an integer very close to zero. More formally, let A be an adversary. Denote by Rεsuccessful(D) the set of strings R for which A has the necessary storage to try to backdate the document D with a probability of success greater than ε.
Lemma 1. If an adversary backdates a document D for a random string R with a probability of success ε, then he backdates it "correctly" with at most (n/|H|)·ln(1/ε) errors. [16]
Proof. Let us suppose that the adversary provides a timestamp for the document D and the string R such that the timestamp is made with err* > err incorrect indices. Denote by INCORRECT(D, R) the set of incorrect indices for D and R. If H ∩ INCORRECT(D, R) ≠ ∅, the verifier will reject the timestamp of the adversary.
Let i be an index of H. The probability that i is in INCORRECT(D, R) is: Pr[i ∈ INCORRECT(D, R)] = err*/n.
The probability that i does not belong to INCORRECT(D, R) is 1 − Pr[i ∈ INCORRECT(D, R)] = 1 − err*/n.
The probability that no element of H belongs to INCORRECT(D, R) is:
Pr[∀i ∈ H, i ∉ INCORRECT(D, R)] = (1 − err*/n)^|H| ≤ e^(−err*·|H|/n).
Hence, if an adversary backdates a document D for a random string R with a probability of success ε, then ε ≤ e^(−err*·|H|/n), so he backdates it correctly with at most err* ≤ (n/|H|)·ln(1/ε) errors.
Denote by Rcorrect(D) the set of strings R for which A can "correctly" backdate D with at most (n/|H|)·ln(1/ε) errors.
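A quick numeric sanity check of the bound behind Lemma 1, with toy parameters of our own choosing:

```python
# Toy check (ours): the acceptance probability (1 - err*/n)^|H| is at most
# e^(-err*·|H|/n), so a success probability ε forces err* ≤ (n/|H|)·ln(1/ε).
import math

n, size_H, err_star = 10_000, 100, 500    # toy parameters
p_success = (1 - err_star / n) ** size_H  # every sketch index misses the errors
bound = math.exp(-err_star * size_H / n)
assert p_success <= bound
# With ε = p_success, the lemma's error bound indeed covers err*:
assert err_star <= (n / size_H) * math.log(1 / p_success)
```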
Theorem 1. If err ≤ (n/|H|)·ln(1/ε), then Rεsuccessful(D) is a subset of Rcorrect(D). [16]
Proof. According to Lemma 1, if an adversary backdates a document D for a random string R with a probability of success ε, then he backdates it correctly with at most err ≤ (n/|H|)·ln(1/ε) errors. So, if a random string R belongs to Rεsuccessful(D), then it belongs to Rcorrect(D). Thus, Rεsuccessful(D) ⊆ Rcorrect(D). In addition, the closer the probability of success gets to 1, the closer this error gets to 0. So, successfully backdating means correctly backdating with a "negligible" error. We prove, in the second step, that the probability that the random string R chosen by the adversary to backdate a document D is in Rcorrect(D) is negligible.
Second step We now prove that, for an adversary A, a document D and a string R, Pr[R ∈ Rcorrect(D)] is negligible.
Theorem 2. If l ≥ 1 and b ≫ 1, then Pr[R ∈ Rcorrect(D)] is negligible, with b the size of a block of R.
Proof. We proved in the first step that if an adversary backdates a document "successfully", then he backdates it "correctly" with at most a negligible error. We now show that the probability that the string R for which the adversary tries to backdate the document D is in Rcorrect(D) is negligible.
In fact, knowing that the size of the blocks of a random string R is b and the number of these blocks is n, the number of possible random strings is (2^b)^n.
Moreover, to backdate a document D, the adversary has to create a timestamp T* for D such that at least l − err blocks of R indexed by D are used to generate T*.
In other words, he can try to backdate D only for random strings for which he knows the values of at least l − err blocks among the l blocks indexed by D. Thus, since the adversary tries to correctly backdate D with at most err errors, the number of random strings he can use is at most (2^b)^(n−l).
So, the probability that a random string R belongs to Rcorrect(D) is:
Pr[R ∈ Rcorrect(D)] ≤ (2^b)^(n−l)/(2^b)^n ≤ 1/2^(l·b). Since l ≥ 1 and b ≫ 1, this probability is negligible.
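The counting argument above can be checked exactly for small toy parameters (ours):

```python
# Toy check (ours) of Pr[R ∈ Rcorrect(D)] ≤ 1/2^(l·b): out of (2^b)^n
# possible randomizers, at most (2^b)^(n-l) are usable by the adversary.
from fractions import Fraction

b, n, l = 8, 50, 5
total = (2**b) ** n            # all possible random strings R
usable = (2**b) ** (n - l)     # strings consistent with the adversary's view
assert Fraction(usable, total) == Fraction(1, 2**(l * b))
```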
6.2 The Eternal Security of our Timestamping Scheme
In [16], Moran et al. prove that in their non-interactive timestamping scheme, an adversary with a storage M can easily backdate k = M/T documents by running the timestamping process on some k documents and storing the generated timestamps (each of which has length at most T). However, the probability that an adversary backdates more than k documents is negligible. We show here that, in our timestamping scheme, after the transmission of R, it is very difficult to forge a fake timestamp for a given document. Moreover, we show that an adversary having a document D and correct timestamps can forge a fake timestamp for D only with a negligible probability. Thus, we prove the following theorem:
Theorem 3. If 2^b ≫ r/b and r/b > l, then the probability of forging a fake timestamp for a document D and a string R using correct timestamps related to R is negligible.
Proof. The inequality 2^b ≫ r/b means that the number of possible values for a block of size b bits is much greater than the number of blocks of the string R transmitted during a round t.
Since l is the number of indices specified by a given document, this number must always be less than the number of blocks of R. So, r/b > l.
It follows that 2^b ≫ l, which means that the number of possible values for a block of R is much greater than the number of indices specified by the document D.
For each correct timestamp Tj that the adversary holds, if he picks any value v from the 2^b possible values of a block of R, he will recover a given index i.
However, the fact that i = Tj(v) does not mean that Ri = v. It means that i is the value associated to v by the polynomial Tj, but the pair (i, v) does not necessarily belong to the string R. In other words, the string R may not associate the value v to the block indexed by i. Moreover, there may exist i′ such that Ri′ = v, and there is no way to verify whether i = i′. The only points of Tj for which the adversary knows that they belong to R are the points whose indices are specified by the document associated to Tj. The probability that the adversary chooses one of these points is l/2^b ≪ 1.
The probability of obtaining the k points required to forge a fake timestamp is negligible, since it is the product of the probabilities of selecting each of the points belonging to a valid timestamp. Thus, the adversary can obtain the k points needed to forge a fake timestamp for a document D and a random string R only with a negligible probability.
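The product argument can be made concrete with toy parameters (ours):

```python
# Toy arithmetic (ours): guessing one point of Tj that truly belongs to R
# succeeds with probability l/2^b ≪ 1; forging needs k such points, so the
# probabilities multiply.
from fractions import Fraction

b, l, k = 16, 10, 4
p_one = Fraction(l, 2**b)   # one valid point of a correct timestamp
p_forge = p_one ** k        # k independent valid points for the forgery
assert p_one < Fraction(1, 1000)
assert p_forge == Fraction(l**k, 2**(b * k))
```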
7 Conclusion
In this paper, we have presented a non-interactive timestamping solution in the
bounded storage model. Our solution hides even the fact that a timestamping occurred and ensures total confidentiality of the provided
timestamps. In addition, our solution provides eternal security for the provided
timestamps. In fact, neither an increase in the adversary's storage capacity nor the evolution of his computing power will compromise a provided timestamp. Thus, our solution is more secure than existing systems, whose timestamps can be challenged when the computing power or storage capacity of users increases.
In this context, we studied the security of our solution and formally proved that
the possibility of cheating is negligible. In our future work, we plan to adopt new
secret sharing schemes for timestamping.
References
1. Y. Aumann and Y. Z.Ding and M. O. Rabin, Everlasting security in the bounded
storage model, IEEE Transactions on Information Theory. 2002.
2. D. Bayer and S. Haber and W. S. Stornetta, Improving the efficiency and reliability
of digital timestamping, In: Sequences '91: Methods in Communication, Security and
Computer Science. 1992.
3. A. Ben Shil and K. Blibech and R. Robbana, A New Timestamping Schema in the
Bounded Storage Model, In: Proceedings of the 3rd Conference on Risks and Security
of Internet and Systems. CRiSIS. 2008.
4. K. Blibech and A. Gabillon, A New Timestamping Scheme Based on Skip Lists, In:
ICCSA (3). pp. 395-405, 2006.
5. K. Blibech and A. Gabillon, A New Totally Ordered Timestamping Scheme, In: 5th
Conference on Security and Network Architectures SAR. 2006.
6. K. Blibech and A. Gabillon, CHRONOS: an authenticated dictionary based on skip
lists for timestamping systems, In: SWS. pp. 84-90, 2005.
7. K. Blibech and A. Gabillon and A. Bonnecaze, Etude des systèmes d’horodatage,
Technique et Science Informatiques 26(3-4). pp. 249-278, 2007.
8. A. Bonnecaze and P. Liardet and A. Gabillon and K. Blibech, A distributed time
stamping scheme, 4th Conference on Security and Network Architectures SAR 2005.
2005.
9. A. Buldas and P. Laud and H. Lipmaa and J. Willemson, Time-Stamping with Binary Linking Schemes, In: CRYPTO. pp. 486-501, 1998.
10. A. Buldas and P. Laud and B. Schoenmakers, Optimally efficient accountable
timestamping, In: Public Key Cryptography. 2000.
11. S. Dziembowski and U. Maurer, On Generating the Initial Key in the Bounded-
Storage Model. In: Advances in Cryptology - EUROCRYPT. pp. 126-137, 2004.
12. S. Haber and W. S. Stornetta, How to Time-Stamp a Digital Document, J. Cryp-
tology 3(2). pp. 99-111, 1991.
13. U. Maurer, Conditionally-perfect secrecy and a provably-secure randomized cipher,
Journal of Cryptology. pp. 53-66, 1992.
14. U. Maurer, Secret key agreement by public discussion, IEEE Transactions on Information Theory, 39. pp. 733-742, 1993.
15. T. Moran and R. Shaltiel and A. Ta-Shma, Non-interactive Timestamping in the Bounded Storage Model, In: Advances in Cryptology. CRYPTO'04. LNCS, Springer, 3152. pp. 460-476, 2004.
16. T. Moran and R. Shaltiel and A. Ta-Shma, Non-interactive Timestamping in the
Bounded Storage Model, In: J. Cryptology. pp. 189-226, 2009.
17. National Institute of Standards and Technology (NIST), Announcement of Weak-
ness in the Secure Hash Standard, Technical report. 1994.
18. A. Shamir, How to share a secret, Commun. ACM 22(11). pp. 612-613, 1979.