Precise Dynamic Verification of Confidentiality

                                     Gurvan Le Guernic
                   INRIA-MSR - Parc Orsay Universit, 91893 Orsay - France
                       http://www.msr-inria.inria.fr/~gleguern/
                                  gleguern@gmail.com


     Abstract. Confidentiality is maybe the most popular security property to be formally
     or informally verified. Noninterference is a baseline security policy to formalize confiden-
     tiality of secret information manipulated by a program. Many static analyses have been
     developed for the verification of noninterference. In contrast to those static analyses,
     this paper considers the run-time verification of the respect of confidentiality by a single
     execution of a program. It proposes a dynamic noninterference analysis for sequential
     programs based on a combination of dynamic and static analyses. The static analysis
     is used to analyze some unexecuted pieces of code in order to take into account all
     types of flows. The static analysis is sensitive to the current program state. This sensi-
     tivity allows the overall dynamic analysis to be more precise than previous work. The
     soundness of the overall dynamic noninterference analysis with regard to confidentiality
     breaches detection and correction is proved.


1    Introduction
Language-based security is an active field of research. The majority of work on
confidentiality in this field focuses on static analyses [15]. Recent years have seen
a resurgence of dynamic analyses aiming at enforcing confidentiality at run time
[5, 8, 16, 21]. The first reason is that nowadays it is nearly impossible for con-
sumers to prevent the execution of “bad” code on their devices — for example,
in September 2007 cybercriminals introduced malicious scripts which were exe-
cuted by any browser visiting webpages of a US Consulate [9]. Moreover, there
are two main potential advantages of dynamic analyses over static analyses [8].
The first one is the increased knowledge of the execution environment and be-
havior at run time, including the knowledge of the precise control flow followed
by the current execution. This increased knowledge allows the dynamic analysis
to be more precise than a static analysis in some cases; as, for example, with the
program on page 93. The second advantage lies in the ability of sound informa-
tion flow monitors to run some “safe” executions of an “unsafe” program while
still guarantying the confidentiality of secret data. In order to take into account
all indirect flows (flows originating in control statements) dynamic analyses rely
on static analyses of some, but not all, unexecuted pieces of code.
     This paper proposes to increase the precision of such dynamic information
flow analyses. This is done by taking advantage, at the static analysis level, of
the dynamic nature of the overall analysis. To do so, when statically analyzing
an unexecuted piece of code, the current program state is taken into account in
order to reduce the program space to analyze. The following piece of code is a
                                                    Precise Dynamic Verification of Confidentiality   83


motivating example for this work. It corresponds to the body of the main loop
of an Instant Messaging (IM) program. This one has the appealing “movie-like”
feature of displaying messages characters by characters as they are typed.

 1   c := g e t C h a r F r o m K e y b o a r d ();
 2   tmp := tmp + (( i n t ) c );
 3   i f ( tmp > (( i n t ) u s e r S e c r e t K e y ) ) {
 4       tmp := 0;
 5       i f ( t o = " s e x y P i r a t e ") { c := s p e c i a l C h a r }
 6   };
 7   s e n d ( to , c )


    This IM program is a malware developed by “sexyPirate”. When someone
uses this software to communicate with a user other than sexyPirate, everything
goes as expected and no secret is revealed. However, if a user communicates with
sexyPirate using this IM then information about the user’s secret key is leaked to
the pirate. When the integer value of the characters typed by the user since the
last time tmp has been reset to 0 reaches the integer value of the user’s secret key,
a special character (that sexyPirate is able to distinguish) appears on the pirate’s
screen. Therefore, by iterating the process, sexyPirate is able to get an accurate
approximation of the user’s secret key. Any sound static analysis would reject
this program; and therefore, all its executions. One of the advantages of dynamic
information flow analysis, if it is precise enough, is to allow use of this program
for communicating with users other than sexyPirate, while still guarantying the
confidentiality of the secret key in any case. However, none of the previous work
are precise enough. When statically analyzing lines 4 and 5, no knowledge about
the value of the variable to is taken into account. Therefore, the overall dynamic
analysis will always consider that the value of c may be modified; which implies
a flow from userSecretKey to c. With such dynamic analyses, line 7 must then
be corrected in order to prevent any potential leakage of the value of the secret
key to the outside world. In the work proposed in this paper, the static analysis
used for lines 4 and 5 takes into account the run time value of the variable to.
This allows the overall dynamic analysis proposed to detect that there is no flow
from userSecretKey to c whenever to is different from sexyPirate. Therefore, it
allows one to use this IM program for communicating with any user different
from sexyPirate; while still preserving the confidentiality of the user’s secret key
even when trying to use this program to communicate with sexyPirate if the
correction mechanism is applied carefully [4, 7].
    The next section defines various notions used in this paper and introduces the
principles of the dynamic analysis proposed in this paper. Section 3 formalizes
the dynamic information flow analysis whose main properties are exposed in
Sect. 4. Before presenting related works in Sect. 6, the main benefits of dynamic
information flow analyses are exposed in Sect. 5. Finally, Sect. 7 concludes.
84     Gurvan Le Guernic


2     Definitions and Principles
A direct flow is a flow from the right side of an assignment to the left side.
Executing “x := y” creates a direct flow from y to x. An explicit indirect flow
is a flow from the test of a conditional to the left side of an assignment in the
branch executed. Executing “if c then x := y else skip end” when c is true
creates an explicit indirect flow from y to x. An implicit indirect flow is a flow
from the test of a conditional to the left side of an assignment in the branch
which is not executed. Executing “if c then x := y else skip end” when c is
false creates an implicit indirect flow from y to x.
    At any execution step, a variable or expression is said to carry variety [2,
Sect.1] if its value is not completely constrained by the public inputs of the
program. In other words, a variable or expression carries variety if its value is
influenced by the private inputs; therefore, if it may have a different value at
this given execution step if the values of the private inputs were different.

A “safe” execution is a noninterfering execution. In this article, as commonly
done, noninterference is defined as the absence of strong dependencies between
the secret inputs of an execution and the final values of some variables which
are considered to be publicly observable at the end of the execution. For every
execution of a given program P, two sets of variable identifiers are defined. The
set of variables corresponding to the secret inputs of the program is designated
by S(P). The set of variables whose final value are publicly observable at the end
of the execution is designated by O(P). No requirements are put on S(P) and
O(P) other than requiring them to be subsets of X (the domain of variables). A
variable x is even allowed to belong to both sets. In such a case, in order to be
noninterfering, the program P would be required to, at least, reset the value of x.
In the following definitions, we consider that a program state may contain more
than just a value store. This is the reason why a distinction is done between
program states (X) and value stores (σ).

Definition 1 (V -Equivalent States).
Let V be a set of variables. Two program states X1 and X2 , containing the value
stores σ1 and σ2 respectively, are V -equivalent with regards to a set of variables
                V
V , written X1 = X2 , if and only if the value of any variable belonging to V is
the same in σ1 and σ2 :
       V
     X1 = X2    ⇐⇒         ∀x ∈ V : σ1 (x) = σ2 (x)

    Definition 1 states a formal relation among program states. This relation de-
fines equivalence classes of program states with regard to a given set of variables.
If two program states are V -equivalent, it means that it is impossible to distin-
guish them solely by looking at the value of the variables belonging to the set
                                      Precise Dynamic Verification of Confidentiality   85


V . This relation is used to define the confidentiality property which is verified
by the dynamic analysis presented in this paper.
Definition 2 (Noninterfering Execution).
Let ⇓s denote a big-step semantics. Let S(P) be the complement of S(P) in the
set X. For all programs P, program states X1 and X10 , an execution with the
semantics ⇓s of the program P in the initial state X1 and yielding the final state
X10 is noninterfering, written ni(P, s, X1 ), if and only if, for every program states
X2 and X20 such that the execution with the semantics ⇓s of the program P in
the initial state X2 yields the final state X20 :
       S(P)             O(P)
   X1 = X2 ⇒ X10 = X20

    Definition 2 states that an execution is safe — i.e. it has the desired con-
fidentiality property — if any other execution started with the same public
(non-secret) values yields a final program state which is O(P)-equivalent to the
final program state of the execution analyzed. It means that, by looking only
at the final values of the variables observable at the end of the execution, it is
impossible to distinguish this execution from any other execution whose initial
program state differs only in the values of the secret inputs. Therefore, for such
an execution, it is impossible to deduce information about the secret inputs of
the program by looking solely at the values of the publicly observable outputs.

The dynamic analysis is based on a flow and state sensitive static analysis.
During the execution, every variable is associated a tag which reflects the fact
that the variable may or may not carry variety — i.e. may or may not be
influenced by the secret inputs of the program. A tag store in the program state
keeps track of those associations. The dynamic analysis treats directly the direct
and explicit indirect flows. For implicit indirect flows, a static analysis is run on
the unexecuted branch of every conditional whose test carries variety.
    The static analysis is context sensitive. An unexecuted branch P is analyzed
in the context of the program state at the time the test of the conditional, to
which P belongs, has been evaluated. The static analysis is then aware of the
exact value of the variables which do not carry variety. During the analysis, the
context (value store and tag store used for the analysis) is modified to reflect loss
of knowledge (in fact, only the tag store is modified). The static analysis does
not compute the values of variables. Therefore, when analyzing an assignment to
a variable x, the context of the static analysis is modified to reflect the fact that
the static analysis does not anymore have knowledge of the precise value of the
variable x. When analyzing a conditional whose test value can be computed in
the current context (using only the values of the variables whose tag is ⊥), only
the branch designated by the test is analyzed. As the value of any variable which
does not carry variety depends only on the public inputs, branches which are not
86      Gurvan Le Guernic


designated by the test value would never be executed by any execution started
with the same public inputs as the monitored execution. Implicit indirect flows
and explicit indirect flows must be treated with the same precision in order
to prevent the creation of a new covert channel [7]. This particular point is
discussed in Sect. 4. As the static analysis detects implicit indirect flows more
accurately than context insensitive analyses, explicit indirect flows can also be
treated more accurately.
    The next section formalizes the mechanisms presented above. It presents a
monitoring semantics incorporating a dynamic noninterference analysis.


3     The Monitoring Semantics
The dynamic information flow analysis and the monitoring semantics are de-
fined together in Fig. 1. An example of the behavior of this analysis on a given
execution is presented in the companion technical report [6]. Information flows
are tracked using tags. At any execution step, every variable has a tag which
reflects whether this variable may carry variety or not. The static analysis used
for the analysis of some unexecuted branches is characterized in Fig. 2.
     The language studied is an imperative language for sequential programs
whose grammar follows. In this grammar, hidenti stands for a variable iden-
tifier. hexpri is an expression of values and variable identifiers. Expressions in
this language are deterministic — their evaluation in a given program state al-
ways results in the same value — and are free of side effects — their evaluation
has no influence on the program state.
     hprogi ::= skip
              | hidenti := hexpri
              | hprogi ; hprogi
              | if hexpri then hprogi else hprogi end
              | while hexpri do hprogi done
    A program expressed with this language is either a skip statement (skip)
which has no effect, an assignment of the value of an expression to a variable,
a sequence of programs (hprogi ; hprogi), a conditional executing one program
— out of two — depending on the value of a given expression (if statements),
or a loop executing repetitively a given program as long as a given expression is
true (while statements).

3.1    A Semantics Making Use of Static Analysis Results
Let X be the domain of variable identifiers, D be the semantics domain of values,
and T be the domain of tags. In the remainder of this article, T is equal to
                                       Precise Dynamic Verification of Confidentiality   87


{>, ⊥}. Those tags form a lattice such that ⊥ @ >. > is the tag associated to
variables that may carry variety — i.e. whose value may be influenced by the
secret inputs.
    The monitoring semantics described in Fig. 1 is presented as standard infer-
ence rules for sequents written in the format:

   ζ, tpc ` P ⇓M ζ 0

This reads as follows: in the monitoring execution state ζ, with a program
counter tag equal to tpc , program P yields the monitoring execution state ζ 0 .
The program counter tag (tpc ) is a tag which reflects the security level of the
information carried by the control flow. A monitoring execution state ζ is a pair
(σ, ρ) composed of a value store σ and a tag store ρ. A value store (X → D)
maps variable identifiers to values. A tag store (X → T) maps variable identifiers
to tags. The definitions of value store and tag store are extended to expressions.
σ(e) is the value of the expression e in a program state whose value store is σ.
Similarly, ρ(e) is the tag of the expression e in a program state whose tag store
is ρ. ρ(e) is formally defined as follows, with FV(e) being the set of free variables
appearing in the expression e:
               G
    ρ(e) =         ρ(x)
           x∈FV(e)

The semantics rules make use of static analyses results. In Fig. 1, application
of a static information flow analysis to the piece of code P in the context ζ is
written: [[ζ ` P]]]G . The analysis of a program P in a monitoring execution state ζ
must return a subset of X. This set, usually written X, is an over-approximation
of the set of variables which are potentially defined in an execution of P in the
context ζ. This static information flow analysis can be any such analysis that
satisfies a set of formal constraints which are stated in Sect. 3.2.

The monitoring semantics rules are straightforward. As can be expected, the
execution of a skip statement with the semantics given in Fig. 1 yields a final
state equal to the initial state. The monitored execution of the assignment of
the value of the expression e to the variable x yields a monitored execution
state (σ 0 , ρ0 ). The final value store (σ 0 ) is equal to the initial value store (σ)
except for the variable x. The final value store maps the variable x to the value
of the expression e evaluated with the initial value store (σ(e)). Similarly, the
final tag store (ρ0 ) is equal to the initial tag store (ρ) except for the variable x.
The tag of x after the execution of the assignment is the least upper bound of
the program counter tag (tpc ) and the tag of the expression computed using the
initial tag store (ρ(e)). ρ(e) corresponds to the level of the information flowing
into x through direct flows. tpc corresponds to the level of the information flowing
into x through explicit indirect flows.
88    Gurvan Le Guernic


                                    ζ, tpc ` skip ⇓M ζ


                 (σ, ρ), tpc ` x := e ⇓M (σ[x 7→ σ(e)], ρ[x 7→ ρ(e) t tpc ])

                      ζ, tpc ` P h ⇓M ζ h     ζ h , tpc ` P t ⇓M ζ 0
                                ζ, t pc
                                        ` P ; P t ⇓M ζ 0
                                           h


                  ρ(e) = ⊥      σ(e) = v    (σ, ρ), tpc t ⊥ ` P v ⇓M ζ 0
                   (σ, ρ), t pc
                                ` if e then P true else P false end ⇓M ζ 0

               ρ(e) = >       σ(e) = v   (σ, ρ), tpc t > ` P v ⇓M (σ v , ρv )
                                ¬v ]G
                    [[(σ, ρ) ` P ]] = X       ρe = (X × {>}) ∪ (X × {⊥})
              (σ, ρ), tpc ` if e then P true else P false end ⇓M (σ v , ρv t ρe )

                                  ρ(e) = ⊥    σ(e) = false
                       (σ, ρ), tpc ` while e do P l done ⇓M (σ, ρ)

                                   ρ(e) = ⊥   σ(e) = true
                    (σ, ρ), tpc t ⊥ ` P l ; while e do P l done ⇓M ζ 0
                         (σ, ρ), tpc ` while e do P l done ⇓M ζ 0

                                 ρ(e) = >     σ(e) = true
                 (σ, ρ), tpc t > ` P l ; while e do P l done ⇓M (σ 0 , ρe )
                    (σ, ρ), tpc ` while e do P l done ⇓M (σ 0 , ρ t ρe )

                                   ρ(e) = >     σ(e) = false
                         [[(σ, ρ) ` P l ; while e do P l done]]]G = X
                                  ρe = (X × {>}) ∪ (X × {⊥})
                              pc
                    (σ, ρ), t    ` while e do P l done ⇓M (σ, ρ t ρe )


                    Fig. 1. Rules of the monitoring semantics


     The monitored execution of a conditional whose test expression does not
carry variety (ρ(e) = ⊥) follows the same scheme as with a standard semantics.
For a conditional whose test expression e carries variety, the branch (P v ) desig-
nated by the value of e (v) is executed and the other one (P ¬v ) is analyzed. The
final value store is the one returned by the execution of P v . The final tag store
(ρ0 ) is the least upper bound of the tag store returned by the execution of P v
and a new tag store (ρe ) generated from the result of the analysis of P ¬v (X).
By definition, ρ t ρ0 is equal to λx.ρ(x) t ρ0 (x). The new tag store (ρe ) reflects
the implicit indirect flows between the value of the test of the conditional and
the variables (X) which may be defined in an execution of P ¬v . In ρe , the tag of
                                                   Precise Dynamic Verification of Confidentiality   89


a variable x is equal to the initial tag of the test expression of the conditional
(ρ(e)) if and only if x belongs to X; otherwise, its tag is ⊥.


3.2   The Static Analysis Used

Fig. 2 defines some constraints characterizing a set of static analyses which
can be used by the dynamic noninterference analysis. The result X of a static
analysis of a given program (P) in a given context (ζ) is acceptable for the
dynamic analysis only if the result satisfies those rules. This is written in the
format: X |= (ζ ` P). In the definitions of those rules, {[S true , S false ]}tv returns
either the set S true , the set S false or the union of both depending on the tag t
and the boolean v. Its formal definition follows.
                            true
        true    false t     S     ∪ S false iff t = >
    {[S      ,S      ]}v =    v
                            S               iff t = ⊥


      ∅ |= ( (σ, ρ) ` skip )

      {x} |= ( (σ, ρ) ` x := e )

             “                      ”
      X |=       (σ, ρ) ` Ph ; Pt
      iff there exist Xh and Xt such that:
          Xh |= ((σ, ρ) ` Ph )
          let ρ0 = ρ t ((Xh × >) ∪ (Xh × ⊥)) in Xt |= ((σ, ρ0 ) ` Pt )
          X = Xh ∪ Xt


                 (σ, ρ) ` if e then Ptrue else Pfalse end
             `                                              ´
      X |=
      iff there exist Xtrue and Xfalse such that:
          Xtrue |= ((σ, ρ) ` Ptrue )
          Xfalse |= ((σ, ρ) ` Pfalse )
                                 ρ(e)
          X = {[Xtrue , Xfalse ]}σ(e)

             “                                 ”
      X |=       (σ, ρ) ` while e do Pl done
      iff there exists Xl such that:
          let ρ0 = ρ t ((Xl × >) ∪ (Xl × ⊥)) in Xl |= ((σ, ρ0 ) ` Pl )
                        ρ(e)
          X = {[Xl , ∅]}σ(e)


                       Fig. 2. Constraints on the static analysis results
90     Gurvan Le Guernic


4    Properties of the Monitoring Semantics
Section 3 formally defines the dynamic information flow analysis proposed in this
article. In the current section, this dynamic noninterference analysis is proved to
be sound with regard to the enforcement of the notion of noninterfering execution
given in Definition 2. This means that, any monitor enforcing noninterference
using this dynamic analysis would be able to ensure that: for any two monitored
executions of a given program P started with the same public inputs (variables
which do not belong to S(P)), the final values of observable outputs (variables
which belong to O(P)) are the same for both executions.
    Theorem 1 proves that the dynamic analysis is sound with regard to infor-
mation flow detection. Any variable, whose tag at the end of the execution is ⊥,
has the same final value for any executions started with the same public inputs.

Theorem 1 (Detection Soundness).
For all programs P and states (σ1 , ρ1 ), (σ10 , ρ01 ), (σ2 , ρ2 ) and (σ20 , ρ02 ):
                                  
 (σ1 , ρ1 ), ⊥ ` P ⇓M (σ10 , ρ01 )
                                  
 (σ2 , ρ2 ), ⊥ ` P ⇓M (σ20 , ρ02 )
                                                                                   
                                    ⇒ ∀x. (ρ01 (x) = ⊥) ⇒ (σ10 (x) = σ20 (x))
 ∀x ∈ / S(P). σ1 (x) = σ2 (x)     
                                  
 ∀x ∈ S(P). ρ1 (x) = >
                                  

Proof (Proof summary). The detailed formal proof can be found in the companion
technical report [6]. The proof aims at showing that the invariant which relates
the fact, for every variable, of having a ⊥ tag and not carrying variety — i.e. not
being influenced by the secret inputs — is preserved during the execution. The
invariant preservation is obvious for skip statements. If the tag of a variable
x after an assignment of e to x is ⊥, then it means, first, that the control flow
does not carry variety (tpc = ⊥) and therefore that, with similar public inputs,
the assignment will always be executed. It also means that the expression does
not carry variety and the invariant property is preserved by the execution of
assignments. For sequence statements, if the invariant is preserved by the first
and second statements then it is preserved by the sequential execution of both
statements. If the tag of the condition of a branching statement is ⊥, then any
execution started with the same public inputs evaluates the same branch as the
execution monitored. As, by induction, the execution of the branch preserves the
invariant, the invariant is also preserved. If the condition’s tag is >, as the tag of
the control flow (tpc ) is updated to >, all the variables assigned to in the branch
have a tag of >. Additionally, any variable which may have been assigned to in
the other branch are in the set returned by the static analysis, therefore their
tag also becomes>. Hence, the invariant relating ⊥ and not carrying variety is
preserved by the execution of a branching statement. If the tag of the condition
of a loop is ⊥ then, if its value is false the loop is equivalent to a skip and the
                                          Precise Dynamic Verification of Confidentiality   91


invariant is preserved. If the tag of a true condition is ⊥, the proof follows by
induction. If the condition’s tag is > then, for the same reasons as the case for
branching statements, the tag of all the variables whose value may be modified
by the loop becomes >. Therefore, the invariant property is preserved by loops.
    More informally, theorem 1 compares any 2 executions which are such that:
any public inputs (x ∈ / S(P)) have the same initial value, and any secret input
have an initial tag of > in order to let the dynamic analysis know that their
content is secret. Theorem 1 states that if the final tag of a variable is ⊥ —
therefore, that the analysis considers its value to be safely accessible — then
the final values of this variable for both executions — any 2 executions which
have the same public inputs — are the same. Therefore, an attacker, who looks
at the final value of a variable whose final tag is ⊥, sees the same value for all
the executions which have different secret inputs but the same public inputs.
Hence, an attacker does not learn anything about the secret inputs by observing
the final values of variables whose final tag is ⊥. Therefore, to sanitize the final
state of an execution, it is sufficient to “securely” (as will be explained in the
following discussion) reset the value of observable variables (O(P)) whose final
tag is >.
    As shown by Le Guernic and Jensen [7], if the correction of “bad” information
flows is done without enough care, the correction mechanism itself can become
a new covert channel carrying secret information. Indeed, theorem 1 states that
the final value of a variable, whose final tag is ⊥, is not influenced by the value
of the secret inputs. However, if the final tag of a variable is influenced by the
values of the secret inputs, then it means that for some secret input values the
final tag of this variable will be ⊥ and for other secret input values it will be
>. Hence, it means that, for some secret input values, the correction mechanism
will reset the final value of the variable; and for other secret input values, the
final value of the variable will not be reset. Therefore, by checking if the final
value of this variable has been reset or not, an attacker can learn information
about the secret input values. Theorem 2 proves that, for the analysis presented
in this paper, the final tag of a variable does not depend on the secret inputs of
the program. Therefore, any variable belonging to O(P) whose final tag is not
⊥ can safely be reset to a default value without creating a new covert channel.
Theorem 2 (Correction Soundness).
For all programs P and states (σ1 , ρ1 ), (σ10 , ρ01 ), (σ2 , ρ2 ) and (σ20 , ρ02 ):
                                  
 (σ1 , ρ1 ), ⊥ ` P ⇓M (σ10 , ρ01 )
                                  
 (σ2 , ρ2 ), ⊥ ` P ⇓M (σ20 , ρ02 )
                                  
                                                     
 ∀x ∈ / S(P). σ1 (x) = σ2 (x)       ⇒ ρ01 = ρ02
 ∀x ∈ S(P). ρ1 (x) = >
                                  
                                  
                                  
                                  
 ρ1 = ρ2
                                  
92     Gurvan Le Guernic


Proof (Proof summary). The detailed formal proof can be found in the companion
technical report [6]. In order to be able to use induction, a generalization of the
theorem stated above is proved with two differences in its statement: program
counters (tpc ) must only be the same for the two executions and variables whose
tag is ⊥ must have the same value in both executions instead of the hypotheses
based on S(P). The case for skip is direct. For assignments, the only tag modified
is the one of the variable assigned. As the two tag stores are initially equal, ρ1 (e)
is equal to ρ2 (e). Therefore, both tag stores are equal after the execution of the
assignment. The case for sequences goes by induction. For if statements, if the
same branch is executed then by induction the theorem holds. If two different
branches are executed then the tag of the program counter is >. Therefore, the
tag of every assigned variable becomes >. Additionally, the unexecuted branch is
analyzed and the tag of every variable which may have been assigned to is set to
>. Fig. 2 constrains the analysis to make the same choices as the execution with
regard to which subbranches to ignore and which ones to analyze. Therefore, the
set of variables returned by the analysis of the unexecuted branch is exactly the set
of variables whose tag would have been set to > by an execution of this branch.
Therefore, whatever branch is executed or analyzed, the same set of variables
have their tag set to >. Hence, the final tag stores are equal after the execution
of the if statement. For while statements, if the condition evaluates to the same
value then the inductive hypothesis implies that the theorem holds. Otherwise,
it means that its tag is >. Therefore the final tag store is the least upper bound
of the initial tag store and a new tag store ρe . This new tag store is either the
tag store returned by the execution of the while statement (executing the body
at least once) with program counter tag (tpc ) equal to > or the analysis of the
same statement. As for if statements, in both cases the tags of the exact same
set of variables are set to >. Hence, the theorem holds.


5    Benefits of Monitoring Compared to Static Analyses
Monitoring an execution has a cost. So, what are the main benefits of nonin-
terference monitoring compared to static analyses? The first concerns the pos-
sibility that a monitoring mechanism can be used to change the security policy
for each execution. In the majority of cases, running a static analysis before ev-
ery execution would be more costly than using a monitor. The second reason is
that noninterference is a rather strong property. Many programs are rejected by
static analyses of noninterference. In such cases it is still possible to use a mon-
itoring mechanism with the possibility that some executions will be altered by
the monitoring mechanism. However behavior alteration is an intrinsic feature of
any monitoring mechanism. Monitoring noninterference ensures confidentiality
while still allowing testing with regard to other specifications using unmonitored
executions as perfect oracle — at least as perfect as the original program.
                                     Precise Dynamic Verification of Confidentiality   93


   There are two main reasons why it is interesting to use a noninterference
monitor on a program rejected by a static analysis. The first one is that a
monitoring mechanism may be more precise than static analyses because during
execution the monitoring mechanism gets some accurate information about the
“path behavior” of the program. As an example, let us consider the following
program where h is the only secret input and l the only other input (a public
one).

  1   i f ( t e s t 1 ( l ) ) then tmp := h e l s e skip end;
  2   i f ( t e s t 2 ( l ) ) then x := tmp e l s e skip end;
  3   output x


Without information on test1 and test2 (and often, even with), a static analysis
would conclude that this program is unsafe because the secret input information
could be carried to x through tmp and then to the output. However, if test1 and
test2 are such that no value of l makes both predicates true, then any execution
of the program is perfectly safe. In that case, the monitoring mechanism would
allow any execution of this program. The reason is that, l being a public input,
only executions following the same path as the current execution are taken care
of by the monitoring mechanism. So, for such configurations where the branching
conditions are not influenced by the secret inputs, a monitoring mechanism is
at least as precise as any static analysis — and often more precise.
    The second reason lies in the granularity of the noninterference property.
Static analyses have to take into consideration all possible executions of the
program analyzed. This implies that if a single execution is unsafe then the pro-
gram (thus all its executions) is rejected. Whereas, even if some executions of a
program are unsafe, a monitor still allows this program to be used. The unsafe
executions, which are not useful, are altered to enforce confidentiality while the
safe executions are still usable. For example, the program on page 83 being inter-
fering, any static noninterfering analysis rejects this program. Therefore, users
would be advised not to use this program at all. However, using a noninterfer-
ence monitor, it is possible to safely use this program. When communicating
with any user other than sexyPirate, monitored executions of this program have
their normal behavior. When communicating with sexyPirate, monitored execu-
tions are safely detected as potentially interfering and can therefore be corrected
to prevent any secret leakage. Of course, when attempting to communicate with
sexyPirate, executions of this program are altered and it is therefore not possi-
ble to communicate with sexyPirate. However, this is the desired behavior of a
noninterference monitor when confidentiality is more important than the service
provided by the program.
94    Gurvan Le Guernic


6    Related Work

The vast majority of research on noninterference concerns static analyses and
involves type systems [13, 15]. Some “real size” languages together with secu-
rity type system have been developed (for example, JFlow/JIF [11] and Flow-
Caml [14]).
    Dynamic information flow analyses [1, 3, 19, 20] are not as popular as static
analyses for information flow, but there has been interesting research. For exam-
ple, RIFLE [17] is a complete run-time information flow security system based
on an architectural framework and a binary translator. Masri et al. [10] present
a dynamic information flow analysis for structured or unstructured languages.
Venkatakrishnan et al. [18] propose a program transformation for a simple de-
terministic procedural language that ensures a sound detection of information
flows. Trishul [12] is an interesting implementation of a Java Virtual Machine
integrating a dynamic information flow control mechanism. Yoshihama et al. [21]
propose a Java Architecture for Web Applications with “direct” and “explicit
indirect” information flow control abilities (as they acknowledge, they do not
handle “implicit indirect” flows). One of the main interest of those works is the
size of the language addressed. However, none of those five later works prove that
the correction mechanisms of “bad” flows proposed do not create a new covert
channel that can reveal secret information — see, e.g., [7] — or even, for some
of them, that the detection mechanism is sound with regard to their notion of
information flow. In fact, those analyses and correction mechanisms are likely to
create a new covert channel. Theorem 2 proves that a correction mechanism of
“bad” flows can be based on the dynamic analysis proposed in this paper as the
results of the dynamic analysis are the same for every executions started with
the same public inputs. More recently, Shroff, Smith, and Thober [16] proposed a
dynamic information flow analysis which tracks direct flows and collects indirect
flows dynamically. The information collected about indirect flows is transferred
from one execution to another using a cache mechanism. After an undetermined
number of executions, the analysis will know about all indirect flows in the pro-
gram and thus will then be sound with regard to the detection of all information
flows. This information about indirect flows can be precomputed using a static
analysis at the cost of a decrease of precision. Using this approach they are able
to handle a language including alias and method calls.
    Contrary to common assumption, none of the related works on dynamic in-
formation flow analysis known to the author take enough context information
into account to detect that the program on page 83 is noninterfering when using
it to communicate with users other than sexyPirate. For example, the transfor-
mation of Venkatakrishnan et al. [18] updates the security label of c with the
security label of the condition on line 3 before executing line 7. Therefore, at
line 7, c is always considered as secret even if the user is not communicating
                                      Precise Dynamic Verification of Confidentiality   95


with sexyPirate. When executing the assignment of line 5, the program counter
of Shroff et al.’s work [16] contains a reference to the program point of line 3
and therefore is added to the set of source of implicit flows to c. Consequently,
any complete implicit dependency cache contains a reference to the implicit flow
from line 3 to line 7. As the test of line 3 is always executed, Shroff et al.’s work
always considers line 7 as displaying secret information. The dynamic analy-
sis proposed in this paper is able to detect the noninterfering behavior of the
program on page 83 when communicating with someone other than sexyPirate.


7    Conclusion
This article addresses the problem of information flow verification and correction
at run time in order to enforce the confidentiality of secret data. The confiden-
tiality property to monitor is expressed using the property of noninterference be-
tween secret inputs of the execution and its public outputs. The language taken
into consideration is a sequential language with assignments and conditionals
(including loops). The main difference between the monitoring mechanism pro-
posed in this article and the ones of related works lies in the static analysis used
to detect implicit indirect flows. The static information flow analyses used by
the dynamic analysis proposed in this article are sensitive to the current pro-
gram state. This allows the overall dynamic information flow analysis to increase
the precision of the detection of implicit and explicit indirect flows. In Sect. 4,
the proposed noninterference monitor is proved to be sound both with regard
to the detection of information flows and with regard to their correction when
necessary.

Acknowledgments: The author is grateful to Anindya Banerjee, Gérard Boudol,
Thomas Jensen, Andreı̈ Sabelfeld and David Schmidt for their helpful feedback
on an earlier version of this work.
                                     Bibliography

 [1] J. Brown and T. F. Knight, Jr. A minimal trusted computing base for dynamically ensuring
     secure information flow. Technical Report ARIES-TM-015, MIT, 2001.
 [2] E. S. Cohen. Information transmission in computational systems. ACM SIGOPS Operating
     Systems Review, 11(5):133–139, 1977.
 [3] J. S. Fenton. Memoryless subsystems. The Computer Journal, 17(2):143–147, 1974.
 [4] G. Le Guernic. Confidentiality Enforcement Using Dynamic Information Flow Analyses. PhD
     thesis, Kansas State University, 2007.
 [5] G. Le Guernic. Automaton-based Confidentiality Monitoring of Concurrent Programs. In Proc.
     Computer Security Foundations Symp.. IEEE, 2007.
 [6] G. Le Guernic. Precise dynamic verification of noninterference. Technical report, INRIA, July
     2008. http://hal.inria.fr/inria-00162609/fr/.
 [7] G. Le Guernic and T. Jensen. Monitoring Information Flow. In Proc. W. on Foundations of
     Computer Security, pages 19–30. DePaul University, 2005.
 [8] G. Le Guernic, A. Banerjee, T. Jensen, and D. Schmidt. Automata-based Confidentiality Moni-
     toring. In Proc. Annual Asian Computing Science Conf., volume 4435 of LNCS, 2006.
 [9] J. Leyden. Trojan planted on US Consulate website. The Register, Sept. 2007.
[10] W. Masri, A. Podgurski, and D. Leon. Detecting and debugging insecure information flows. In
     Proc. Int. Symp. on Software Reliability Engineering, pages 198–209. IEEE, 2004.
[11] A. C. Myers. JFlow: Practical mostly-static information flow control. In Proc. Symp. Principles
     of Programming Languages, pages 228–241, 1999.
[12] S. K. Nair, P. N. D. Simpson, B. Crispo, and A. S. Tanenbaum. A virtual machine based
     information flow control system for policy enforcement. In Proc. W. on Run Time Enforcement
     for Mobile and Distributed Systems, volume 197, pages 3–16, 2007. Elsevier.
[13] F. Pottier and S. Conchon. Information flow inference for free. In Proc. Int. Conf. on Functional
     Programming, pages 46–57. ACM Press, 2000.
[14] F. Pottier and V. Simonet. Information flow inference for ML. ACM Trans. on Programming
     Languages and Systems, 25(1):117–158, 2003.
[15] A. Sabelfeld and A. C. Myers. Language-based information-flow security. IEEE J. Selected Areas
     in Communications, 21(1):5–19, 2003.
[16] P. Shroff, S. F. Smith, and M. Thober. Dynamic dependency monitoring to secure information
     flow. In Proc. Computer Security Foundations Symp.. IEEE, 2007.
[17] N. Vachharajani, M. J. Bridges, J. Chang, R. Rangan, G. Ottoni, J. A. Blome, G. A. Reis,
     M. Vachharajani, and D. I. August. RIFLE: An architectural framework for user-centric
     information-flow security. In Proc. Int. Symp. on Microarchitecture, 2004.
[18] V. N. Venkatakrishnan, W. Xu, D. C. DuVarney, and R. Sekar. Provably correct runtime enforce-
     ment of non-interference properties. In Proc. Int. Conf. on Information and Communications
     Security, volume 4307 of LNCS, pages 332–351. Springer-Verlag, 2006.
[19] C. Weissman. Security controls in the adept-50 timesharing system. In Proc. AFIPS Fall Joint
     Computer Conf., volume 35, pages 119–133, 1969.
[20] J. P. L. Woodward. Exploiting the dual nature of sensitivity labels. In Proc. Symp. Security and
     Privacy, pages 23–31, 1987.
[21] S. Yoshihama, T. Yoshizawa, Y. Watanabe, M. Kudoh, and K. Oyanagi. Dynamic information
     flow control architecture for web applications. In Proc. European Symp. on Research in Computer
     Security, volume 4734 of LNCS, pages 267–282. Springer-Verlag, 2007.