=Paper= {{Paper |id=Vol-1337/paper27 |storemode=property |title=Capturing and Manipulating Context-sensitive Program Information |pdfUrl=https://ceur-ws.org/Vol-1337/paper27.pdf |volume=Vol-1337 |dblpUrl=https://dblp.org/rec/conf/se/TrappHLL15 }} ==Capturing and Manipulating Context-sensitive Program Information== https://ceur-ws.org/Vol-1337/paper27.pdf
           Capturing and manipulating context-sensitive program
                              information

               Martin Trapp                Mathias Hedenborg                     Jonas Lundberg                  Welf Löwe
                Senacor Technologies AG                       Linnaeus Universitet, Software Technology Lab,
                Martin.Trapp@senacor.com                  {Mathias.Hedenborg|Jonas.Lundberg|Welf.Lowe}@lnu.se




                                                                 Abstract
                          Designers of context-sensitive program analyses need to take special care of
                          the memory consumption of the analysis results. In general, they need to
                          sacrifice accuracy to cope with restricted memory resources. We introduce χ-
                          terms as a general data structure to capture and manipulate context-sensitive
                          analysis results. A χ-term is a compact representation of arbitrary forward
                          program analysis distinguishing the effects of different control-flow paths.
                          While χ-terms can be represented by trees, we propose a memory efficient
                          representation generalizing ordered binary decision diagrams (OBDDs).

1    Introduction
Program analysis is an important part of optimizing compilers and software engineering tools. Analyses approximate the
run-time behavior of a given program. This is done by abstracting from the concrete semantics of programs and from
concrete data values. Such analyses can be context-sensitive or -insensitive, i.e., an analysis may or may not distinguish
different analysis results for different execution paths, i.e. different contexts, e.g., call contexts of a method. Context-
sensitive analyses are, in general, more precise than their context-insensitive counterparts but also more expensive in terms
of time and memory consumption. For languages with conditional execution, the number of different contexts grows, in
general, exponentially with the program size. Adding iterated execution leads, in general, to countably (infinitely) many
contexts. Hence, merging the analysis values of different contexts is needed for analyzing relevant programs of any realistic
programming language. Having said this, we ignore this merging throughout the remainder of the present paper. Instead,
we focus on capturing context-sensitive analysis information, i.e. contexts and analysis values for each program point, in
a memory efficient way. In other words, we strive to delay merging the analysis values of different context for increasing
analysis precision.
   Our approach is based on so-called χ-terms that capture the analysis results of different contexts. We introduce a
χ-operation over analysis values as the analysis semantics of φ-nodes in a Static Single Assignment (SSA) [CFR+ 91]
representation of a program. These χ-terms can be implemented without any redundancy by exploiting a generalization of
the idea behind Binary Decision Diagrams (BDDs) [Ake78]. This paper is very much inspired by the ideas first presented
by Martin Trapp in his dissertation [Tra99].
   The remainder of the paper is structured as follows: In Section 2, we give an informal introduction to χ-terms. In
Section 3, we define χ-terms and introduce a number of basic concepts and notations. In Section 4, we discuss operations
on χ-terms and introduce the concept of χ-induced operators. In Section 5, we conclude the paper.

2    Informal overview over χ-terms
In this section, we give an informal overview over χ-terms. The aim here is merely to introduce and motivate the idea. A
more formal presentation will be given in the next section. We use a SSA-based program representation and we will refer
to the used program representation as the SSA graph.

Copyright c by the paper’s authors. Copying permitted for private and academic purposes.
Submission to: 8. Arbeitstagung Programmiersprachen, Dresden, Germany, 18-Mar-2015, to appear at http://ceur-ws.org




                                                                      154
                                                                                          1              1             2
                  if (...)                                                        entry
                    x = 1;
                                                                              2                      3             4
                  else                                                                                       Phi                 2         3         4
                                                                        x=1                    x=2
                    x = 2;
                  if (...){                                                               4                                  7                       7
                    y = x;                                                                                             Phi                     Phi
                    b = 3;                                                    5                      6
                  }                                                     y=x                    y=2
                                                                        b=3                    b=4             10
                  else {                                                                                     Phi
                    y = 2;
                                                                                          7
                    b = 4;
                                                                                                                                      10
                  }                                                                                                                  +
                                                                              8                      9
                  if (...)                                              a=x                    a=y
                    a = x;
                  else                                                                    10
                                                                                  a +b                                 return
                    a = y;
                  return a+b;                                                          11
                                                                                  return


               Figure 1: A source code example with corresponding basic block and SSA graph structures.
    In Figure 1, we show a simple piece of code containing three if-statements and the corresponding basic block structure
and SSA graph. In the SSA graph, we have annotated each φ-function P hib with their basic block number b. The leaves
1, 2, 3 and 4 (in boxes) in the rightmost part of the figure represent integer values.
    The φ-functions in the SSA graph forbid static data-flow analysis to select a unique definition of a given variable.
For example, the value of variable x in Figure 1 depends on which branch of the first if-statement that is chosen. This
uncertainty, which we in general can not resolve, is in the SSA graph symbolized by the node P hi4 .
    In a context-insensitive data-flow analysis, the interpretation of the φ-functions assume that any possible definition is a
value of that variable. This approach results in a set approximation of the run-time values. For example, the value of the
variables x, y, and a in Figure 1 is {1, 2}, and the value of b is {3, 4}. Notice that in the context-insensitive interpretation,
the variable value sets resulting from a join point in the control-flow graph do not contain any information about the
different flow-path options that generated all the different values. It simply contains the possible values.
    Context-sensitive data-flow analyses are more precise and map control-flow path(s) to possible value(s). As the number
of possible paths grows exponentially with the number of if-statements (and is even unbounded if loops are involved) we
should aim for compact representations of this mapping. Such a compact context-sensitive description of a given variable
value can be achieved with χ-functions.
    A χ-function is a representation of how different control-flow options affect the value of a variable. For example, we
can write down the value of variable b in Figure 1 using χ-functions as b = χ7 (3, 4). Interpretation: variable b has the
value 3 if it was reached from the first predecessor to block 7 in the control-flow graph, and the value 4 if it was reached
from the second predecessor block. That is, a value expressed using χ-functions (a so-called χ-term) does not only contain
all possible values, it also contains which control-flow options that generated each of these values. Using this approach we
can write down the χ-term values for the variables in Figure 1 as:

                                            x   = χ4 (1, 2),
                                            y   = χ7 (χ4 (1, 2), 2),
                                            b   = χ7 (3, 4),
                                            a   = χ10 (χ4 (1, 2), χ7 (χ4 (1, 2), 2))

Notice that values of variables depending on several control-flow options (e.g., the value of variable a) correspond to
χ-terms over χ-terms, i.e., they are constructed as a composition of the χ-functions representing the different options.

2.1   χ-term Construction
The construction of the χ-term values and the numbering of the χ-functions is a part of a context-sensitive analysis. Every
φ-node in an SSA graph represents a join point for several possible definitions of a single variable, say x. When the analysis
reaches a block b containing a φ-node for x it “asks” all the predecessor blocks to give their definition of x and constructs
a new χ-term χb (x1 , . . . , xn ) where xi is the χ-term value for x given by the i:th predecessor.
   If the i:th predecessor block does not define x by itself, it “asks” its predecessor for the value. This process continues
recursively until each predecessor has presented a χ-term value for x. The process will terminate if any use of a value also
has a corresponding definition.




                                                               155
                                                                                     1

                            x = 1;
                            while ( ... ) {
                               if ( ... ) {                                                     φw
                                  x++;
                                } else {
                                  x--;                                               y               ++         --
                                }
                            }
                            y = x;
                                                                                                          φi



      Figure 2: A simple code fragment with a loop that encloses an if-statement and the corresponding SSA graph.
    Loops in the control-flow cause some problems as they would lead to ever growing χ-term, i.e., analysis would not
reach a fixed point. We will ignore the termination problem for now, and discuss and solve this problem later. For a first
try, we may have a (possibly countable) number of χ-terms with the same block number b since every φ-node in b generates
a new χ-term value when the analysis reaches that block.

2.2   Switching Behavior
As discussed in the introduction to Section 2, χ-term construction is tightly coupled to a context-sensitive analysis. A new
set of χ-terms is generated every time t the analysis reaches a block b containing φ-nodes. We denote this set with χbt .
   All χ-terms generated of block b at time t have the same switching behavior. That is, for any two χ-terms χbi (x1 , . . . , xn )
and χbj (y1 , . . . , yn ), and for any execution of the program, it holds that if i = j and branch xk in χbi (x1 , . . . , xn ) is selected,
then also branch yk in χbj (y1 , . . . , yn ) is selected. This property will be important later on when we introduce operations
on χ-terms. Anyhow, two χ-terms generated from the same block but at different times of the analysis will not necessarily
have the same switching behavior.
   This situation is illustrated in Figure 2 where we show a while-loop that encloses an if-statement that assigns a new
value to a variable x. At first entry of a block w containing the while loop header, we assign x the value χw                        0 (1, ⊥)
where the bottom element ⊥ symbolizes “value undefined”. This value is later propagated inside the loop body where the
if-statement is handled. After the if statement (in block i) we assign x the value χi0 (χw                     w
                                                                                                  0 (2, ⊥), χ0 (0, ⊥)), a value that is
later propagated to the loop header block w. Using this approach we can compute all possible values for x after an arbitrary
number of loop iterations. The first three x values that might escape the loop are:

                                     xw
                                      0     = χw
                                               0 (1, ⊥)
                                     xw
                                      1     = χw      i   w          w
                                               1 (1, χ0 (χ0 (2, ⊥), χ0 (0, ⊥)))
                                     xw
                                      2     = χw      i   w      i   w          w
                                               2 (1, χ1 (χ1 (2, χ0 (χ0 (3, ⊥), χ0 (1, ⊥))),
                                                             χw      i   w          w
                                                              1 (0, χ0 (χ0 (1, ⊥), χ0 (−1, ⊥)))))

There are a few things to notice. First, xw       n is the value of x after the n:th loop iteration. Secondly, the χ-term values xn
                                                                                                                                       w

can be interpreted as: Either x has the value 1, or the value after 1, . . . , n loop iterations, i.e., x ∈ [−(n − 1), n + 1] after
n loop iterations.
   Finally, χ-terms generated at different iterations of the loop do not have the same switching behavior. For example, the
two χ-functions χi0 and χi1 will have different switching behavior since we can not say that if the control flows through
one branch of the if-statement in one iteration, it will flow through the same branch in the following iteration. Thus, only
two χ-terms χbi (x1 , . . . , xn ) and χbj (y1 , . . . , yn ) from the same block b, with equal iteration index (i.e. i = j), will have a
similar switching behavior.

2.3   χ-term Representation
Every term can be naturally viewed as a tree and χ-terms are no exception. This is illustrated in Figure 3 (left) where we
show the tree representation of the χ-term values for the variable a in Figure 1. Each edge represents a particular control
flow option in this view and each path from the root node to a leaf value contains the sequence of control flow decisions
required for that particular leaf value to come into play.




                                                                    156
                                                      1         2                          1         2




                                        1        2         χ4        2                          χ4




                                            χ4                  χ7                                   χ7




                                                     χ10                                  χ10



Figure 3: Term for variable a = χ10 (χ4 (1, 2), χ7 (χ4 (1, 2), 2)) is illustrated by its tree and directed acyclic graph repre-
sentations (edge direction from bottom to top).



                  public int m(int a, int b) {                                            a+1                 a-1         b-1        b+1
                      if ( ... ) {
                        a = a+1;
                        b = b-1;
                      }                                                                                  φi                     φi
                      else {
                        a = a-1;
                        b = b+1;                                                                                     +
                      }
                      return a + b;
                  }
                                                                                                                    ret




      Figure 4: A method with the corresponding SSA graph of the basic block containing the return-statement.
   The tree representation of a χ-term is easy to understand and important from theoretical point of view: many of the
notations to be presented are easiest to understand in terms of operations on the tree representation.
   To actually represent each χ-term as a tree is in practice much to costly. More cost efficient representations are avail-
able [Bry86, Bry92]. They avoid redundant subtrees, cf. Figure 3 (right).

2.4   Operations on χ-terms
The code fragment on the left-hand side of Figure 4 assigns different values to the two variables a and b in the two different
branches of the if-statement and then returns the sum a + b.
   Using χ-terms, we can express the values of a and b before we add them as:

                                   a = χ(a + 1, a − 1),                        b = χ(b − 1, b + 1)

This situation is illustrated in Figure 5 where we now consider the addition as an operator

                                                      + : Xint × Xint 7→ Xint

working on χ-terms rather than integers.
  Furthermore, we know that any execution takes either the first or the second branch of the if-statement (but we do not
know which). This observation leads us to the following rewriting:

                     +(χ(a + 1, a − 1), χ(b − 1, b + 1))                 =      χ( first branch +, second branch +)
                                                                         = χ(+(a + 1, b − 1), +(a − 1, b + 1))




                                                                         157
                                      a+1       a-1         b-1       b+1           a+1       a-1         b-1       b+1



                                            χ                     χ                       χ                     χ



                                                       +                                             +



                                                      ret                                           ret


                            Figure 5: Symbolic input of the "+"- and the "ret"-SSA-nodes in Figure 4.



                                a+1       b-1         a-1       b+1



                                      +                     +                     a+b         a+b



                                                  χ                                      χ                            a+b



                                                ret                                     ret                           ret



Figure 6: The Shannon expansion can be used to "push" operators toward the leaf values where the context-insensitive
operator definitions can be applied.
That is, we can in this case make use of the fact that both χ-terms have the same switching behavior and apply the +
operator on each of the two branches separately before we merge the result. This transformation is illustrated Figure 6.
That we can rewrite the addition of two χ-terms as a χ-term over the addition of a and b for each individual branch is in
this case quite obvious. However, this rule can be applied on any operator τ and any χ-function χbj . This rewrite rule for
χ-term expressions is called a Shannon expansion1 .
   Finally, once we applied the Shannon expansion of the operator + in Figure 4, we are in a position where we can
apply + on a set of leaf values. In this case + is well defined and we can fall back on ordinary integer arithmetics. The
resulting simplifications are shown in Figure 6 where we also used the redundancy rule χ(t, t) = t. This manipulation can
symbolically be written as:

                       χ(+(a + 1, b − 1), +(a − 1, b + 1))                  =   χ((a + 1) + (b − 1), (a − 1) + (b + 1))
                                                                            = χ(a + b, a + b)
                                                                            = a + b.

The result indicates that no matter what branch of the if-statement we use, we will always get the result a + b. This simple
example illustrates one of the strengths of using χ-terms, we can by using a few simple rewrite-rules make use of having
stored flow-path information and "compute" more precise results than would have been possible in a context-insensitive
approach.
   Finally, it should be noted that our presentation of χ-terms as a tree, and all operations on χ-terms as tree manipulations,
is very much inspired by Orded Binary Decision Diagrams2 (OBDD) as presented in [Bry86, Bry92]. Furthermore, the
   1 The word "Shannon expansion" is taken from the OBDD literature where a similar procedure is used to rewrite boolean functions represented as

OBDDs.
   2 An OBDD is a clever (and slim) representation of Boolean functions as a directed acyclic graph that allows an efficient computation of frequently

used operations such as disjunction, conjunction and function composition.




                                                                            158
algorithms describing various operations on χ-terms presented in this chapter are intended to be easy to understand, more
efficient algorithms can be found in the above cited papers.

3     Definition and Notations
In this section, we present a number of definitions and notations related to χ-terms that are of general interest and that will
be used in the remainder of this paper later on. Many of the ideas and notations are taken from [Tra99] and [Bry86, Bry92]
and adjusted to fit our needs.

3.1    χ-functions
The set of χ-functions depends on the set of φ-functions which depends on the program we are analyzing. Furthermore,
the exact number of χ-functions at use depends on how we handle loops and other approximations we do in the analysis
(e.g. finite k approximations). To simplify what follows, we assume that each program p has a (possibly infinite) set of
χ-functions X (p).
   Each χ-function χbj ∈ X (p) is identified by a pair (b, j) where:

    1. The block number b = block(χbj ), b ∈ [1, Bp ] indicates in what basic block its generating φ-node is contained. Here
       Bp is the number of basic blocks in a program p.
    2. The iteration index j = index(χbj ) indicates on what analysis iteration over block b the χ-function was generated.
       The χ-functions generated from the first analysis iteration over b always have iteration index 0.
       Two χ-terms χbi (x1 , . . . , xn ) and χbj (y1 , . . . , yn ) from the same block b have the same switching behavior if they have
       the same iteration index (i.e. i = j). That is, for any execution of the program it holds that

                                                  branch xk is selected ⇔ branch yk is selected

       Thus, the switching behavior of a χ-function is determined by a pair (b, i) where b is the block number and i is the
       iteration index. In what follows, we will often skip the iteration index to simplify the notations. In these cases we
       assume that all involved χ-function have the same iteration index.
Finally, the arity of a χ-function, denoted arity(χbj ), with n arguments is n.3

3.2    χ-terms
A programming language type is a set of concrete values. In program analysis, concrete values are abstracted to abstract
values requiring a Galois connection between concrete and abstract types [CC77]. Associated with each abstract type
V , we have a set of χ-terms. These χ-terms form the context-sensitive counterpart to the abstract values V as used in a
context-insensitive analysis.

Definition 1 (χ-terms) Let LV = {V, u, t, >, ⊥} be a lattice4 of abstract values of type V and let X (p) be the set of
χ-functions associated with a given program p. The set of all χ-terms XV (p) over the abstract values V is then defined
recursively as:

                                                               v∈V         ⇒      v ∈ XV (p)
                           t1 , . . . , tn ∈ XV (p) ∧ χbj ∈ X (p) ⇒               χbj (t1 , . . . , tn ) ∈ XV (p), n ∈ arity(χbj )

Notice, even if the set of abstract values V is finite, XV (p) of a program with loops is infinite since there is no upper limit
on the composition depth.
   We have previously associated all χ-functions χbj with a block number b = block(χbj ). This notation can be extended to
χ-terms as well.
Definition 2 (Block number) The block number of a χ-term t ∈ XV , denoted block(t), is defined as:
                                               
                                                  0 if t ∈ V ,
                                    block(t) =
                                                  b if t = χb (t1 , . . . , tn ).
    3 The arity is independent of the loop iteration j. However, we index χb with j to be consistent in our notations.
                                                                              j
    4 The lattice property is introduced because it is required to widen the abstract values. This, in turn, is required for an analysis to terminate (in general).

However, widening and termination of analysis are deliberately omitted from the present discussion.




                                                                               159
Notice that we have associated all the values v ∈ V with a (default) block number 0 not corresponding to any existing
block in the program.

Definition 3 (Function set) The function set of a χ-term t ∈ XV (p), denoted f unc(t), is the set of all χ-functions χbj ∈
X (p) used to construct t.

For example:

                            t = χ3 (χ1 (1, 2), χ2 (χ1 (3, 4), 2)) ⇒ f unc(t) = {χ3 , χ2 , χ1 }

From now on we will drop the reference to a specific program p in X (p) and XV (p) and assume that it is implicit. It should
by now be clear that all these entities are directly related to a given analysis instantiation.

3.3   The Tree Representation of χ-terms
Each χ-term t ∈ XV has a unique tree representation Gt = {N, E, r} where N ⊆ XV is a set of nodes, E ⊆ XV × XV
is a set of edges, and r ∈ N is the root of the tree. The two sets N and E can be defined recursively as;

                                                 r   =      t,
                                           n∈N       ⇒ ni ∈ N ∧ (n, ni ) ∈ E.

where ni denotes the i:th argument of n = χbj (n1 , . . . , nn ).
  Our reason for introducing the tree representation of a χ-term is that many notions and χ-term operations are easiest to
understand in terms of tree manipulations. Basic operations include

  • The leaves of a χ-term t ∈ XV , denoted leaves(t) ⊆ V , is the set of all leaf values in the tree representation of t.

  • The subterms of a χ-term t ∈ XV , denoted subterms(t) ⊆ XV , is the set of all subtrees of t in the tree representation
    of t. The leaves and t are not included.

  • The children of χ-term t ∈ XV , denoted children(t) ⊆ XV , is defined as:
                                                     
                                                         ∅                    if t ∈ V
                                    children(t) =
                                                         {t1 , . . . , tn }   if t = χbj (t1 , . . . , tn )

  • The depth of a χ-term t ∈ XV , denoted depth(t), is the depth of the tree representation of t.

  • Let n ∈ XV be a node in the tree representation of a χ-term t ∈ XV . The depth of n in t, denoted depth(t, n) =
    depth(t) − depth(n), is the depth of the node n in the tree representation of t.

For example,
                                                     
                                                     
                                                      leaves(t) = {1, 2, 3, 4}
                                                      subterms(t) = {χ4 (1, 2), χ7 (χ4 (3, 4), 2), χ4 (3, 4)}
                                                     
                                                     
            t = χ10 (χ4 (1, 2), χ7 (χ4 (3, 4), 2)) ⇒   children(t) = {χ4 (1, 2), χ7 (χ4 (3, 4), 2)}
                                                       depth(t) = 3,
                                                     
                                                     
                                                     
                                                     
                                                       depth(t, χ4 (3, 4)) = 2
                                                     


3.4   Basic χ-term operations
In this section we present further basic operations on χ-terms. Most important is the Shannon expansion which can be used
to manipulate χ-term expressions without affecting their value.

Definition 4 (Restriction) The restriction of a χ-term t ∈ XV to the k:th branch of χbj , denoted t|(b,j):k , is a new χ-term
where every subterm χbj (t1 , . . . , tn ) with switching behavior (b, j) in t has been replaced by its k:th child tk .




                                                                 160
For example:

                                                                                      t|(3,0):2 = χ20 (χ10 (1, 2), 2)
                                                                                 
                            t = χ30 (χ10 (1, 2), χ20 (χ10 (1, 2), 2)) ⇒
                                                                                      t|(1,0):1 = χ30 (1, χ20 (1, 2))

A restriction f |xi =true in the OBDD description of a Boolean formula f (x1 , . . . , xn ) is a new formula where the i:th
variable xi has been assigned the constant value true. The χ-term counterpart is that we define a new χ-term t|(b,j):k by
only considering the value from the k:th predecessor of block b (in iteration j) in the construction of t. It should also be
noticed that if we restrict us to the k:th branch of χbj , when χbj 6∈ f unc(t), t is left unaffected.

Definition 5 (Equivalence) Two χ-terms are equivalent (≡) if they are (syntactically) the same or represent the same
χ-term value, i.e. if they can be transformed into the same χ-term by sequences of Shannon expansions or redundancy
eliminations.

Definition 6 (Shannon expansion) The Shannon expansion of a χ-term t ∈ XV over χbj is a new χ-term defined as:

                              t = χbj (t|(b,j):1 , t|(b,j):2 , . . . , t|(b,j):n )         where n = arity(χbj )

Notice, the Shannon expansion creates a new χ-term but not a new χ-term value. It is just a rewrite rule that can be used
to manipulate a χ-term expression. For example,

                    t =       χ31 (χ11 (1, 2), χ21 (χ11 (1, 2), 2))
                        ≡     χ11 (χ31 (χ21 (1, 1), χ21 (1, 2)), χ31 (χ21 (2, 2), χ21 (2, 2)))                   (expansion over χ11 )
                        ≡     χ21 (χ31 (χ11 (1, 2), χ11 (1, 2)), χ31 (χ11 (1, 2), χ11 (2, 2)))                   (expansion over χ21 )

Definition 7 (Redundancy and redundancy elimination) A χ-term is redundant if all its sub-terms are equivalent. That
is, χbj (t1 , . . . , tn ) ≡ t if t1 ≡ . . . ≡ tn . The process of removing redundant χ-terms is called redundancy elimination.

   Obviously, a χ-term t containing a redundant subterm χr can, without any loss of information, be reduced according to
the pattern

                                t = . . . χi (. . . , χr (t, . . . , t), . . .) . . . ≡ . . . χi (. . . , t, . . .) . . .

In the tree view of a χ-term, this corresponds to replace a subtree rooted by χr by any of its subterms (which are all equal).
For example,

                                 t =        χ11 (χ31 (1, χ21 (1, 2)), χ31 (2, χ21 (2, 2)))
                                     ≡      χ11 (χ31 (1, χ21 (1, 2)), χ31 (2, 2))                (since χ21 (2, 2) ≡ 2)
                                     ≡      χ11 (χ31 (1, χ21 (1, 2)), 2)                         (since χ31 (2, 2) ≡ 2)


4   Operations on χ-terms
When doing a program analysis there is a need of defining an abstract operator related to an existing concrete operation.
The aim of this section is to say that every context-insensitive operator τ has a χ-induced context-sensitive counterpart
τe with a semantics completely determined by the semantics of τ . We will define τe in terms of an algorithm apply that
recursively performs Shannon expansions of the operator τe over the χ-functions in the input χ-term values. We think
of this process as if we "push" the operator towards the leafs of the input χ-term where the ordinary context-insensitive
semantics of the operation can be applied. This idea is inspired by the ideas of evaluating logical formulas represented as
OBDDs [Bry86, Bry92].

Definition 8 (Restriction) Let τe : X1 × . . . × Xn 7→ Xv be an operator on χ-terms. The restriction of an operation
τe(t1 , . . . , tn ) to the k:th branch of χbj , denoted τe|(b,j):k , is defined as

                                           τe|(b,j):k (t1 , . . . , tn ) = τe(t1 |(b,j):k , . . . , tn |(b,j):k )

where ti |(b,j):k is the restriction of the i:th operand ti to the k:th branch of χbj .




                                                                             161
It simply means that we are restricting all input operands to the k:th branch of χbj before we apply the operation τe. Notice
that if χbj 6∈ f unc(t1 ) ∪ . . . ∪ f unc(tn ) we have ti |(b,j):k = ti and consequently that τe|(b,j):k = τe(t1 , . . . , tn ). That is, the
restriction has no effect in such cases.

Definition 9 (Shannon expansion) Let τe : X1 × . . . × Xn 7→: Xv be an operator on χ-terms. The Shannon expansion of
an operation τe(t1 , . . . , tn ) over the χ-function χbj is defined as

                                           τe(t1 , . . . , tn ) = χbj (e
                                                                       τ |(b,j):1 , . . . , τe|(b,j):m )

where m = arity(χbj ) and τe|(b,j):k is the restriction of the operation τe(t1 , . . . , tn ) to the k:th branch of χbj .

The Shannon expansion of an operation τe(t1 , . . . , tn ) over χbj can be seen as if we "push" the operator τe one step towards
the leaf values of the input operands ti by computing τe for each one of the branches of χbj individually, and then merge
these values using χbj .
   The Shannon expansion has the following properties:
  • It is a rewrite rule for a χ-term expression that has no effect on the resulting χ-term value. It can be seen as a first step
    in the computation of the operation.
  • It leaves the χ-expression unchanged if χbj 6∈ f unc(t1 ) ∪ . . . ∪ f unc(tn ) since in that case τe|(b,j):k = τe(t1 , . . . , tn )
    and we can apply the redundancy rule

                                       χbj (e
                                            τ (t1 , . . . , tn ), . . . , τe(t1 , . . . , tn )) = τe(t1 , . . . , tn ).

  • If χbj ∈ f unc(ti ) for any ti in t1 , . . . , tn then

                           |f unc(e
                                  τ |(b,j):1 ) ∪ . . . ∪ f unc(e
                                                               τ |(b,j):m )| < |f unc(t1 ) ∪ . . . ∪ f unc(tn )|,

     m = arity(χbj ). That is, the “size" of the input-operands to the τe operator (in terms of the number of unique χ-
     functions contained) is always decreasing.
This last property indicates that repeated Shannon expansions over the χ-functions in the input operands will result in a
situation where the input-operands to τe are all leaf values. Thus, the semantics of τe is completely defined in how τe operates
on the leaf values. This observation leads to the concept of χ-induced operations to be presented shortly.
    The algorithm apply (see Algorithm 1) is a recursive function that performs repeated Shannon expansion until it
reaches the leaf values where the context-insensitive operator can be applied.
Algorithm 1 apply(τ, t1 , . . . , tn ) 7→ t
  let f unc_set = f unc(t1 ) ∪ . . . ∪ f unc(tn )
  if f unc_set = ∅ then
     return τ (t1 , . . . , tn )
  else
     pick χbj from f unc_set
     for all i ∈ 1 . . . arity(χbj ) do
        let ci =apply(τ, t1 |j:i , . . . , tn |j:i )
     end for
     return χbj (c1 , . . . , carity(χbj ) )
  end if

   The test f unc_set = ∅ checks if t1 , . . . , tn are all leaf values. If so, we apply the context-insensitive operator τ .
Otherwise, we push the operator towards the leafs by a Shannon expansion in one of the remaining χ-functions. Note also
that:
                        apply(τ, v1 , . . . , vn ) = τ (v1 , . . . , vn ) if v1 , . . . , vn are all leaf values.
This follows immediately from the base case in the algorithm apply.
   Finally, Algorithm 1 is intended to be easy to understand, not efficient, since it will be used in the definition of χ-induced
context-sensitive operators. More efficient algorithms are available [Bry86].




                                                                            162
Definition 10 (χ-induced context-sensitive operator) For each context-insensitive operator τ : A × B × . . . × N 7→ V
their is a corresponding χ-induced operator τe : XA × XB × . . . × XN 7→ XV defined as

                                         τe(t1 , . . . , tn ) = apply(τ, t1 , . . . , tn ).

One important observation is that the algorithm apply in general increases the depth of the resulting χ-term expression.
More precisely, a closer look at apply shows that

                                         τ (t1 , . . . , tn )) = |f unc(t1 ) ∪ . . . ∪ f unc(tn )|.
                                   depth(e

As a consequence, τe applied on operands with a similar control-flow history (i.e. |f unc(t1 ) ∪ . . . ∪ f unc(tn )| 
|f unc(t1 )| + . . . + |f unc(tn )|) will generate less complex results than if applied on operands with very different control-
flow histories (i.e. |f unc(t1 ) ∪ . . . ∪ f unc(tn )| ≈ |f unc(t1 )| + . . . + |f unc(tn )|).
    In conclusion, every context-insensitive operator τ has a χ-induced counterpart τe and its semantics is completely deter-
mined by the context-insensitive semantics of τ . We compute τe by repeated Shannon expansions where each step means
that we are “pushing" τe closer to the leaf values where the context-sensitive operator τ is defined and can be applied.

5   Conclusions
We defined χ-terms as a means to capture context-sensitive analysis values for programs represented as SSA-graphs. Each
program point, i.e., each SSA-node, is mapped to a corresponding χ-term representing a context-sensitive analysis value for
that point. Especially, each meet point of execution paths in the program, i.e., each phi-node, is mapped to a χ-term whose
children represent the alternative analysis values of these paths. χ-terms are represented by graphs without any redundancy
which generalizes the idea behind BDDs. This leads to a memory efficient representation of context-sensitive analysis
information. Operations over context-insensitive analysis values, needed to implement transfer functions of the analysis,
are defined using restriction and Shannon expansion which are also similar to the corresponding BDD operations. As a
result, any context-insensitive analysis can automatically be transformed into a corresponding context-sensitive analysis.
   We deliberately ignored the merging of analysis values of different contexts, which is needed for making context-
sensitive analysis practical for large and iterative programs; otherwise even χ-terms grow beyond any memory bound.
Merging, and, hence, the reduction memory requirements of χ-term at the price of analysis precision, can easily be achieved
by applying the meet operation of the context-insensitive analysis values to leafs of χ-(sub-)terms. Controlling this merging
of sub-terms based on available memory and precision of analysis values is matter of future work.

References
[Ake78]     Sheldon B. Akers. Binary decision diagrams. IEEE Transactions on Computers, 27(6):509–516, June 1978.
[Bry86]     R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers,
            C-35(8):677–691, August 1986.

[Bry92]     R. E. Bryant. Symbolic boolean manipulation with ordered binary decision diagrams. ACM Computing Sur-
            veys, 24(3):293–318, September 1992.
[CC77]      P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs
            by construction of approximations of fixed points. In Conference Record of the Fourth Annual ACM
            SIGACT/SIGPLAN Symposium on Principles of Programming Languages, pages 238–252, January 1977.

[CFR+ 91] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and K. Zadeck. Efficiently computing static single assign-
          ment form and the control dependence graph. ACM Transactions on Programming Languages and Systems,
          13(4):451–490, 1991.
[Tra99]     Martin Trapp. Optimerung Objektorientierter Programme. PhD thesis, Universität Karlsruhe, December 1999.




                                                                  163