Capturing and manipulating context-sensitive program information Martin Trapp Mathias Hedenborg Jonas Lundberg Welf Löwe Senacor Technologies AG Linnaeus Universitet, Software Technology Lab, Martin.Trapp@senacor.com {Mathias.Hedenborg|Jonas.Lundberg|Welf.Lowe}@lnu.se Abstract Designers of context-sensitive program analyses need to take special care of the memory consumption of the analysis results. In general, they need to sacrifice accuracy to cope with restricted memory resources. We introduce χ- terms as a general data structure to capture and manipulate context-sensitive analysis results. A χ-term is a compact representation of arbitrary forward program analysis distinguishing the effects of different control-flow paths. While χ-terms can be represented by trees, we propose a memory efficient representation generalizing ordered binary decision diagrams (OBDDs). 1 Introduction Program analysis is an important part of optimizing compilers and software engineering tools. Analyses approximate the run-time behavior of a given program. This is done by abstracting from the concrete semantics of programs and from concrete data values. Such analyses can be context-sensitive or -insensitive, i.e., an analysis may or may not distinguish different analysis results for different execution paths, i.e. different contexts, e.g., call contexts of a method. Context- sensitive analyses are, in general, more precise than their context-insensitive counterparts but also more expensive in terms of time and memory consumption. For languages with conditional execution, the number of different contexts grows, in general, exponentially with the program size. Adding iterated execution leads, in general, to countably (infinitely) many contexts. Hence, merging the analysis values of different contexts is needed for analyzing relevant programs of any realistic programming language. Having said this, we ignore this merging throughout the remainder of the present paper. Instead, we focus on capturing context-sensitive analysis information, i.e. contexts and analysis values for each program point, in a memory efficient way. In other words, we strive to delay merging the analysis values of different context for increasing analysis precision. Our approach is based on so-called χ-terms that capture the analysis results of different contexts. We introduce a χ-operation over analysis values as the analysis semantics of φ-nodes in a Static Single Assignment (SSA) [CFR+ 91] representation of a program. These χ-terms can be implemented without any redundancy by exploiting a generalization of the idea behind Binary Decision Diagrams (BDDs) [Ake78]. This paper is very much inspired by the ideas first presented by Martin Trapp in his dissertation [Tra99]. The remainder of the paper is structured as follows: In Section 2, we give an informal introduction to χ-terms. In Section 3, we define χ-terms and introduce a number of basic concepts and notations. In Section 4, we discuss operations on χ-terms and introduce the concept of χ-induced operators. In Section 5, we conclude the paper. 2 Informal overview over χ-terms In this section, we give an informal overview over χ-terms. The aim here is merely to introduce and motivate the idea. A more formal presentation will be given in the next section. We use a SSA-based program representation and we will refer to the used program representation as the SSA graph. Copyright c by the paper’s authors. Copying permitted for private and academic purposes. Submission to: 8. Arbeitstagung Programmiersprachen, Dresden, Germany, 18-Mar-2015, to appear at http://ceur-ws.org 154 1 1 2 if (...) entry x = 1; 2 3 4 else Phi 2 3 4 x=1 x=2 x = 2; if (...){ 4 7 7 y = x; Phi Phi b = 3; 5 6 } y=x y=2 b=3 b=4 10 else { Phi y = 2; 7 b = 4; 10 } + 8 9 if (...) a=x a=y a = x; else 10 a +b return a = y; return a+b; 11 return Figure 1: A source code example with corresponding basic block and SSA graph structures. In Figure 1, we show a simple piece of code containing three if-statements and the corresponding basic block structure and SSA graph. In the SSA graph, we have annotated each φ-function P hib with their basic block number b. The leaves 1, 2, 3 and 4 (in boxes) in the rightmost part of the figure represent integer values. The φ-functions in the SSA graph forbid static data-flow analysis to select a unique definition of a given variable. For example, the value of variable x in Figure 1 depends on which branch of the first if-statement that is chosen. This uncertainty, which we in general can not resolve, is in the SSA graph symbolized by the node P hi4 . In a context-insensitive data-flow analysis, the interpretation of the φ-functions assume that any possible definition is a value of that variable. This approach results in a set approximation of the run-time values. For example, the value of the variables x, y, and a in Figure 1 is {1, 2}, and the value of b is {3, 4}. Notice that in the context-insensitive interpretation, the variable value sets resulting from a join point in the control-flow graph do not contain any information about the different flow-path options that generated all the different values. It simply contains the possible values. Context-sensitive data-flow analyses are more precise and map control-flow path(s) to possible value(s). As the number of possible paths grows exponentially with the number of if-statements (and is even unbounded if loops are involved) we should aim for compact representations of this mapping. Such a compact context-sensitive description of a given variable value can be achieved with χ-functions. A χ-function is a representation of how different control-flow options affect the value of a variable. For example, we can write down the value of variable b in Figure 1 using χ-functions as b = χ7 (3, 4). Interpretation: variable b has the value 3 if it was reached from the first predecessor to block 7 in the control-flow graph, and the value 4 if it was reached from the second predecessor block. That is, a value expressed using χ-functions (a so-called χ-term) does not only contain all possible values, it also contains which control-flow options that generated each of these values. Using this approach we can write down the χ-term values for the variables in Figure 1 as: x = χ4 (1, 2), y = χ7 (χ4 (1, 2), 2), b = χ7 (3, 4), a = χ10 (χ4 (1, 2), χ7 (χ4 (1, 2), 2)) Notice that values of variables depending on several control-flow options (e.g., the value of variable a) correspond to χ-terms over χ-terms, i.e., they are constructed as a composition of the χ-functions representing the different options. 2.1 χ-term Construction The construction of the χ-term values and the numbering of the χ-functions is a part of a context-sensitive analysis. Every φ-node in an SSA graph represents a join point for several possible definitions of a single variable, say x. When the analysis reaches a block b containing a φ-node for x it “asks” all the predecessor blocks to give their definition of x and constructs a new χ-term χb (x1 , . . . , xn ) where xi is the χ-term value for x given by the i:th predecessor. If the i:th predecessor block does not define x by itself, it “asks” its predecessor for the value. This process continues recursively until each predecessor has presented a χ-term value for x. The process will terminate if any use of a value also has a corresponding definition. 155 1 x = 1; while ( ... ) { if ( ... ) { φw x++; } else { x--; y ++ -- } } y = x; φi Figure 2: A simple code fragment with a loop that encloses an if-statement and the corresponding SSA graph. Loops in the control-flow cause some problems as they would lead to ever growing χ-term, i.e., analysis would not reach a fixed point. We will ignore the termination problem for now, and discuss and solve this problem later. For a first try, we may have a (possibly countable) number of χ-terms with the same block number b since every φ-node in b generates a new χ-term value when the analysis reaches that block. 2.2 Switching Behavior As discussed in the introduction to Section 2, χ-term construction is tightly coupled to a context-sensitive analysis. A new set of χ-terms is generated every time t the analysis reaches a block b containing φ-nodes. We denote this set with χbt . All χ-terms generated of block b at time t have the same switching behavior. That is, for any two χ-terms χbi (x1 , . . . , xn ) and χbj (y1 , . . . , yn ), and for any execution of the program, it holds that if i = j and branch xk in χbi (x1 , . . . , xn ) is selected, then also branch yk in χbj (y1 , . . . , yn ) is selected. This property will be important later on when we introduce operations on χ-terms. Anyhow, two χ-terms generated from the same block but at different times of the analysis will not necessarily have the same switching behavior. This situation is illustrated in Figure 2 where we show a while-loop that encloses an if-statement that assigns a new value to a variable x. At first entry of a block w containing the while loop header, we assign x the value χw 0 (1, ⊥) where the bottom element ⊥ symbolizes “value undefined”. This value is later propagated inside the loop body where the if-statement is handled. After the if statement (in block i) we assign x the value χi0 (χw w 0 (2, ⊥), χ0 (0, ⊥)), a value that is later propagated to the loop header block w. Using this approach we can compute all possible values for x after an arbitrary number of loop iterations. The first three x values that might escape the loop are: xw 0 = χw 0 (1, ⊥) xw 1 = χw i w w 1 (1, χ0 (χ0 (2, ⊥), χ0 (0, ⊥))) xw 2 = χw i w i w w 2 (1, χ1 (χ1 (2, χ0 (χ0 (3, ⊥), χ0 (1, ⊥))), χw i w w 1 (0, χ0 (χ0 (1, ⊥), χ0 (−1, ⊥))))) There are a few things to notice. First, xw n is the value of x after the n:th loop iteration. Secondly, the χ-term values xn w can be interpreted as: Either x has the value 1, or the value after 1, . . . , n loop iterations, i.e., x ∈ [−(n − 1), n + 1] after n loop iterations. Finally, χ-terms generated at different iterations of the loop do not have the same switching behavior. For example, the two χ-functions χi0 and χi1 will have different switching behavior since we can not say that if the control flows through one branch of the if-statement in one iteration, it will flow through the same branch in the following iteration. Thus, only two χ-terms χbi (x1 , . . . , xn ) and χbj (y1 , . . . , yn ) from the same block b, with equal iteration index (i.e. i = j), will have a similar switching behavior. 2.3 χ-term Representation Every term can be naturally viewed as a tree and χ-terms are no exception. This is illustrated in Figure 3 (left) where we show the tree representation of the χ-term values for the variable a in Figure 1. Each edge represents a particular control flow option in this view and each path from the root node to a leaf value contains the sequence of control flow decisions required for that particular leaf value to come into play. 156 1 2 1 2 1 2 χ4 2 χ4 χ4 χ7 χ7 χ10 χ10 Figure 3: Term for variable a = χ10 (χ4 (1, 2), χ7 (χ4 (1, 2), 2)) is illustrated by its tree and directed acyclic graph repre- sentations (edge direction from bottom to top). public int m(int a, int b) { a+1 a-1 b-1 b+1 if ( ... ) { a = a+1; b = b-1; } φi φi else { a = a-1; b = b+1; + } return a + b; } ret Figure 4: A method with the corresponding SSA graph of the basic block containing the return-statement. The tree representation of a χ-term is easy to understand and important from theoretical point of view: many of the notations to be presented are easiest to understand in terms of operations on the tree representation. To actually represent each χ-term as a tree is in practice much to costly. More cost efficient representations are avail- able [Bry86, Bry92]. They avoid redundant subtrees, cf. Figure 3 (right). 2.4 Operations on χ-terms The code fragment on the left-hand side of Figure 4 assigns different values to the two variables a and b in the two different branches of the if-statement and then returns the sum a + b. Using χ-terms, we can express the values of a and b before we add them as: a = χ(a + 1, a − 1), b = χ(b − 1, b + 1) This situation is illustrated in Figure 5 where we now consider the addition as an operator + : Xint × Xint 7→ Xint working on χ-terms rather than integers. Furthermore, we know that any execution takes either the first or the second branch of the if-statement (but we do not know which). This observation leads us to the following rewriting: +(χ(a + 1, a − 1), χ(b − 1, b + 1)) = χ( first branch +, second branch +) = χ(+(a + 1, b − 1), +(a − 1, b + 1)) 157 a+1 a-1 b-1 b+1 a+1 a-1 b-1 b+1 χ χ χ χ + + ret ret Figure 5: Symbolic input of the "+"- and the "ret"-SSA-nodes in Figure 4. a+1 b-1 a-1 b+1 + + a+b a+b χ χ a+b ret ret ret Figure 6: The Shannon expansion can be used to "push" operators toward the leaf values where the context-insensitive operator definitions can be applied. That is, we can in this case make use of the fact that both χ-terms have the same switching behavior and apply the + operator on each of the two branches separately before we merge the result. This transformation is illustrated Figure 6. That we can rewrite the addition of two χ-terms as a χ-term over the addition of a and b for each individual branch is in this case quite obvious. However, this rule can be applied on any operator τ and any χ-function χbj . This rewrite rule for χ-term expressions is called a Shannon expansion1 . Finally, once we applied the Shannon expansion of the operator + in Figure 4, we are in a position where we can apply + on a set of leaf values. In this case + is well defined and we can fall back on ordinary integer arithmetics. The resulting simplifications are shown in Figure 6 where we also used the redundancy rule χ(t, t) = t. This manipulation can symbolically be written as: χ(+(a + 1, b − 1), +(a − 1, b + 1)) = χ((a + 1) + (b − 1), (a − 1) + (b + 1)) = χ(a + b, a + b) = a + b. The result indicates that no matter what branch of the if-statement we use, we will always get the result a + b. This simple example illustrates one of the strengths of using χ-terms, we can by using a few simple rewrite-rules make use of having stored flow-path information and "compute" more precise results than would have been possible in a context-insensitive approach. Finally, it should be noted that our presentation of χ-terms as a tree, and all operations on χ-terms as tree manipulations, is very much inspired by Orded Binary Decision Diagrams2 (OBDD) as presented in [Bry86, Bry92]. Furthermore, the 1 The word "Shannon expansion" is taken from the OBDD literature where a similar procedure is used to rewrite boolean functions represented as OBDDs. 2 An OBDD is a clever (and slim) representation of Boolean functions as a directed acyclic graph that allows an efficient computation of frequently used operations such as disjunction, conjunction and function composition. 158 algorithms describing various operations on χ-terms presented in this chapter are intended to be easy to understand, more efficient algorithms can be found in the above cited papers. 3 Definition and Notations In this section, we present a number of definitions and notations related to χ-terms that are of general interest and that will be used in the remainder of this paper later on. Many of the ideas and notations are taken from [Tra99] and [Bry86, Bry92] and adjusted to fit our needs. 3.1 χ-functions The set of χ-functions depends on the set of φ-functions which depends on the program we are analyzing. Furthermore, the exact number of χ-functions at use depends on how we handle loops and other approximations we do in the analysis (e.g. finite k approximations). To simplify what follows, we assume that each program p has a (possibly infinite) set of χ-functions X (p). Each χ-function χbj ∈ X (p) is identified by a pair (b, j) where: 1. The block number b = block(χbj ), b ∈ [1, Bp ] indicates in what basic block its generating φ-node is contained. Here Bp is the number of basic blocks in a program p. 2. The iteration index j = index(χbj ) indicates on what analysis iteration over block b the χ-function was generated. The χ-functions generated from the first analysis iteration over b always have iteration index 0. Two χ-terms χbi (x1 , . . . , xn ) and χbj (y1 , . . . , yn ) from the same block b have the same switching behavior if they have the same iteration index (i.e. i = j). That is, for any execution of the program it holds that branch xk is selected ⇔ branch yk is selected Thus, the switching behavior of a χ-function is determined by a pair (b, i) where b is the block number and i is the iteration index. In what follows, we will often skip the iteration index to simplify the notations. In these cases we assume that all involved χ-function have the same iteration index. Finally, the arity of a χ-function, denoted arity(χbj ), with n arguments is n.3 3.2 χ-terms A programming language type is a set of concrete values. In program analysis, concrete values are abstracted to abstract values requiring a Galois connection between concrete and abstract types [CC77]. Associated with each abstract type V , we have a set of χ-terms. These χ-terms form the context-sensitive counterpart to the abstract values V as used in a context-insensitive analysis. Definition 1 (χ-terms) Let LV = {V, u, t, >, ⊥} be a lattice4 of abstract values of type V and let X (p) be the set of χ-functions associated with a given program p. The set of all χ-terms XV (p) over the abstract values V is then defined recursively as: v∈V ⇒ v ∈ XV (p) t1 , . . . , tn ∈ XV (p) ∧ χbj ∈ X (p) ⇒ χbj (t1 , . . . , tn ) ∈ XV (p), n ∈ arity(χbj ) Notice, even if the set of abstract values V is finite, XV (p) of a program with loops is infinite since there is no upper limit on the composition depth. We have previously associated all χ-functions χbj with a block number b = block(χbj ). This notation can be extended to χ-terms as well. Definition 2 (Block number) The block number of a χ-term t ∈ XV , denoted block(t), is defined as:  0 if t ∈ V , block(t) = b if t = χb (t1 , . . . , tn ). 3 The arity is independent of the loop iteration j. However, we index χb with j to be consistent in our notations. j 4 The lattice property is introduced because it is required to widen the abstract values. This, in turn, is required for an analysis to terminate (in general). However, widening and termination of analysis are deliberately omitted from the present discussion. 159 Notice that we have associated all the values v ∈ V with a (default) block number 0 not corresponding to any existing block in the program. Definition 3 (Function set) The function set of a χ-term t ∈ XV (p), denoted f unc(t), is the set of all χ-functions χbj ∈ X (p) used to construct t. For example: t = χ3 (χ1 (1, 2), χ2 (χ1 (3, 4), 2)) ⇒ f unc(t) = {χ3 , χ2 , χ1 } From now on we will drop the reference to a specific program p in X (p) and XV (p) and assume that it is implicit. It should by now be clear that all these entities are directly related to a given analysis instantiation. 3.3 The Tree Representation of χ-terms Each χ-term t ∈ XV has a unique tree representation Gt = {N, E, r} where N ⊆ XV is a set of nodes, E ⊆ XV × XV is a set of edges, and r ∈ N is the root of the tree. The two sets N and E can be defined recursively as; r = t, n∈N ⇒ ni ∈ N ∧ (n, ni ) ∈ E. where ni denotes the i:th argument of n = χbj (n1 , . . . , nn ). Our reason for introducing the tree representation of a χ-term is that many notions and χ-term operations are easiest to understand in terms of tree manipulations. Basic operations include • The leaves of a χ-term t ∈ XV , denoted leaves(t) ⊆ V , is the set of all leaf values in the tree representation of t. • The subterms of a χ-term t ∈ XV , denoted subterms(t) ⊆ XV , is the set of all subtrees of t in the tree representation of t. The leaves and t are not included. • The children of χ-term t ∈ XV , denoted children(t) ⊆ XV , is defined as:  ∅ if t ∈ V children(t) = {t1 , . . . , tn } if t = χbj (t1 , . . . , tn ) • The depth of a χ-term t ∈ XV , denoted depth(t), is the depth of the tree representation of t. • Let n ∈ XV be a node in the tree representation of a χ-term t ∈ XV . The depth of n in t, denoted depth(t, n) = depth(t) − depth(n), is the depth of the node n in the tree representation of t. For example,    leaves(t) = {1, 2, 3, 4}  subterms(t) = {χ4 (1, 2), χ7 (χ4 (3, 4), 2), χ4 (3, 4)}   t = χ10 (χ4 (1, 2), χ7 (χ4 (3, 4), 2)) ⇒ children(t) = {χ4 (1, 2), χ7 (χ4 (3, 4), 2)} depth(t) = 3,     depth(t, χ4 (3, 4)) = 2  3.4 Basic χ-term operations In this section we present further basic operations on χ-terms. Most important is the Shannon expansion which can be used to manipulate χ-term expressions without affecting their value. Definition 4 (Restriction) The restriction of a χ-term t ∈ XV to the k:th branch of χbj , denoted t|(b,j):k , is a new χ-term where every subterm χbj (t1 , . . . , tn ) with switching behavior (b, j) in t has been replaced by its k:th child tk . 160 For example: t|(3,0):2 = χ20 (χ10 (1, 2), 2)  t = χ30 (χ10 (1, 2), χ20 (χ10 (1, 2), 2)) ⇒ t|(1,0):1 = χ30 (1, χ20 (1, 2)) A restriction f |xi =true in the OBDD description of a Boolean formula f (x1 , . . . , xn ) is a new formula where the i:th variable xi has been assigned the constant value true. The χ-term counterpart is that we define a new χ-term t|(b,j):k by only considering the value from the k:th predecessor of block b (in iteration j) in the construction of t. It should also be noticed that if we restrict us to the k:th branch of χbj , when χbj 6∈ f unc(t), t is left unaffected. Definition 5 (Equivalence) Two χ-terms are equivalent (≡) if they are (syntactically) the same or represent the same χ-term value, i.e. if they can be transformed into the same χ-term by sequences of Shannon expansions or redundancy eliminations. Definition 6 (Shannon expansion) The Shannon expansion of a χ-term t ∈ XV over χbj is a new χ-term defined as: t = χbj (t|(b,j):1 , t|(b,j):2 , . . . , t|(b,j):n ) where n = arity(χbj ) Notice, the Shannon expansion creates a new χ-term but not a new χ-term value. It is just a rewrite rule that can be used to manipulate a χ-term expression. For example, t = χ31 (χ11 (1, 2), χ21 (χ11 (1, 2), 2)) ≡ χ11 (χ31 (χ21 (1, 1), χ21 (1, 2)), χ31 (χ21 (2, 2), χ21 (2, 2))) (expansion over χ11 ) ≡ χ21 (χ31 (χ11 (1, 2), χ11 (1, 2)), χ31 (χ11 (1, 2), χ11 (2, 2))) (expansion over χ21 ) Definition 7 (Redundancy and redundancy elimination) A χ-term is redundant if all its sub-terms are equivalent. That is, χbj (t1 , . . . , tn ) ≡ t if t1 ≡ . . . ≡ tn . The process of removing redundant χ-terms is called redundancy elimination. Obviously, a χ-term t containing a redundant subterm χr can, without any loss of information, be reduced according to the pattern t = . . . χi (. . . , χr (t, . . . , t), . . .) . . . ≡ . . . χi (. . . , t, . . .) . . . In the tree view of a χ-term, this corresponds to replace a subtree rooted by χr by any of its subterms (which are all equal). For example, t = χ11 (χ31 (1, χ21 (1, 2)), χ31 (2, χ21 (2, 2))) ≡ χ11 (χ31 (1, χ21 (1, 2)), χ31 (2, 2)) (since χ21 (2, 2) ≡ 2) ≡ χ11 (χ31 (1, χ21 (1, 2)), 2) (since χ31 (2, 2) ≡ 2) 4 Operations on χ-terms When doing a program analysis there is a need of defining an abstract operator related to an existing concrete operation. The aim of this section is to say that every context-insensitive operator τ has a χ-induced context-sensitive counterpart τe with a semantics completely determined by the semantics of τ . We will define τe in terms of an algorithm apply that recursively performs Shannon expansions of the operator τe over the χ-functions in the input χ-term values. We think of this process as if we "push" the operator towards the leafs of the input χ-term where the ordinary context-insensitive semantics of the operation can be applied. This idea is inspired by the ideas of evaluating logical formulas represented as OBDDs [Bry86, Bry92]. Definition 8 (Restriction) Let τe : X1 × . . . × Xn 7→ Xv be an operator on χ-terms. The restriction of an operation τe(t1 , . . . , tn ) to the k:th branch of χbj , denoted τe|(b,j):k , is defined as τe|(b,j):k (t1 , . . . , tn ) = τe(t1 |(b,j):k , . . . , tn |(b,j):k ) where ti |(b,j):k is the restriction of the i:th operand ti to the k:th branch of χbj . 161 It simply means that we are restricting all input operands to the k:th branch of χbj before we apply the operation τe. Notice that if χbj 6∈ f unc(t1 ) ∪ . . . ∪ f unc(tn ) we have ti |(b,j):k = ti and consequently that τe|(b,j):k = τe(t1 , . . . , tn ). That is, the restriction has no effect in such cases. Definition 9 (Shannon expansion) Let τe : X1 × . . . × Xn 7→: Xv be an operator on χ-terms. The Shannon expansion of an operation τe(t1 , . . . , tn ) over the χ-function χbj is defined as τe(t1 , . . . , tn ) = χbj (e τ |(b,j):1 , . . . , τe|(b,j):m ) where m = arity(χbj ) and τe|(b,j):k is the restriction of the operation τe(t1 , . . . , tn ) to the k:th branch of χbj . The Shannon expansion of an operation τe(t1 , . . . , tn ) over χbj can be seen as if we "push" the operator τe one step towards the leaf values of the input operands ti by computing τe for each one of the branches of χbj individually, and then merge these values using χbj . The Shannon expansion has the following properties: • It is a rewrite rule for a χ-term expression that has no effect on the resulting χ-term value. It can be seen as a first step in the computation of the operation. • It leaves the χ-expression unchanged if χbj 6∈ f unc(t1 ) ∪ . . . ∪ f unc(tn ) since in that case τe|(b,j):k = τe(t1 , . . . , tn ) and we can apply the redundancy rule χbj (e τ (t1 , . . . , tn ), . . . , τe(t1 , . . . , tn )) = τe(t1 , . . . , tn ). • If χbj ∈ f unc(ti ) for any ti in t1 , . . . , tn then |f unc(e τ |(b,j):1 ) ∪ . . . ∪ f unc(e τ |(b,j):m )| < |f unc(t1 ) ∪ . . . ∪ f unc(tn )|, m = arity(χbj ). That is, the “size" of the input-operands to the τe operator (in terms of the number of unique χ- functions contained) is always decreasing. This last property indicates that repeated Shannon expansions over the χ-functions in the input operands will result in a situation where the input-operands to τe are all leaf values. Thus, the semantics of τe is completely defined in how τe operates on the leaf values. This observation leads to the concept of χ-induced operations to be presented shortly. The algorithm apply (see Algorithm 1) is a recursive function that performs repeated Shannon expansion until it reaches the leaf values where the context-insensitive operator can be applied. Algorithm 1 apply(τ, t1 , . . . , tn ) 7→ t let f unc_set = f unc(t1 ) ∪ . . . ∪ f unc(tn ) if f unc_set = ∅ then return τ (t1 , . . . , tn ) else pick χbj from f unc_set for all i ∈ 1 . . . arity(χbj ) do let ci =apply(τ, t1 |j:i , . . . , tn |j:i ) end for return χbj (c1 , . . . , carity(χbj ) ) end if The test f unc_set = ∅ checks if t1 , . . . , tn are all leaf values. If so, we apply the context-insensitive operator τ . Otherwise, we push the operator towards the leafs by a Shannon expansion in one of the remaining χ-functions. Note also that: apply(τ, v1 , . . . , vn ) = τ (v1 , . . . , vn ) if v1 , . . . , vn are all leaf values. This follows immediately from the base case in the algorithm apply. Finally, Algorithm 1 is intended to be easy to understand, not efficient, since it will be used in the definition of χ-induced context-sensitive operators. More efficient algorithms are available [Bry86]. 162 Definition 10 (χ-induced context-sensitive operator) For each context-insensitive operator τ : A × B × . . . × N 7→ V their is a corresponding χ-induced operator τe : XA × XB × . . . × XN 7→ XV defined as τe(t1 , . . . , tn ) = apply(τ, t1 , . . . , tn ). One important observation is that the algorithm apply in general increases the depth of the resulting χ-term expression. More precisely, a closer look at apply shows that τ (t1 , . . . , tn )) = |f unc(t1 ) ∪ . . . ∪ f unc(tn )|. depth(e As a consequence, τe applied on operands with a similar control-flow history (i.e. |f unc(t1 ) ∪ . . . ∪ f unc(tn )|  |f unc(t1 )| + . . . + |f unc(tn )|) will generate less complex results than if applied on operands with very different control- flow histories (i.e. |f unc(t1 ) ∪ . . . ∪ f unc(tn )| ≈ |f unc(t1 )| + . . . + |f unc(tn )|). In conclusion, every context-insensitive operator τ has a χ-induced counterpart τe and its semantics is completely deter- mined by the context-insensitive semantics of τ . We compute τe by repeated Shannon expansions where each step means that we are “pushing" τe closer to the leaf values where the context-sensitive operator τ is defined and can be applied. 5 Conclusions We defined χ-terms as a means to capture context-sensitive analysis values for programs represented as SSA-graphs. Each program point, i.e., each SSA-node, is mapped to a corresponding χ-term representing a context-sensitive analysis value for that point. Especially, each meet point of execution paths in the program, i.e., each phi-node, is mapped to a χ-term whose children represent the alternative analysis values of these paths. χ-terms are represented by graphs without any redundancy which generalizes the idea behind BDDs. This leads to a memory efficient representation of context-sensitive analysis information. Operations over context-insensitive analysis values, needed to implement transfer functions of the analysis, are defined using restriction and Shannon expansion which are also similar to the corresponding BDD operations. As a result, any context-insensitive analysis can automatically be transformed into a corresponding context-sensitive analysis. We deliberately ignored the merging of analysis values of different contexts, which is needed for making context- sensitive analysis practical for large and iterative programs; otherwise even χ-terms grow beyond any memory bound. Merging, and, hence, the reduction memory requirements of χ-term at the price of analysis precision, can easily be achieved by applying the meet operation of the context-insensitive analysis values to leafs of χ-(sub-)terms. Controlling this merging of sub-terms based on available memory and precision of analysis values is matter of future work. References [Ake78] Sheldon B. Akers. Binary decision diagrams. IEEE Transactions on Computers, 27(6):509–516, June 1978. [Bry86] R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, August 1986. [Bry92] R. E. Bryant. Symbolic boolean manipulation with ordered binary decision diagrams. ACM Computing Sur- veys, 24(3):293–318, September 1992. [CC77] P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction of approximations of fixed points. In Conference Record of the Fourth Annual ACM SIGACT/SIGPLAN Symposium on Principles of Programming Languages, pages 238–252, January 1977. [CFR+ 91] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and K. Zadeck. Efficiently computing static single assign- ment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451–490, 1991. [Tra99] Martin Trapp. Optimerung Objektorientierter Programme. PhD thesis, Universität Karlsruhe, December 1999. 163