An Approach for Formal Verification of Updated Java Bytecode Programs Razika Lounas1,2 Mohamed Mezghiche 1 University of M’hamed Bougara of Boumerdes University of M’hamed Bougara of Boumerdes Facutly of Sciences, LIMOSE Laboratory Facutly of Sciences, LIMOSE Laboratory Avenue de l’independance, 35000 Boumerdes Avenue de l’independance, 35000 Boumerdes Algeria Algeria 2 University of Limoges mohamed-mezghiche@umbb.dz 123 Avenue Albert Thomas, 87700 Limoges, France razika lounas@umbb.dz Jean-Louis Lanet INRIA LHS-PEC 263 Avenue Général Leclerc, 35000 Rennes France jean-louis.lanet@inria.fr This paper deals with formal specification and verification of Java bytecode update. Programs update for java applications has gained a wide interest since it is used for several purposes: transforming semantics of a program, adding features to a program or performing optimizations. In this paper, we focus on program transformations for java programs at the bytecode level. Because these transformations may introduce errors, our goal is to provide a formal way to verify the update and establish its correctness. Our approach for formal specification and verification of updated Java bytecode programs is based on four ingredients: a formal interpretation of the semantics of update operations, a functional representation of bytecode, bytecode annotation and predicate transformation calculus. We use the concept of Hoare predicate transformation to derive a specification of an annotated bytecode. Annotations are used to express update operations within the code. A functional representation is used to model annotations and bytecode. The approach derives then a new specification for the annotated bytecode using a weakest precondition calculus defined to deal with update operations. Verification conditions are then generated and proved to establish the correction of the update. Bytecode transformation, formal semantics, weakest precondition calculus, bytecode verification. 1. INTRODUCTION some cases, the source code is not available (or not distributed). Transforming a program at bytecode During their life cycle, programs need to be level is an interesting alternative since several updated in order to alter their semantics, perform languages like Java or Java Card are based on optimizations or add features. Several techniques virtual machines executing bytecode. Transforming were presented for this purpose in literature, for programs at bytecode level offers some advantages: example, (Neamtiu et al. (2006) and Gupta et al. it does not require to recompile which can be a time (1996)) present systems for C programs updating consuming task as in the case of transformations and (Orso et al. (2002), Hlopko et al. (2013)) at source code level. On the other hand, bytecode present systems to update Java programs. level transformation is more complex than source- level manipulation for the users because they have Updating programs leads to the transformation to know bytecode language very well and because of their elements such as code, data structures of the many low-level details one needs to use. and state. We focus on the transformation of Java codes. In this context, several tools were Java bytecode transformation is used in several developed, for example, Java Syntactic Extender applications and several tools were developed to (JSE) (Bachrach and Playford (2001)) and ixj manipulate Java bytecode programs such as BCEL (Boshernitsan and Graham (2004)). However, in •• (Dahm (1999)) and RuggedJ (McGachey et al. the following contributions: the definition of a new (2009)). In (Sakamoto et al. (2000)), the authors weakest precondition calculus as the base of the developed an algorithm to ensure portable thread verification process, a formal interpretation of the migration in Java. This algorithm is based on semantics of the update operations, a functional bytecode transformation. Bytecode is transformed in representation of bytecode programs and bytecode order to enable programs to save and restore their annotation. The choice of functional representation execution state after migration through the network. is motivated by our interest in capturing the behavior Another purpose for bytecode transformation is of the initial bytecode and the updated version and presented in (Binder and Hulaas (2005)) where the mature existing tools for formal reasoning about a framework based on bytecode transformation is functional programming languages. developed in order to enable Java applications to perform CPU management. This paper is organized as follows: in section 2 we give an overview of embedDSU. Section 3 In some cases, the transformation occurs at runtime. introduces the language and the formal semantics of The update is then said to be dynamic (Dynamic the updates. In section 4, we present an overview Software Update: DSU). In (Noubissi (2011)) and of our approach in its steps. We present the in (Noubissi et al. (2011)), the authors presented a specification languages is section 5. In section 6, system to perform DSU: while the Java Card virtual we give our functional modelisation of Java bytecode machine is executing the program, the bytecode is and annotations. We propose a predicate calculus updated. for update operations in section 7 and give the notion of a correct update. This section ends with an This large interest of Java bytecode transformation example to show how the logic works. We discuss and its use in many critical applications raise the related work in section 8 and conclude in section 9. question of its correctness. In fact, a transformation may introduce an error which may alter the bytecode leading the system to an unexpected 2. OVERVIEW OF EMBEDDSU state. Besides, in some cases, the update is EmbedDSU (Noubissi (2011), Noubissi et al. (2011), critical (e.g. EmbedDSU) in such a way that an Noubissi et al. (2010)), is a software-based DSU attacker can take advantage of an incorrect update. technique for Java-based smart cards which relies In these applications where security issues are on the Java virtual machine. It is based on involved the update must pass some certification the modification of an embedded virtual machine. procedure for example Common Criteria (Common EmbedDSU is divided in two parts: off-card and on- Criteria (2015)). For a certain certification level card: one has to provide a formal proof of the security mechanism implemented. A formal way to specify (i) In off-card, a module called DIFF generator transformations and verify their correctness is then determines the syntactic changes between necessary. versions of classes in order to apply the update only to the parts of the application that are Formal methods offer rigourous means in specifying really affected by the update. The changes are software properties and establishing the correctness expressed using a Domain Specific Language of programs regarding their formal specifications. (DSL). Then, the DIFF file result is transfered In this work, we present an approach for formal to the card and used to perform the update. verification of bytecode update. We focus on Java bytecode and the system presented in (Noubissi et (ii) The on-card part is divided into two layers: al. (2011)) called embedDSU: a system developed 1) Application Layer: The binary DIFF file to implement DSU functionalities in Java Card is uploaded into the card. After a signature applications. It is based on two parts: off-card in check with the wrapper, the binary DIFF is which a module called DIFF generator computes interpreted and the resulting instructions are the syntactic changes between the old and the new transferred to the patcher in order to perform version of the application and generates a DIFF file the update. The patcher initializes data (called also a patch). This patch is then sent on structures for update. These data structures the card to perform the update by other modules are read by the updater module to determine implemented by extending the Java Card virtual what to update and how to update, by the machine. safeUpdatePoint detector module to determine when to apply the update and by the rollbacker In this work, we propose to formally verify that to determine how to return to the previous the obtained bytecode is semantically equivalent to version in case of update failure. These the one written by the programmer and used to points require the introspection of the virtual perform the DIFF file. Our approach is based on machine. 2) System Layer: the modified virtual Figure 2: An example of a patch (DIFF file) Figure 1: Architecture of EmbedDSU update and where it occurs in the bytecode. An example is shown figure 2: the patch indicates that the instruction iadd in the method compute sum is machine supports the followings features: (1) deleted and the instruction isub is added at the same Introspection module which provides search place provided by the program counter. functions to go through VM data structures like the references tables, the threads table, the class table, the static object table, 3. LANGUAGE AND SEMANTICS the heap and stack frames for retrieving information necessary to other modules; 3.1. The language (2) updater module which modifies object For the definition of the semantics, we extend the instances, method bodies, class metadata, formalism used by Freund and Mitchell (Freund and references, affected registers in the stack Mitchell (1999)). The authors define a type system thread and affected VM data structures; (3) for a small subset of Java bytecode. We define a SafeUpdatePoint detector module permits to subset and propose to extend it with instructions detect safe point in which we can apply the to indicate updates called update instructions update by preserving coherence of the system. (Upd instr ) for instruction addition, deletion and The system EmbedDSU is suitable for smard cards modification. In this definition, x is a local variable; especially in term of resource limitations. It was L is an instruction address; A is a class name; f is a established that sending a DIFF file is less ressource field name; l is a method name and pc the program consuming than sendig the whole new version to the counter. card and perform updates and that the resources Instruction ::= |pop |if L |store x |load x |new A implied by the update modules are acceptable in |binop |neg |const a |invokevirtual A l t |goto L term of memory occupation (Noubissi (2011)). The |getf ield A f t |putf ield A f t |return system EmbedDSU updates three principal parts: (i) The bytecode: the process updates first the U pd Instr ::= Add Inst Instruction pc bytecode of the updated class and the meta |Dlt Inst Instruction pc data associated with it e.g., constant pool, |M od Inst Instruction instruction pc fields table, methods table... In this language, the instruction pop extracts the top (ii) The heap: The process updates the instances of the stack and const a pushes a constant a on of the updated class in the heap, obtains new the top of the stack. The instruction load x pushes references for modified objects and updates the value in the variable x on the top of the stack instances using these references. whereas the instruction store x pops the top of the stack and stores it in the variable x. The instruction (iii) The frames: The process updates in each frame if L jumps to L if the top of the stack is not zero else in the thread stack the references of updated it performs the following instruction. Goto L jumps to objects to point to new instances. L. The instruction N ew A allocates a new object of type A and pushes it on the top of the stack. The This paper addresses the first part: bytecode update instructions manipulating fields are : getf ield A f t at the method level. The types of updates that and putf ield A f t. Getfield reads the field f , which may occur are: adding, modifying or suppressing has the type t of the object of class A whose bytecode instructions, changing the signatures of a reference is on the top of the stack and pushes its method or modifying local variables. These updates value on the top of the stack and putfield modifies are contained in the DIFF file which indicates the the field f with the value popped form the stack. •• The instruction invokevirtual invokes the method l the semantics to establish that an updated program of signature t and the class A. The instruction Binop is well typed. It is also used in further section to is used to gather arithmetic binary operations: add, derive specifications for program transformations. In mult and sub. The instruction neg negates the top of the rules shown in tables 2 and 3, F is a mapping the stack and return is for method return. from a program point to a mapping from a frame variable to a type. S is a mapping from a program Update instructions are respectively: adding an point to an ordered sequence of types, i denotes instruction, deleting instruction and modifying an a program point or an address of code. The map instruction. We indicate the place of the update Fi gives a type of local variables at program point operation with pc. i. The string Si gives the types of entries in the operand stack at program point i. These F and S 3.2. Operational semantics for bytecode are useful to our semantics since they contain typing instructions information about valid local variables and entries in the operand stack respectively. SD represents We model the interpretation of the instructions the stack depth and M (mapping) is a function that of the bytecode instructions using the standard associates a number to each line. Dom is the set framework for operational semantics (Freund and of addresses used by the method. A configuration Mitchell (1999), Bannwart and Müller (2005)). Each at line i is represented by < (F, S, SD, M ), i >. The instruction is characterised by the transformation of judgement that expresses that a bytecode BC is well a configuration. A configuration < M, s, h, f, pc > typed by F , S, SD and M is: representing a step execution consists of an operand stack s, a heap h, a local variables map f , a F1 = F⊤ , SD1 = 0 program counter pc and the body M . Operational S1 = ε, M 1 = M ap(BC) semantics is defined by a transition relation over ∀i ∈ DOM (BC), F, S, SD, M, i ⊢ BC configurations. A transition < M, s, h, f, pc >→< F,S,M,SD⊢BC M, s2, h2, f 2, pc2 > takes the state from the configuration < M, s, h, f, pc > to the configuration The first two lines of the judgement represent the < M, s2, h2, f 2, pc2 >. initial configuration: all variables are mapped to the value top (default initial value), stack depth is zero, The rules for the instructions of our language are the sequence of types is initially empty (ε) and M 1 represented in table 1. The instruction new A creates is the mapping of the initial bytecode. The last line a new object of class A, thereby modifying the expresses that each instruction (update instruction) current heap. A reference to the new object is in the bytecode is well typed. This is ensured by pushed onto the stack. store x pops a value from the rules given in tables 2 and 3. For illustration, the evaluation stack and assigns it to a variable,f is the insertion of the instruction new A at line i + modified accordingly. load x put the value of x on the 1 allows us to obtain a new configuration if the top of the stack. The binop operation which pops two stack depth is incremented, local variables are not values from the stack, performs the binary operation, affected and in the stack, the type A is inserted. and pushes the result. if l has two rules; wether it In the instruction invokevirtual the function dom jumps to the indicated line or performs the following represents the domain of the invoked function (types instruction according to the value of the top of stack. of its arguments) and the function card represents The instruction putf ield updates the heap with the the number of elements in the domain. The rule new value of the field of the object which is on the expresses that these arguments are popped from the top of the stack. The new value is popped from the stack of type and then the result is pushed. For the second element of the stack. invokevirtual invokes insertion of an instruction representing an arithmetic the method l on an object reference and parameters binary operation Binop, we show the rule of the on the stack and replaces these values by the return instruction add: this operation pops two elements value v of the invoked method after its execution. (integers) from the stack and then pushes the result. mult and sub have analogous explanations by writing 3.3. Formal semantics for update instructions the right operation. In the rules, the mapping M 2 is the result of operations on M 1. The operations which We propose a static semantics to express the effects represent manipulations on bytecode are: range and of update instructions on a configuration of the shift. The operation range extracts from a mapping bytecode. This semantics was introduced in our M 1 a part M 2 included between line n and line m. initial paper (Lounas et al. (2012)). The purpose The second operation shifts a part from a mapping of the semantics is to express formally the effects between n and m for p positions which is determined and the conditions of update instructions and thus by the number of added instructions. prevent type errors in the updated bytecode. In this paper, we give more rules and show how to use •• We define the operations look f or jumps and update jumps to take into account jumps in bytecode transformation: look f or jumps returns from a mapping a list of jumps instructions represented by their line number and the operation update jumps updates jump instructions: Look f or jumps : mapping → int list U pdate jumps : mapping ∗ int list ∗ int → mapping These operations updates jumps within the bytecode if necessary. When we add for instance an instruction at pc, the instructions after this position Figure 3: Approach for verification are shifted and their numbers change. It is then necessary to update goto and if instructions accordingly. These modifications keep the structure semantically equivalent. This equivalence ensures of the bytecode coherent. In the rules for instructions that the system indeed implemented the desired suppression (table 3), Ef f ect ST K, Ef f ect F and transformation. This problem can be expressed Ef f ects SD are used to express the effects of equationally by: an instruction of the stack and the local variables and stack depth. They are used to readjust these ∀P 1, P 2, P 2′ , ∆ = DIF F (P 1, P 2), P 2′ = elements to the instruction at (i + 1) in the App P AT CH(P 1, ∆) ⇒ P 2 ≡ P 2′ new bytecode after the suppression. The notation (M 2)F (Respectively, (M 2)S) is used to express F This raises two major issues: 1- how to model (Respectively, S) in the mapping M 2. We notice that the application of the DIFF file on an existing in this formalisation, a modification is considered as program? and 2- how to express the equivalence a suppression followed by an insertion. which guarantees the correctness of the update? We present the overview of our approach for transformation verification. Figure 3 represents an 4. APPROACH FOR FORMAL VERIFICATION overview of our approach which is split in three parts: The mechanism of EmbedDSU implies the modifica- (i) The transformation block: in this stage, we obtain tion of the bytecode of a running application on-card from a first version of a bytecode program after the conventional verification during the process BC V 1 and a second version BC V 2 (Version of its life cycle. In this process, bytecode passes one transformed), a DIFF file. This DIFF file verification process based especially on type veri- will be applied to the on-card first version. fication. The applications of update operations on- We obtain a new version on-card. The goal card is performed with insertion and suppression of of our approach is to establish that the on- instructions according to the DIFF file. Consequently, card new version and BC V 2 are semantically we obtain on-card, after the update process, a new equivalent. At this level, the specifications of bytecode that was not submitted to the conventional both BC V 1 and BC V 2 are provided by verification process. Our framework allows to: the programmer using existing specification languages. (i) Ensure the validity of update operations of the DIFF file according to the formal specification (ii) The functional block: we define a functional of the Java Card virtual machine specification. model for representing and manipulating (ii) Guarantee that the application of the update the Java Card bytecode. We implement an leads to a bytecode with the specification automatic translator called functional reader that is conform to the intended specification which takes a program written in bytecode (provided by the programmer). and produces a functional representation of it. The application of the DIFF file is represented The first point is ensured by the formalisation at this level as annotations of the functional of the semantics of update operations. In the representation with expressions indicating the second point, we aim to establish that given an place of the update operation and its nature initial program P 1, its new version P 2 and a (addition of instructions, deletion . . . ) DIFF file ∆ containing the specification of the (iii) The verification block: our goal is to verify transformation derived from the differences between that the bytecode obtained by transformation P 1 and P 2, the application of the DIFF file on- is equivalent to the one written by the program- card on P 1 (noted App PATCH) leads to P 2′ . mer i.e., it satisfies the same specification. The The two programs P 2 and P 2′ are verified to be •• Table 1: Rules for operational semantics M [pc]=pop M [pc]=new A,h′ =h[create(A,ref )] M [pc]=load x M [pc]=store x M [pc]=if l M [pc]=if l,v̸=0 M [pc]=const a M [pc]=getf ield a f t,v=h[o.f ] M [pc]=neg M [pc]=binop,op∈{+,−,∗} M [pc]=putf ield A f t,h′ =h[o.f ←v] M [pc]=goto l M [pc]=invokevirtual A l t , Table 2: Rules for update operations (insertion of instructions) Add inst goto L(i + 1) Add inst store x(i + 1) Add inst add(i + 1) SDi+1 = SDi P C M AX + + SDi+1 = SDi − 1 P C M AX + + SDi+1 = SDi − 1 Si+1 = Si Fi+1 = Fi Si = t.S0 Fi+1 = Fi [x ← t] Si = int.int.S0 ⇒ M2 = Si+1 = S0 Si+1 = int.S0 Add inst(M 1, goto L, i + 1) M 2 = Add inst(M 1, store x, i + 1) M 2 = Add inst(M 1, add, i + 1) i + 1, L ∈ DOM (BC) i + 1 ∈ DOM (BC) x ∈ V AR(BC) i + 1 ∈ DOM (BC) Fi+1 = Fi F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC Add inst new A(i + 1) Add inst pop (i + 1) Add inst putf ield(A, f, t)(i + 1) SDi+1 = SDi + 1 SDi+1 = SDi − 1 Fi+1 = Fi SDi+1 = SDi − 2 Fi+1 = Fi Si+1 = A.Si Fi+1 = Fi Si = t.S0 → Si+1 = S0 Si = t.A.S0 ⇒ Si+1 = S0 M2 = M 2 = Add inst(M 1, pop, i + 1) M2 = Add inst(M 1, new A, i + 1) P C M AX + + Add inst(M 1, putf ield(A, f, t), i + 1) P C M AX + + i + 1 ∈ DOM (BC) P C M AX + 3 i + 1 ∈ DOM (BC) i + 1 ∈ DOM (BC) F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC Add inst load x(i + 1) Add inst if L(i + 1) Add inst getf ield(A, f, t)(i + 1) SDi+1 = SDi + 1 SDi+1 = SDi SDi+1 = SDi P C M AX + + P C M AX + + Si = A.S0 ⇒ Si+1 = t.S0 Si+1 = Fi [x].Si Fi+1 = Fi Si+1 = Si Fi+1 = Fi M2 = M2 = M2 = Add inst(M 1, getf ield(A, f, t), i + 1) Add inst(M 1, load x, i + 1) Add inst(M 1, if L, i + 1) P C M AX + 3 Fi+1 = Fi i + 1 ∈ DOM (BC) x ∈ V AR(BC) i + 1, L ∈ DOM (BC) F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC Add inst invokevirtuel(A, l, t)(i + 1) Add inst const a(i + 1) Add inst neg (i + 1) SDi+1 = SDi − (card(dom(t)) + 1) SDi+1 = SDi + 1 SDi+1 = SDi Fi+1 = Fi Si+1 = tn1 .tn2 . . . tnn .S0 → Si+1 = S0 P C M AX + + Si = int.S0 = Si+1 M2 = Si+1 = int.Si Fi+1 = Fi M2 = Add inst(M 1, invokevirtuel(A, l, t), i + 1) M2 = Add inst(M 1, negi + 1) i + 1 ∈ DOM (BC) Fi+1 = Fi Add inst(M 1, const a, i + 1) P C M AX + + P C M AX + 3 i + 1 ∈ DOM (BC) i + 1 ∈ DOM (BC) F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC •• Table 3: Rules for update operations (suppression of instructions) Dlt inst goto L (i + 1) Dlt inst (store x (i + 1)) SDi = a → SDi = a → SDi+1 = Ef f ects SD(a, M 2[i + 1]) SDi+1 = Ef f ects SD(a, M 2[i + 1]) M 2 = Dlt inst(M 1, store x, i + 1) M 2 = Dlt inst(M 1, goto L, i + 1) Si = t.S0 , Fi [x] = t → (M 2)Si+1 = Ef f ects ST K(M 2[i + 1], Si ) (M 2)Si+1 Ef f ects ST K(M 2[i + 1], t.S0 ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) i + 1, L ∈ DOM (BC) P C M AX − − i + 1 ∈ DOM (BC) P C M AX − − F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC Dlt inst (add (i + 1)) M 2 = Dlt inst(M 1, add, i + 1) Dlt inst (pop (i + 1)) SDi = a → SDi = a → SDi+1 = Ef f ects SD(a, M 2[i + 1]) SDi+1 = Ef f ects SD(a, M 2[i + 1]) M 2 = Dlt inst(M 1, pop, i + 1) Si = int.int.S0 → Si = t.S0 → (M 2)Si+1 = Ef f ects ST K(M 2[i + 1], Si ) (M 2)Si+1 = Ef f ects ST K(M 2[i + 1], t.S0 ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) i + 1 ∈ DOM (BC) P C M AX − − i + 1 ∈ DOM (BC) P C M AX − − F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC Dlt inst (putf ield(A, f, t) (i + 1)) Dlt inst (getf ield(A, f, t) (i + 1)) SDi = a → SDi = a → SDi+1 = Ef f ects SD(a, M 2[i + 1]) SDi+1 = Ef f ects SD(a, M 2[i + 1]) M 2 = Dlt inst(M 1, putif ield(A, f, t), i + 1) M 2 = Dlt inst(M 1, getif ield(A, f, t), i + 1) Si = A.t.S0 → Si = A.S0 → (M 2)Si+1 Ef f ects ST K(M 2[i + 1], Si ) (M 2)Si+1 Ef f ects ST K(M 2[i + 1], A.S0 ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) i + 1 ∈ DOM (BC) P C M AX − − i + 1 ∈ DOM (BC) P C M AX − − F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC Dlt inst new A (i + 1) Dlt inst if L (i + 1) SDi = a → SDi = a → SDi+1 = Ef f ects SD(a, M 2[i + 1]) SDi+1 = Ef f ects SD(a, M 2[i + 1]) M 2 = Dlt inst(M 1, if L, i + 1) M 2 = Dlt inst(M 1, new A, i + 1) Si = int.S0 → (M 2)Si+1 = Ef f ects ST K(M 2[i + 1], Si ) (M 2)Si+1 = Ef f ects ST K(M 2[i + 1], Si ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) i + 1 ∈ DOM (BC) P C M AX − − i + 1, L ∈ DOM (BC) P C M AX − − F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC Dlt inst (neg (i + 1)) SDi = a → Dlt inst (load x (i + 1)) SDi+1 = Ef f ects SD(a, M 2[i + 1]) SDi = a → SDi+1 = Ef f ects SD(a, M 2[i + 1]) M 2 = Dlt inst(M 1, neg, i + 1) M 2 = Dlt inst(M 1, load x, i + 1) Si = int.S0 → (M 1)Si+1 = t.S0 → (M 2)Si+1 = Ef f ects ST K(M 2[i + 1], Si ) (M 2)Si+1 Ef f ects ST K(M 2[i + 1], S0 ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) i + 1 ∈ DOM (BC) P C M AX − − i + 1 ∈ DOM (BC) P C M AX − − F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC Dlt inst (const a (i + 1)) SDi = a → Dlt inst (invokevirtuel(A, l, t) (i + 1)) SDi+1 = Ef f ects SD(a, M 2[i + 1]) SDi = a → SDi+1 = Ef f ects SD(a, M 2[i + 1]) M 2 = Dlt inst(M 1, const a, i + 1) M 2 = Dlt inst(M 1, invokevirtuel(A, l, t), i + 1) Si = S0 → Si = tn1 .tn2 . . . tnn .S0 → (M 2)Si+1 = Ef f ects ST K(M 2[i + 1], Si ) (M 2)Si+1 Ef f ects ST K(M 2[i + 1], Si ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) (M 2)Fi+1 = Ef f ects F (M 2[i + 1], Fi ) i + 1 ∈ DOM (BC) P C M AX − − i + 1 ∈ DOM (BC) P C M AX − − F,S,M 2,SD,i+1⊢BC F,S,M 2,SD,i+1⊢BC •• specification of the obtained bytecode in its functional representation with annotations is performed by a weakest precondition calculus that we define specially to deal with update operations. A verification condition generator gives then statements to be verified to estab- lish that the obtained specification matches the specification given by the programmer at the level one. A proof assistant is used to Figure 4: Bytecode annotation with update instructions discharge verification conditions. obtained by applying a compiler JML2BML and will 5. JML AND BML SPECIFICATIONS be used by the next stages of the approach to The starting point is a new version BC V 2 of perform verification condition generation and ensure un existing program BC V 1. First the programmer the transformation correctness. writes the new version with its specification in terms of pre/post conditions. The specification language 6. ANNOTATION AND FUNCTIONAL used is JML (Java Modeling Language). REPRESENTATION OF BYTECODE JML (Burdy et al. (2005)) is a specification language The DIFF file containing the update instructions for Java/Java Card programs. It allows assertions is calculated at bytecode level and then sent to to be included in the source code, specifying for perform the update on-card. In order to ensure that example pre- and postconditions and invariants. JML we send the right one, we model its application on annotations are a special kind of Java comments: an initial version of bytecode P1 as annotations. they are preceded by / / @, or written between /* The operation of annotating a bytecode with @ and @* /. expressions indicating where an update instruction occurs and what is the operation involved can be A simple method specifications is of the form: defined recursively as an annotation function which /*@ normal_behavior transforms a program to an annotated program. requires : ; Annot(ε, P ) ≡ P ensures : ; Annot([U pdi |∆], P ) ≡ let P′ = @*/ Add Annot Line(U pdi , P ) in Annot(∆, P ′ ) This specification means that if the precondition The annotation of a program with an empty DIFF (requires) holds at the beginning of a method file (ε) is the program itself otherwise, the function invocation, then the method terminates normally iterates over the update operations (U pdi ) in the and the postcondition (ensures) will hold at the patch and adds a corresponding annotated line end of the method. Constructs are defined to write (Add Annot Line(U pdi , P )) to the program. Figure assertion such as: \old, to denotes the old value of a 4 shows an annotated program obtained by the variable,\result to denote the result of a method and application of a DIFF file on an initial byte the quantifiers, \f orall and \exists. code. The annotations are represented as special commentaries. For example, Del 4 : deletes the The DIFF file in the system EmbedDSU is created instruction at program counter (pc) 4 and add isub from the program’s bytecode. To ensure the 4, adds the instruction isub at pc 4. correctness of the transformation, the verification of the specification will be done at bytecode level. In our framework, we use a functional representation The language BML (Burdy et al. (2007)), allows for both bytecode programs and annotation function. to express specifications of bytecode programs. Its Figure 5 shows a fragment of the formalisation formalism is based on JML and the structures of written in OCaml. We start by defining the data specifications in both languages are very similar. manipulated by the program (integers, objects and variables, then, we formalise the instructions of the At the transformation block, specifications for sub language. The definition of an instruction is both first version and second version are written given by the name of a construct (representing the in JML. Starting from a specified source code name of the instruction) followed by its arguments. {Pjml }codesource {Qjml }, with Pjml and Qjml rep- For example, for the instruction new, we have resenting respectively precondition and postcondi- the construct New taking an Object as argument tion of codesource , we obtain a specified bytecode and the instruction putfield is represented by the program {Pbml }codeBC {Qbml }. This information is construct Putfield followed by a triple representing the machine state where Heap represents the contents of the heap, Frame represents the ex- ecution state of the current Method and, Stack- Frame is a list of frames corresponding to the call stack. A frame contains the following ele- ments : the stack of operands OperandStack and the values of the local variables LocalV ar at the program point P C of the method M ethod ( < H, M ethod, P C, OperandStack, LocalV ar > ). The definition of the update interpretation is based on the notion of step. Definition 1. Step The semantics of an instruction (update instruction) is specified as a function step: Bytecode P rog ∗ State ∗ Specif ication− > State ∗ StepN ame ∗ Specif ication that, given a bytecode P, a state S and a specification SP, computes the next state S’, the name of the next step and a new specification. Figure 5: An extract of functional modelisation of bytecode Definition 2. Java bytecode update interpreter We define now an update interpreter (U pd int) which iterates over steps, take as parameters an annotated the arguments: the class (Object) and the names of program in its functional representation, an initial the type of the field and its name as strings. state and an initial specification and relies on predicate calculus and update interpretation function A bytecode line is defined as a number (representing to produce a new state and a new specification. The the program counter) with an instruction. The interpreter is defined as U pd int(BC, S) = (S ′ , Sp′ ) bytecode is represented as a list of bytecode with S = initial(BC, Sp) the function for defining an lines. An annotated line is represented by the initial state for the execution of the bytecode BC with product of a bytecode line and a string representing the initial specification Sp. The Code BC is given the annotation. An annotated bytecode is a list with its parameters and an initial heap. The result of of annotated bytecode lines. The result of this the interpreter is a state S ′ and a new specification modelisation is used to derive specifications of Sp′ . updated programs. Definition 3. Verified updated bytecode 7. VERIFICATION • Let P 1 and P 2 be the first and the new version of a program and P a patch, Our approach for verification is based on the fact that the transformation of a bytecode (of • let P 2′ = annot(P 1, P ) be the program its semantics) implies the transformation of its obtained by annotation of P1 with P, specification. In Hoare Logic (Hoare (1969)), a program P 1 and its specification is represented by • let f (P 2′ ) the functional representations of P 2′ , a triple {pre1}P 1{post1} where pre1 (post1) is the • let spec(P 1) = (pre1, post1) the specification precondition (postcondition) of the program P 1. A of P 1 and spec(P 2) = (pre2, post2) the new version of this triple written off-card by the specification of P2, programmer is {pre2}P 2{post2} (a target triple). The DIFF file is performed with P1 and P2 and then sent We say that P 2′ is a successfully verified update of to the card to perform update operations, meaning, P 1 if and only if: verif ication(spec(P 2), spec(P 2′ )) obtaining a new bytecode and a new spacification. succeeds where spec(P 2′ ) is obtained by predicate Our goal is to establish that the target triple and the transformation on f (P 2′ ) starting from post2. obtained triple match. 7.2. Weakest precondition calculus 7.1. Interpretation of the update In this section, we define a bytecode update In order to formally define our update interpreter, logic in terms of a weakest precondition calculus. we need to define some notions. In this in- The proposed weakest precondition (WP) considers terpretation, a state is modeled by a 3-tuple:< that each (update) instruction has a precondition. Heap, F rame, Stack − F rame > which represents An instruction with its precondition is called an •• Table 4: Defining rules for weakest precondition calculus for update operations wp( Add instr(pop,i)) = (shif t exp2 (@Ei )) wp( Add instr(store x,i)) = shif t exp2 (@Ei )(S(0)/x) wp(Add instr(if L,i)) = ((S(0) = 0) ⇒ shif t exp2 (EL )) ∧ ((S(0) ̸= 0) ⇒ shif t exp2 (@Ei )) wp(Add instr(load x,i)) = unshif t exp(shif t exp(@Ei ))(x/S(0)) wp(Add instr(const a,i)) = unshif t exp(shif t exp(@Ei ))(a/S(0)) wp(Add instr (new A,i)) = unshif t exp(shif t exp(@Ei [create(H, A)/S(0), A :: H/H]) wp(Add instr(add,i) = (shif t exp2 (@Ei ))[(s(1) + S(0))/S(1)] wp(Add instr(neg,i) = (unshif t exp(@Ei ))[−S(0)/S(0)] wp(Add instr (getfield a f t,i) ) = shif t exp(@Ei [(val(S(0), (a, f )))/S(0)]) ∧ S(0) ̸= null wp(Add instr(putfield a f t,i)) = (shif t exp3 (@Ei ))[H((S(0), (a, f )) := S(1))/H] ∧ S(0) ̸= null wp(goto l1) = shif t exp(El1 ) instruction specification and is noted as: Ei : Ii becomes the postcondition of the inserted instruction where Ii is the instruction and the expression Ei and thus the calculated precondition starts from its specification. This notation expresses that the this old postcondition (@Ei ). The function shif t exp precondition Ei holds when the program pointer is is used twice (shif t exp2 ) to express also the at the program counter i. Table 4 shows the calculus impact due to the insertion of the instruction on the of the WP rules for the update operations (inserting specifications of the following instructions. instructions). The instructions new, putf ield and getf ield are heap Functions and notations used. The functions manipulating instructions. The function create used shif t exp and unshif t exp are used to express: in the instruction new A returns a new object of the effect of pushing (popping) elements to (from) type A in the heap H. This obtained heap (A :: H) the stack S and the effect of shifting an expression replaces the old heap. The function val used in the regarding to the stack elements due to the insertion definition of getfield to get the value of the field f of of instructions. They are defined as follows: the class a from the address (top of the stack). This value is then pushed on the stack. In putf ield, the shif t exp(Exp) = Exp[s(i + 1)/s(i) f orall i ∈ N ] value of the field designated by the top of the stack unshif t exp = shif t exp−1 is updated with the value at the second elements of the stack. The insertion of this instruction which pops The elements of the stack are represented by two values implies three applications of shif t exp. positive integers, the top stack is 0. The symbol @ is used to express the old specification associated In order to establish semantical equivalence of a to a position i: when we add an instruction at code written by the programmer and a program position i, the program and the specification are obtained by applying a DIFF file, we check the shifted from i and then a new instruction is inserted. equivalence of the weakest precondition of an Its precondition is calculated with the specification annotated program obtained by WP calculus and a of the instruction that was at position i before the precondition written by the programmer before DIFF update. file is performed. In the rules, for the instructions store x, load x, 7.3. Example and pop, a precondition is obtained, as in Hoare’s assignment (Hoare (1969)) by substituting the right- In order to illustrate how the logic works, we take hand side by the left-hand side in the postcondition. the example of the function abs that returns the The precondition of an instruction store x under a absolute value of an integer taken as argument. postcondition Ei+1 (the precondition of the following This function is then transformed in order to get instruction) is given by: shif t exp(Ei+1 )(S(0)/x) the double of the result in the initial calculus: for meaning that if the expression E holds after the an integer given as argument, the new function execution of store x then it also holds for the top of returns the abstract value multiplied by two (modified the stack before storing it in x. The function shif t exp abs). The specifications of the two functions are is used to express that before the execution of the respectively: instruction, the top of the stack corresponding to the instruction at i + 1 was at index 1. {p = P } abs {(P ≥ 0 → result = P ) ∧ (P < 0 → result = −P )} Inserting an instruction, e.g. store x at line i means that the precondition of the old instruction at i {p = P } modif ied abs {(P ≥ 0 → result = 2 ∗ P ) ∧ (P < 0 → result = −2 ∗ P )} •• In the specification, P denotes the logical value at The authors present a Hoare-style logic combined the entry and result is the result of the function. with instruction specification in term of precondition Figure 6 shows the bytecode of the first version for sequential bytecode. We adopted such instruction (a) and the second version (b) of the described specification in our logic for weakest precondition for function. The part (c) of the figure shows the DIFF update operation. file generated from the two versions. The last part of the figure (d) shows the bytecode of the function abs In some studies, manipulating and analysing byte- annotated with update instructions. We notice that code requires its modelisation in flexible representa- in this bytecode local variables are represented by tions suitable to the manipulation required. In (Puder integers: in load 1 for example, the number 1 means and Lee (2009)), bytecode is represented by XML the local variable 1. The same notation is applied to trees in order to use the technologies supporting other local variables. XML to ease the injection and extraction of bytecode. In (Albert and al. (2007)), bytecode is represented In figure 7, The WP calculus is performed on by clauses written in Prolog to perform verification of the bytecode (without annotation) starting from the bytecode programs. Generally, functional modelisa- postcondition of the new version. The WP calculus tion is used when the goal is to consider programs is applied on the annotated bytecode as shown on as mathematical models whose meaning is inde- figure 8. The specification for the update instructions pendent of runtime states. Therefore, it is possible are in bold. This example shows that we obtain to apply equational rewriting and reasoning to them the same precondition {P = v0} which means (Guodong (2010)) and use several proof systems that at the beginning of the calculus the logical that are built on or uses functional languages in value P is in the first local variable of the function. specifications. This result expresses the equivalence of the two bytecodes according to our definition of verified updated program. 9. CONCLUSION In this paper, we proposed an approach for 8. RELATED WORK formalisation and verification of java bytecode updated programs. Our approach relies on four Several studies have been conducted in order main concepts. We showed first how to use existing to use formal semantics to prevent type errors specification languages for Java and Java bytecode in bytecode. Our work extends the formalism programs to write specification and transform presented in (Freund and Mitchell (1999)). This them. Then, we defined a formal semantics which work defined semantics and a type system to study constitute a formal mean to establish the validity object initialization in bytecode. The original idea of update operations with regard to Java type was developed in (Stata and Abadi (1999)) to safety. We proposed a functional representation of study bytecode subroutines. In (Freund and Mitchell bytecode in order to model the application of update (2003)), the authors extended the work (Freund operations with the use of the notion of bytecode and Mitchell (1999)) to bytecode subroutines, virtual annotation. We presented a predicate transformation method invocation and exceptions. On another calculus based on weakest precondition for update side, using predicate transformation to reason operations to derive a specification for the annotated about bytecode properties has been studied in bytecode and showed how to establish the (Grégoire,Sacchini and Sivan (2008)). The authors correctness of the update. presented a verification condition generator for bytecode formalized in the Coq proof assistant and The approach presented is implemented using based on weakest precondition calculus. Another the OCaml language. Our study started with work using weakest precondition to generate considering the system EmbedDSU but this is verification conditions from an annotated bytecode not restrictive, the framework proposed can be is presented in (Burdy and Pavlova (2006), Burdy et generalised to specification and verification of al. (2007)). updated programs written in languages that are complied to bytecode. The use of the functional Our work is close to ( Freund and Mitchell (1999)) language and representation eases its integration in the sense of the use of static semantics to with existing formal methods. Our immediate future analyze bytecode. The specificity of our work is work is to define WP calculus for instruction the definition of semantics for updates. We use suppression. We plan to define another predicate predicate transformation to reason about bytecode transformation calculus (strongest postcondition) for properties using existing tools for specification and update operation and the integration of our approach proofs. Our bytecode logic for weakest precondtion in an existing formal method supporting verification calculus is inspired by (Bannwart and Müller (2005)). condition generation for functional programs. Figure 6: An example of an annotated bytecode (abs) Figure 7: WP calculus on the modified function Figure 8: WP calculus on an annotated bytecode •• REFERENCES Bannwart, F and Müller, P, (2005) A Program Logic for Bytecode. In Electron. Notes Theor. Comput. Freund, S. N and Mitchell, J. C, (1999) A type Sci.vol 141, Elsevier Science Publishers B. V., system for object initialization in the Java bytecode 2005, pp 255–273. language. In ACM Trans. Program. Lang. Syst., vol 21, pp.1196–1250. Common Criteria,http://www.commmoncriteria.org McGachey, P, Hosking,A. L and Moss, J.E.B, Grégoire, B, Sacchini, J. L and Sivan, R, (2008) (2009) Pervasive Load-Time Transformation for Combining a verification condition generator for Transparently Distributed Java. In Electron. Notes a bytecode language with static analyses. In Theor. Comput. Sci., vol 253, Elsevier Science Proceedings of the 3rd conference on Trustworthy Publishers B. V., pp.47–64. global computing, Springer-Verlag, pp.23–40. Puder, P and Lee, J, (2009) Towards an XML- Binder, W and Hulaas, J, (2005) Java Bytecode based Bytecode Level Transformation Framework. Transformations for Efficient, Portable CPU Ac- In Electron. Notes Theor. Comput. Sci.,vol 253, counting. In Electron. Notes Theor. Comput. Sci., Elsevier Science Publishers B. V., pp.97–111. Elsevier Science Publishers B. V. vol 141, pp.53– 73. Boshernitsan, M and Graham,S. L, (2004) iXj: Inter- active Source-to-source Transformations for Java. Noubissi,A.C, Iguchi-Cartigny, J and Lanet,J. L, In Companion to the 19th Annual ACM SIGPLAN (2011) Hot updates for Java based smart cards. Conference on Object-oriented Programming Sys- In ICDE Workshops, pp.168-173. tems, Languages, and Applications, pp.212–213. Burdy, L and Pavlova,M, (2006) Java bytecode Guodong, L, (2010) Formal verification of programs specification and verification In SAC 2006, and their transformations. PhD thesis, University pp.1835-1839. of Utah, USA. Burdy,L, Huisman, M and Pavlova,M, (2007) Lounas,R, Mezghiche, M and Lanet,J. L, (2012) To- Preliminary Design of BML: A Behavioral Interface wards a General Framework for Formal Reasoning Specification Language for Java Bytecode In about Java Bytecode Transformation In Proceed- FASE 2007, pp.215-229. ings Fourth International Symposium on Symbolic Computation in Software Science, pp.63–73. Freund, S. N and Mitchell, J. C,(2003) A Type System for the Java Bytecode Language and Verifier. In J. Noubissi,A. C, (2011) Mise á jour dynamique et Autom. Reasoning, vol 30, pp.271-321. sécurisée de composants systéme dans une carte á puce. PhD thesis, University of Limoges, France, Hoare,C. A. R, (1969) An Axiomatic Basis for 2011. Computer Programming. In Commun. ACM, vol 12, pp.576-580. Albert, E, Gomez-Zamalloa, M, Hubert, L and Puebla,G, (2007) Verification of Java Bytecode Stata, R and Abadi, M, (1999) A Type System Using Analysis and Transformation of Logic Pro- for Java Bytecode Subroutine In ACM Trans. grams . In Practical Aspects of Declarative Lan- Program. Lang. Syst., vol21, pp.90-137. guages,2007, Springer Berlin Heidelberg,pp.124- 139. Dahm,M, (1999) Byte Code Engineering. InJava- Informations-Tage, pp.267-277. Neamtiu,I, Hicks,M, Stoyle, G and Oriol, M, (2006) Practical Dynamic Software Updating for C.In Sakamoto, T, Sekiguchi, T and Yonezawa, A, (2000) ACM SIGPLAN Conference on Programming Bytecode Transformation for Portable Thread Language Design and Implementation, pages Migration in Java. In ASA/MA, 2000, pp.16-28. 7283, 2006. Bachrach, J and Playford, K, (2001) The Java Gupta, D, Jalote P and Barua, G. A formal Syntactic Extender. In OOPSLA 2001, pp.31-42. framework for online software version change. Software Engineering, IEEE Transactions on, 22 Noubissi,A. C, Iguchi-Cartigny, J and Lanet, J. (2):120131, 1996. L, (2010) Incremental Dynamic Update for Java-Based Smart Cards. In Fifth International Orso,A, Rao, A and Harrold,M. J. A technique Conference on Systems, pp.110-113. for dynamic updating of Java Software. In ICSM, 2002. Burdy,L, Cheon,Y, Cok, D. R, Ernst, M. D, Kiniry,J. R., Leavens,G. T, Leino, K. R. M, and Poll, E. An Hlopko, M, Kurs,J, and Vrany, J. Towards a Runtime Overview of JML Tools and Applications . In Int. J. Code Update in Java an exploration using Softw. Tools Technol. Transf., vol 7, pp. 212–232. STX:LIBJAVA. In proceeding of Dateso 2013.