=Paper= {{Paper |id=Vol-1431/paper7 |storemode=property |title=State Space Reduction Strategie for Model Checking Concurrent C Programs |pdfUrl=https://ceur-ws.org/Vol-1431/paper7.pdf |volume=Vol-1431 |dblpUrl=https://dblp.org/rec/conf/vecos/MethniLHHB15 }} ==State Space Reduction Strategie for Model Checking Concurrent C Programs== https://ceur-ws.org/Vol-1431/paper7.pdf
   State Space Reduction Strategies for Model
        Checking Concurrent C Programs


   Amira Methni, Belgacem Ben Hedia, Matthieu Lemerre,           Serge Haddad               Kamel Barkaoui
              CEA, LIST, Centre de Saclay,                  LSV, ENS Cachan, CNRS         CEDRIC Lab, CNAM,
          PC172, 91191, Gif-sur-Yvette, FRANCE                  & INRIA, France              Paris, France
           {amira.methni,belgacem.ben-hedia,                haddad@lsv.ens-cachan.fr    kamel.barkaoui@cnam.fr
                 matthieu.lemerre}@cea.fr



   Model checking is an effective technique for uncovering subtle errors in concurrent systems. Unfortunately,
   the state space explosion is the main bottleneck in model checking tools. Here we propose a state space
   reduction technique for model checking concurrent programs written in C. The reduction technique consists
   in an analysis phase, which defines an approximate agglomeration predicate. This latter states whether a
   statement can be agglomerated or not. We implement this predicate using a syntactic analysis, as well
   as a semantic analysis based on abstract interpretation. We show the usefulness of using agglomeration
   technique to reduce the state space, as well as to generate an abstract TLA+ specification from a C program.

               Model checking, TLA, State space reduction, Agglomeration predicate.

1. INTRODUCTION                                               concurrent C programs and we base ourselves on
                                                              previous work reported in (Methni et al. 2015).
Model checking is an attractive formal verification           The reduction technique is based on an analysis
technique because it is automatic. It offers extensive        phase, which defines an approximate agglomeration
and thorough coverage by considering all possible             predicate. This latter states whether a statement
behaviors of a system, unlike traditional testing             can be agglomerated or not. We implement this
methods. Given a set of properties expressed in               predicate using a syntactic analysis, as well as a
a temporal logic and a model, model-checking                  semantic analysis based on abstract interpretation
automatically analyzes the state space of the                 of C code. The particularity of our method is that we
model and checks whether the model satisfies the              apply the reduction technique during the generation
properties (Clarke et al. 1999). However, the main            of TLA+ code and by using the abstract interpretation
obstacle of model checking is the state explosion             technique. We show the usefulness of using this
problem and concurrency is a major contributor to             technique to reduce the state space during the
this problem.                                                 verification of C programs, as well as to generate an
                                                              abstract TLA+ specification from a C program.
Many solutions have already been investigated for
reducing the complexity of model checking. For                1.2. Outline
instance, by getting a simpler model from the original
one using abstraction technique (Clarke et al. 1994),         The rest of the paper is organized as follows. We
or by using on-the-fly model checking to eliminate            give an overview of TLA+ in Section 2. Section 3
part of the search to the automaton representing the          presents how we specify the semantics of C in TLA+.
(negation of the) checked property (Fernandez et al.          Section 4 describes the reduction technique and how
1992).                                                        we implement it on C programs. Experimental results
                                                              are presented in Section 5. We discuss related work
1.1. Contribution                                             in Section 6. Section 7 concludes and illustrates
In this paper, we present a state space reduction             future research directions.
technique for model checking concurrent programs
written in a low level language. We apply this                2. OVERVIEW OF TLA
technique to the verification of C programs by an
explicit model checker. We use TLA+ (Lamport                  TLA+ (Lamport 2002) is the specification language
1994) as a formal specification language for our              of the Temporal Logic of Actions (TLA). TLA is
          hf ormulai           ,   hpredicatei | [hactioni]hstate f unctioni | ¬hf ormulai
                                         | hf ormulai ∧ hf ormulai | hf ormulai
          hactioni             ,   boolean valued expression containing constant symbols, variables,
                                                and primed variables
          hpredicatei          ,   hf ormulai with no primed variables | ENABLED hactioni
          hstate f unctioni    ,    nonboolean expression containing constant symbols and variables

                                         Figure 1: TLA syntax (Lamport 2002)


a variant of linear temporal logic introduced by              where:
(Lamport 1994) for specifying and reasoning about
concurrent systems. Readers interested in a more                 • Init is a state predicate describing the possible
detailed presentation of TLA+ can refer to Lamport’s               initial states by assigning values to all system
book (Lamport 2002).                                               variables,

TLA+ specifies a system by describing its possible               • N ext is an action representing the program’s
behaviors. A behavior is an infinite sequence                      next-state relation,
of states. A state is an assignment of values                    • vars is the tuple of all variables,
to variables. A state function is a nonboolean
expression built from constants, variables and                   • F airness is an optional formula representing
constant operators and it assigns a value to each                  weak or strong assumptions about the execu-
state. For example, y + 3 is a state function that                 tion of actions.
assigns to state s the value 3 plus the value
that s assigns to the variable y. An action is a              Formula Spec is true of a behavior σ iff Init is
boolean expression containing constants, variables            true of the first state of σ and every step of σ is
and primed variables (adorned with “0 ” operator).            either a N ext step or a “stuttering step”, in which
Unprimed variables refer to variable values in the            none of the specified variables change their values,
current state and primed variables refer to their             and F airness holds. The behaviors satisfying the
values in the next-state. Thus, an action represents          specification formula given by Equation (1) are the
a relation between old states and new states. A state         ones that represent correct behaviors of the system,
predicate (or predicate for short) is an action with no       where a behavior represents a conceivable history of
primed variables.                                             a universe that may contain the system.

The syntax of TLA is given in Figure 1 (the symbol            The TLA+ formula Spec ⇒ φ is valid when the
, means equal by definition). TLA+ formulas are               model represented by Spec satisfies the property φ,
built up from actions and predicates using boolean            or implements the model φ.
operators (¬ and ∧ and others that can be derived
from these two), quantification over logical variables        TLA+ has an explicit model checker called TLC that
(∀, ∃), and the unary temporal operator  (always) of         can be used to check the validity of safety and
the linear temporal logic (Manna and Pnueli 1992).            liveness properties. TLC handles specifications that
                                                              have the standard form of the formula (1). For this
The predicate “ENABLED A”, where A is an action, is           reason, we only use specification formula of the form
defined to be true in a state s iff there exists some         of Equation (1). TLC requires a configuration file
state t such that the pair of states hs, ti satisfies         which defines the finite-state instance to analyze. It
A. The formula [A]vars , where A is an action and             begins by generating all states satisfying the initial
vars the tuple of all system variables, is equal to           predicate Init. Then, it generates every possible
(A ∨ (vars0 = vars)) where vars0 is the expression            next-state t such that the pair of states hs, ti satisfies
obtained by priming all variables in vars. It asserts         N ext and the F airness constraints, looking for a
that every step (pair of successive states) is either         state where an invariant is violated. Finally, it checks
an A step or else leaves the values of all variables          temporal properties over the state space.
vars unchanged. TLA+ defines the abbreviation
“UNCHANGED vars” to denote that vars0 = vars.
                                                              3. TRANSLATION FROM C TO TLA+
While TLA+ permits a variety of specification styles,
                                                              Our approach to checking a concurrent C program is
the specification that we use is defined by:
                                                              to first translate it into a TLA+ specification, to which
                                                              the TLA+ tools can be applied. In what follows, we
       Spec , Init ∧ [N ext]vars ∧ F airness       (1)       briefly present how we specify the semantics of C in
         1    int x = 3;
         2    int y = 0;
         3
         4  int inc ( int i )
          5 {
          6   int tmp ;
          7   tmp = i +1;
          8   return tmp ;
          9 }
         10
         11 void p1 () {
         12   int a = 1;
         13   x = inc ( a ) ;
         14   a = x;
         15   return ;
         16 }
         17
         18 void p2 () {
         19   int a ;
         20   x = x - 1;
         21   x = 3;
         22   y = 0;
         23 }




                (a) Code C source                               (b) Memory layout

Figure 2: Example of a C code in which one process (with id equals to 1) executes p1() function and the second one
executes p2(). The top of the stack[1] indicates that process 1 is executing the statement with label 6 of inc() function.
U ndef represents an undefined value such as the value of an uninitialized variable.


TLA+ by describing the memory layout considered                  The memory is separated into four areas that do not
and how we model the control flow of a C program.                overlap:

3.1. Memory Layout                                                  • a shared memory called data that stores
                                                                      global (and static) variables. In the example of
C file is parsed and normalized according to CIL                      Figure 2a, the x variable is shared by the two
(C Intermediate Language) (Necula et al. 2002)                        processes.
which transforms complicated constructs of C into
simpler ones. This transformation makes programs                    • a local memory for each process, called
more amenable to analysis and transformation.                         stack and stores local variables and function
According to the Abstract Syntax Tree (AST) of the                    parameters. The memory stack[id] specifies
C program, C2TLA+ generates automatically a TLA+                      the local memory of process id and is
specification according to a set of translation rules                 composed of stack frames. Each stack frame
detailed in our previous work (Methni et al. 2015).                   corresponds to a call to a function. In the
                                                                      example of Figure 2a, stack[1] is composed
In C2TLA+, a concurrent program consists of                           of two stack frames, one of p1() function and
many interleaved sequences of operations called                       one of inc() function. When a function call
processes, corresponding to threads in C. Each                        terminates, its stack frame is removed.
process has a unique identifier id. The set of all
processes is determined by the TLA+ constant                        • a local memory for each process called
P rocSet.                                                             register modeled as a sequence and stores
                                                                      the program counter of each process. The
Figure 2 presents a C program and the content of                      head of this sequence contains the statement
the memory as modeled by C2TLA+. We consider                          being currently executed by the process id.
that the C code is executed by two processes. One
                                                                    • a local memory called ret which contains
process executes p1() function and the other one
                                                                      values to be returned by processes.
executes p2() function.
The memory is modeled in TLA+ by a variable called         is updated by the label value of the successor
memory. It is a record whose four fields represent         statement given by the control flow graph (CFG) of
the four memory areas. The global memory data              the C program (provided by CIL).
behaves like an array of values, whereas stack
and register behave like a FIFO (First In, First           The control flow of the C program in C2TLA+ is
Out) queues. Access to those memory areas is               ensured by the dispatch() function. For the example
addressed using offsets. So, a memory address is           of Figure 2a, this function is defined as follows:
a couple [loc,of f s] of memory location loc, (data or                              ∆
stack area) and an offset of f s in this location. For        dispatch(id, mem) =
instance, Addr x specified in Figure 2b defines the            CASE Head(mem.register[id]).pc = ”inc 6”
memory address of x variable.                                         → inc 6(id, mem)
                                                                    Head(mem.register[id]).pc = ”inc 7”
The main operations that manage the memory are                        → inc 7(id, mem)
load() and store():                                                 Head(mem.register[id]).pc = ”p1 11”
                                                                      → p1 11(id, mem)
   • load() is the function that given the current                  Head(mem.register[id]).pc = ”p1 12”
     state of memory mem and a memory address                         → p1 12(id, mem)
     addr (in the form of [loc, of f s]), returns the               Head(mem.register[id]).pc = ”p1 13”
     value stored at the address addr in the memory                   → p1 13(id, mem)
     mem,                                                           Head(mem.register[id]).pc = ”p1 14”
                                                                      → p1 14(id, mem)
   • store() is the function that given the current
                                                                    Head(mem.register[id]).pc = ”p2 19”
     state of memory mem, a memory address
                                                                      → p2 19(id, mem)
     addr and a value val, returns the new copy of
                                                                    Head(mem.register[id]).pc = ”p2 20”
     the memory after storing the value val at the
                                                                      → p2 20(id, mem)
     memory address addr.
                                                                    Head(mem.register[id]).pc = ”p2 21”
3.2. Specifying the C control-flow                                    → p2 21(id, mem)
                                                                    OTHER → mem
Each C statement i is identified in C2TLA+ by a
label assigned by CIL and is modeled by a TLA+             The dispatch() function calls, according to the
function, noted stmti (), which takes as arguments         value of the pc field contained at the top
the process identifier id and the memory mem,              the process register (determined by the expres-
and returns the new content of the memory after            sion Head(mem.register[id]).pc), the corresponding
executing the statement.                                   TLA+ function to execute, i. e., the C instruction to
                                                           execute.
Each stmti () updates the program counter register
of the process id and may change the content of            The C program is thus simulated by the Spec formula
mem, stack, and/or ret memory areas depending              given by equation (1). The Init predicate specifies
on the type of the statement (assignment, jump             the initial values of the memory and the N ext action
statements, etc.). For instance, the statement             is defined as follows:
on line 20 is translated into the TLA+ function                          ∆

p2 20(id, mem) defined as follows:                                N ext =
                                                                   ∨ ∃ id ∈ P rocSet :
 p2 20(id, mem) =
                  ∆
                                                                        ∧ memory.regsiter[id] 6= hi
               ∆
 LET mcopy = load(id, Addr x, [val 7→ 3]) IN                            ∧ memory 0 = dispatch(id, mem)
  [data 7→ mcopy.mem, stack → mcopy.stack,                         ∨ ∀ id ∈ P rocSet :
  register 7→ [mem.register EXCEPT ![id] =                              ∧ memory.regsiter[id] = hi
  h[pc 7→ h“p2 21”i, f p 7→ Head(mem.register[id]).f p]i                ∧ UNCHANGED memory
         ◦ T ail(mem.register[id]),
  ret 7→ mem.ret]                                          It states that one of the processes that has not
                                                           finished execution (its register[id] is not empty) is
The definition of p2 20() function uses the TLA+           nondeterministically chosen to execute one action
construct LET/IN to define a temporary variable that       until all processes finish execution, i. e., all registers
stores the value of the memory after affecting the         become empty. Executing an action consists in
value 3 to the memory address Addr x. The symbol           calling dispatch() function. For example, when
◦ defines the concatenation operator for TLA+              Head(mem.register[id]).pc equals to ”inc 6”, calling
sequences. Head(s) is a TLA+ function that returns         the function inc 6(id, mem) will update the value of
the head of the sequence s and T ail(s) returns            stack[id] (as tmp is stored in the local memory) as
the tail of the sequence s. Then, the register[id]         well as the top of register[id]. As the register[id] is
                                          Figure 3: Example of a state space


still not empty, the control flow is thus passed to the        are many different ways to implement concurrency
successor statement.                                           synchronization in C. For instance, by using locks
                                                               and semaphores, or by providing low level hard-
The behavior of the C program modeled by the Spec              ware instructions (e. g., test-and-set and compare-
formula can be given in terms of a state transition            and-swap). To support synchronization mechanism,
system.                                                        generated TLA+ specifications by C2TLA+ can be
Definition 1. A state transition system is a 3-tuple           completed with manually written TLA+ specifications
T = (Q, I, δ) given by                                         to provide concurrency primitives and atomic instruc-
                                                               tions. More detailed information about integrating
   • a finite set of states Q,                                 synchronization primitives in TLA+ specifications can
   • a set I ⊆ Q of initial states, specified by the           be found in our previous work (Methni et al. 2015).
     Init predicate,
   • a transition relation δ ⊆ Q × Q that links two            4. APPLYING REDUCTION ON C PROGRAMS
     states. This latter corresponds to satisfying the
     predicate N ext.                                          The process of generating an optimized TLA+
                                                               specification is illustrated in Figure 4. To apply
The state transition system encodes the state space            reduction on C programs, it is necessary to define
of the corresponding TLA+ specification of the C               the agglomeration predicate. The C program is
program.                                                       first analyzed. This analysis phase defines an
                                                               approximate agglomeration predicate which takes
Figure 3 illustrates the state space of the corre-             as argument a C statement and returns true or
sponding TLA+ specification of the C code given in             false depending on whether the statement can be
Figure 2a. It consists in all the possible interleaving        agglomerated or not. This predicate can be safe or
of process execution. In order to simplify, we rep-            unsafe. The meaning of safe predicate depends on
resent only the content of pc field contained at the           how the analysis is performed.
top of the register[1] and register[2] memories. Each
state of the graph matches a valuation of memory                  • The predicate is said safe when the analysis
variable, i. e., its four fields.                                   is a safe approximation. Its definition is as
                                                                    follows:
3.3. Process Synchronization
                                                                       – if the statement does not modify the
All processes interact with each other through the                       shared memory, the predicate returns
shared memory data. Concurrent access to this lat-                       true,
ter is ensured via synchronization mechanism. There
        – if the statement modifies the shared
          memory or it is unknown, it returns false.
     The unknown predicate states that we have no
     idea if the statement modifies the memory or
     not.
   • The predicate is called unsafe when a
     statement is agglomerated and we are not sure
     if it modifies the global memory or not.




                                                            (a) Before reduction         (b) After reduction

                                                          Figure 6: The control flow graph of a C code example


                                                         4.2. Using syntactic and semantic analysis
             Figure 4: Reduction process
                                                         Syntactic analysis
In this Section, we introduce the agglomeration          A syntactic analysis is performed to detect state-
technique by an example. Then, we describe the           ments on which reduction can be applied. Often, C
implementation of a safe agglomeration predicate         functions make use of local variables, and when a
by using a syntactic and semantic analysis on C          statement refers only to local variables, the value
programs. Then, we show the interest of using            for which the statement is executed by a process
an unsafe agglomeration predicate to generate an         cannot change the execution of other processes.
abstract TLA+ specification of a C program.              Furthermore, we assume that statements involving
                                                         local pointer variable cannot be agglomerated as
In what follows, we use the expression agglomer-         they may reference shared memory.
ating TLA+ actions, to designate agglomerating the
corresponding statements in the C program.               Moreover, we consider that jump statements, namely
                                                         goto, break and continue, can be agglomerated
4.1. An introducing example                              with its successor (designated by computing the
                                                         CFG), as they only change the local register of the
As the semantics of a TLA+ is expressed through          process (register[id] in TLA+ specification).
a state transition system, where transitions between
states are ensured by TLA+ actions, the reduction        Semantic analysis
technique consists in agglomerating consecutive          In many scenarios, a concurrent C program could
actions into one atomic action which performs the        contain, in its global memory, data blocks that are
effects of the original ones. The reduction idea based   accessed only by one process at a time. In that case,
on agglomeration has been widely used in Petri Nets      syntactic analysis is insufficient. Therefore, we use
(Haddad and Pradat-Peyre 2006; Berthelot 1986).          a semantic approximation predicate. The C program
                                                         is thus analyzed and an approximation of memory
Figure 5 shows three consecutive states linked by        access is computed using the Mthread Frama-C
two actions x0 = x + 1 and x0 = x + 2. The result        plugin (Mth). This latter provides information about
of this agglomeration (represented by −→), is two        the memory zones that are accessed concurrently
states linked by one atomic action which is the result   by more than one process and those that are not. In
of executing the action x0 = x + 1, then x0 = x + 2.     this case, the agglomeration predicate is safe as the
                                                         analysis is based on an over-approximation of the
            x=2                 x=2                      memory.
                0
               x =x+1
                                                         We consider the example given by Figure 2a.
            x=3             →    x0 = (x + 1) + 2        To illustrate the agglomeration technique on this
                                                         example, we represent the C program by its control
               x0 = x + 2
                                                         flow graph, illustrated by Figure 6a, where each
            x=5                 x=5                      state of the graph represents a C statement and
                                                         edges represent the control flow. After applying the
           Figure 5: Agglomerating actions               syntactic and semantic analysis on this example,
the control flow graph is transformed into a smaller       1 # define BUFFER_SIZE 5
graph, given in Figure 6b. Each state of the               2 mutex m ;
graph corresponds to one statement or a block of           3 sem full = 0 , empty = BUFFER_SIZE ;
                                                           4 int buffer [ BUFFER_SIZE ]; /* the buffer */
statements. As the inc() function uses only local          5 int count ; /* buffer count */
memory, its block definition is combined into one          6
basic block. For p1() function, statement on line          7  void Producer ( int item ) {
11 is agglomerated with statement on line 12 and           8    while ( TRUE ) {
                                                            9     item = rand () ; /* g e n e r a t e a random number */
statement on line 14 is agglomerated with the return       10     P (& empty ) ; /* acquire the empty lock */
instruction.                                               11     P (& m ) ; /* acquire the mutex lock */
                                                           12     if ( count < BUFFER_SIZE ) {
4.3. Generating an abstract specification                  13        buffer [ count ] = item ;
                                                           14        count ++; }
                                                           15     V (& m ) ; /* release the mutex lock */
Agglomerating statements can also be useful to
                                                           16     V (& full ) ; /* signal full */
generate an abstract TLA+ specification of a C             17   }
program. The user can define which C statements            18 }

can be agglomerated using an unsafe predicate. The         19 void Consumer ( void ) {
                                                           20   while ( TRUE ) {
resulting TLA+ specification can be viewed as an           21     int item ;
abstract formal specification of the C program.            22     P (& full ) ; /* acquire the full lock */
                                                           23     P (& m ) ; /* aquire the mutex lock */
As TLA+ is a fragment of LTL\x (Linear Temporal            24     if ( count > 0) {
Logic without the “next operator”), it is well known       25        item = buffer [( count -1) ];
                                                           26        count - -; }
that the equivalence between checking a property           27     V (& m ) ; /* release the mutex lock */
given in LTL\x on an abstract model and checking it        28     V (& empty ) ; /* signal empty */
on the original model is ensured by the preservation       29   }
                                                           30 }
(Goltz et al. 1992).

Example: Agglomerating critical sections                       Figure 7: Example of a producer/consumer model using
Consider the following fragment of C code (Figure 7)           locks
implementing an example of the producer/consumer
model. The two processes share a buffer protected
by a mutex m. The synchronization between                      This property expresses that critical sections cannot
processes is ensured by two semaphores empty and               be executed simultaneously. This property was
full. Mutex and semaphores are implemented as                  verified on the TLA+ specification after reduction.
an integer values and are only accessible through              Thus, we can deduce that the property is also
two atomic operations P() and V().                             verified on the specification generated without the
                                                               reduction technique.
Translating this implementation into a TLA+ spec-
ification and model checking results in verifying              4.4. Integrating the reduction into TLA+
all interleavings of actions between processes. We             specification
define the agglomeration predicate that states that
statements protected by mutex (namely by P() and               In what follows, we show how we implement
V() primitives) can be agglomerated. After reduction,          the reduction on TLA+ specification. As described
the control flow graph of this example is illustrated in       in Section 3, each execution of N ext action
Figure 8b.                                                     corresponds to executing an atomic C statement.
                                                               The reduction in C programs, consists in translating
Therefore, the block statements from line 12 to line           a sequence of C statements into one action instead
14 and that from line 12 to 26 are agglomerated into           of multiple ones. Let i be the identifier of a statement
one state. The state space of the TLA+ specification           and j be the identifier of its successor. To do that, we
generated after agglomerations contains fewer                  generate for each statement i a new function that we
states than the one without agglomerations as the              call stmt longi () defined below.
reduction inside the critical section restricts the                                     ∆

amount of interleaving allowed between processes.               stmt longi (id, mem) =
We define the mutual exclusion property in TLA+ as                            stmt longj (id, dispatch(id, mem))
follows:
                                                               The definition of stmt longi (id, mem) consists in
                           ∆
mut exclusion(lbl1, lbl2) = ((∀ id1, id2 ∈ P rocSet :         calling the function of the successor statement j,
 ∧ (id1 6= id2) ∧ (Head(memory.register[id1]) 6= hi)           noted by stmt longj () and passing as argument the
 ∧(Head(memory.register[id2]) 6= hi)                           memory state returned by dispatch(id, mem).
 ∧(Head(memory.regsiter[id1]).pc = lbl1))
     ⇒ Head(memory.register[id2]).pc 6= lbl2)
                    (a) Before reduction                                         (b) After reduction

         Figure 8: The control flow graph of the Producer() and Consumer() functions before and after reduction


To generate a reduced TLA+ specification, we
iterate over all statements and when the ag-
glomeration predicate returns true for a state-
ment i, its translation consists in calling the func-
tion stmt longj (id, dispatch(id, mem)). Otherwise,                                              ∆
we call dispatch(id, mem) function.                                   dispatch red(id, mem) =
                                                                       CASE Head(mem.st[id]).pc = ”inc 6”
The C program is thus specified by the formula                                   → inc long 6(id, mem)
Spec given by equation (1), except that the N ext                       Head(mem.register[id]).pc = ”inc 7”
action calls a new function dispatch red(), instead of                           → inc long 7(id, mem)
dispatch() function. For the example of figure 2a, the                  Head(mem.register[id]).pc = ”p1 11”
dispatch red() function is defined in Figure 9:                                  → p1 long 11(id, mem)
                                                                        Head(mem.register[id]).pc = ”p1 12”
The dispatch red(id, mem) function calls according                               → p1 long 12(id, mem)
to the value the program counter pc contained at the                    Head(mem.register[id]).pc = ”p1 13”
head of mem.register[id] the corresponding TLA+                                  → p1 long 13(id, mem)
function to execute.                                                    Head(mem.register[id]).pc = ”p1 14”
                                                                                 → p1 long 14(id, mem)
                                                                        Head(mem.register[id]).pc = ”p2 19”
5. EXPERIMENTS
                                                                                 → p2 long 19(id, mem)
The reduction technique is totally automatic and was                    Head(mem.register[id]).pc = ”p2 20”
integrated in C2TLA+ which is a Frama-C (Cuoq                                    → p2 long 20(id, mem)
et al. 2012) plugin, implemented in OCaml. This                         Head(mem.register[id]).pc = ”p2 21”
Section is concerned with our practical experience.                              → p2 long 21(id, mem)
We use the Mthread plugin results and the syntactic                     OTHER → mem
analysis as described in Section 4 to implement our             Figure 9: Example of the dispatch red() function definition
agglomeration technique.

We consider one sequential C program and four
concurrent programs:
                  Table 1: Comparing Model Checking Results with & without Reduction (time in seconds)


                                   Without reduction                   With reduction
        Prorgam      #Proc                                                                          Factor
                                   #St                 #T             #St               #T
        Zunebug         1           389             0.147              2             0.136           99.48

         Dekker         2           173             0.128             70             0.109           59.53
                        2           107              1.37             22             0.131           79.43
        Peterson
                        4        1.080.161           59.2           31.221            4.82           97.10
                        2          2.389             1.91            223              1.67           90.66
         Bakery
                        4       50.515.927           1560          835.355            76.6           98.36
                        4      9.791.509             366           146.106               12          98.5
         Philos
                        5     >619.309.984          25340         4.179.520             352          99.32


   • Zunebug which is a bug in the internal clock               specification without agglomeration, checking this
     driver of Zune 30GB music player. The source               property causes TLC to report an error. This error
     code is taken from (Weimer et al. 2010).                   occurs when the code takes as input the last day
                                                                of a leap year, causing the code to enter into an
   • Lamport’s Bakery and Peterson algorithms ob-               infinite loop. After applying the reduction technique
     tained from (Raynal 2013) and Dekker mu-                   for the zunebug program, the state space size of its
     tual exclusion algorithm presented in (Dijkstra            corresponding TLA+ specification equals 2. This is
     1968).                                                     due to the fact that the program is sequential. Model
   • Dining philosopher problem. We use the                     checking the TLA+ code with the last day of a leap
     solution that appears in Tanenbaum’s book                  year causes the TLC model to report an incorrect
     (Tanenbaum 2007).                                          recursive function definition.

These programs make typical examples for demon-                 For the concurrent programs, the mutual exclusion
strating the strength of the state space reduction.             property has been successfully verified on Peterson,
C2LTA+ takes as input a C program and generates                 Bakery, Dekker and Philosophers benchmarks. As
for each one the corresponding TLA+ specification.              expected, the size of the state space with agglom-
                                                                erations is always smaller than the one without
Using the TLC model checker, we compute the total               agglomerations. For the philosopher example with
number of generated states and we verify a set of               5 processes, the state space without agglomeration
properties on the two specifications.                           takes more than 7 hours to be model checked.
                                                                However, using the reduction technique the specifi-
Results of experiments are shown in Table 1, where              cation is verified in 6 minutes. The reduction factor
#Proc denotes the process number, #St denotes the               in this cas reaches 99.32. The reduction technique
numbers of states and #T denotes the time for model             obtains good results on these benchmarks due to the
checking in seconds. Columns 3 to 6 give information            elimination of some intermediate states.
about the state space generated with and without
applying the reduction technique. The last column
indicates the reduction factor, the ratio between the           6. RELATED WORK
state space generated without reduction and the one
                                                                There are a wealth of research contributions on
after applying the reduction technique.
                                                                formal verification of software as well as techniques
All experimental results were performed on an                   for the reduction of the state space.
Intel Core Pentium i7-2760QM machine with 8
                                                                Program slicing is a technique introduced by
cores (2.40GHz each), with 8Gb of RAM memory.
                                                                (Weiser 1981) for simplifying sequential programs for
For zunebug, one property to verify is program
                                                                debugging and program understanding. It consists
termination, which is a liveness property that we
                                                                in removing from the program features that are
express as follows:
                                                                irrelevant for the property to be verified. Recently,
               ∆
 termination = (Head(memory.register[1]) = hi)                 slicing technique has been used to reduce the state
                                                                space of a system in model checking. It has been
This property asserts that the register of the                  applied to Promela (Millett and Teitelbaum 2000),
program will eventually be empty. For the TLA+                  the input language for the Spin model checker
(Holzmann 1997). The interested reader can refer to      C2TLA+ to translate the semantics of C to the
(Tip 1995) for a detailed description of the different   formal specification language TLA+. This reduction
approaches used in the program slicing.                  technique is based on an analysis phase, which
                                                         defines an approximate agglomeration predicate that
Predicate abstraction (Graf and Saı̈di 1997) is a        states whether a statement can be agglomerated
technique in which a set of predicates over the          or not. We implemented this predicate by applying
programs variables is used to construct an abstract      a syntactic and semantic analysis on C Programs.
program. This technique is being used in SLAM (Ball      We illustrated the effectiveness of applying the
and Rajamani 2002), BLAST (Henzinger et al. 2003)        agglomeration technique to reduce the state space
and MAGIC (Chaki et al. 2004).                           during the verification of C programs and also as
                                                         well as to define an abstract TLA+ specification that
Other approaches perform reduction during explo-         model the behavior of C programs.
ration of the state space of the program. For exam-
ple, partial order reduction (Valmari 1989) is a tech-   We aim to integrate a mechanism for structuring
nique which explores only a representative subset        large TLA+ specifications from C programs using
of the state space of a model. The basic idea is to      a refinement process between different levels of
exploit the commutativity caused by the interleav-       abstraction. Finally, we are planning to apply the
ings of transitions, which result in the same state.     methodology on a critical part of the microkernel of
This technique was first introduced for checking the     the PharOS (Lemerre et al. 2011) real-time operating
absence of deadlock. Subsequently, a number of           system (RTOS).
variants of this technique have been developed and
integrated in verification tools, like Spin (Holzmann
1997) and Verisoft (Godefroid 1997).                     REFERENCES

Although we have mentioned some projects in the          Mthread plugin.        URL http://frama-c.com/
C context, there are also significant works interested     mthread.html.
in model checking the Java language. For example,
                                                         Thomas Ball and Sriram K. Rajamani. The SLAM
JPF (Visser et al. 2003) uses state compression
                                                           project: Debugging System Software via Static
technique to handle big states, partial order and
                                                           Analysis. SIGPLAN Not, 2002.
symmetry reduction, slicing, abstraction and runtime
analysis techniques to reduce the state space.           Gérard Berthelot.   Checking properties of nets
                                                           using transformation. In Advances in Petri Nets
In this work, the state space reduction technique that     1985, Covers the 6th European Workshop on
we propose is closer to that originally introduced by      Applications and Theory in Petri Nets-selected
(Berthelot 1986) in Petri nets formalism. Berthelot        Papers, pages 19–40, London, UK, UK, 1986.
developed a large set of reduction rules for reducing      Springer-Verlag.
the complexity of verification. Extended work has
been proposed by (Haddad and Pradat-Peyre 2006).         Sagar Chaki, Edmund M. Clarke, Alex Groce,
Our work differs from this latter by the fact that the     Somesh Jha, and Helmut Veith.           Modular
model of our TLA+ specification is a state transition      verification of software components in c. IEEE
system and the agglomeration predicate depends             Trans. Software Eng., 30(6):388–402, 2004.
on the analysis of the C program. Our reduction
technique is applied during the generation of TLA+       Edmund M. Clarke, Orna Grumberg, and David E.
code unlike the partial order reduction technique          Long. Model Checking and Abstraction. ACM
which performs reduction during the construction of        Trans. Program. Lang. Syst., 16(5):1512–1542,
the state space. Besides, we use TLA+ as formal            1994.
framework which provides an expressive power to          Edmund M. Clarke, Jr., Orna Grumberg, and
specify the semantics of a programming language            Doron A. Peled. Model Checking. MIT Press,
and can reason about concurrent systems and                Cambridge, MA, USA, 1999.
can express safety and liveness properties unlike
SLAM and BLAST which have limited support for            Pascal Cuoq, Florent Kirchner, Nikolai Kosmatov,
concurrent properties as they only check safety            Virgile Prevosto, Julien Signoles, and Boris
properties.                                                Yakobowski.      Frama-C: A Software Analysis
                                                           Perspective.      In Proceedings of the 10th
                                                           international conference on Software Engineering
7. CONCLUSION AND FUTURE WORK                              and Formal Methods, SEFM’12, pages 233–247,
                                                           Berlin, Heidelberg, 2012. Springer-Verlag. URL
We have proposed a technique to reduce the state
                                                           http://frama-c.com/.
space for model checking C programs. We used
Edsger W. Dijkstra.     Cooperating sequential            TLA+. In Formal Techniques for Safety-Critical
  processes. In F. Genuys, editor, Programming            Systems, volume 476 of Communications in Com-
  Languages: NATO Advanced Study Institute,               puter and Information Science, pages 206–222.
  pages 43–112. Academic Press, 1968.                     Springer, 2015.
Jean-Claude Fernandez, Laurent Mounier, Claude          Lynette I. Millett and Tim Teitelbaum.   Issues
  Jard, and Thierry Jron. On-the-fly verification         in Slicing PROMELA and its Applications to
  of finite transition systems. Formal Methods in         Model Checking, Protocol Understanding, and
  System Design, 1(2-3):251–273, 1992.                    Simulation. International Journal on Software
                                                          Tools for Technology Transfer, 2(4):343–349,
Patrice Godefroid. Model Checking for Programming         2000.
  Languages using VeriSoft. In In Proceeedings
  of the 24th ACM Symposium on Principles of            George C. Necula, Scott Mcpeak, Shree P. Rahul,
  Programming Languages, pages 174–186. ACM              and Westley Weimer. CIL: Intermediate Language
  Press, 1997.                                           and Tools for Analysis and Transformation of
                                                         C Programs.     In International Conference on
Ursula Goltz, Ruurd Kuiper, and Wojciech Penczek.        Compiler Construction, pages 213–228, 2002.
  Propositional Temporal Logics and Equivalences.
  In W.R. Cleaveland, editor, CONCUR ’92, volume        Michel Raynal. Concurrent Programming: Algo-
  630 of Lecture Notes in Computer Science, pages         rithms, Principles, and Foundations. Springer,
  222–236. Springer Berlin Heidelberg, 1992.              Heidelberg, 2013.
Susanne Graf and Hassen Saı̈di. Construction of         Andrew S. Tanenbaum. Modern Operating Systems.
  Abstract State Graphs with PVS. In Proceedings          Prentice Hall Press, Upper Saddle River, NJ, USA,
  of the 9th International Conference on Computer         3rd edition, 2007.
  Aided Verification, CAV ’97, pages 72–83, London,
  UK, UK, 1997. Springer-Verlag.                        Frank Tip. A Survey of Program Slicing Techniques.
                                                          Journal of Programming Languages, 3:121–189,
Serge Haddad and Jean-François Pradat-Peyre.             1995.
  New Efficient Petri Nets Reductions for Parallel
  Programs Verification. Parallel Processing Letters,   Antti Valmari. Stubborn Sets for Reduced State
  16(1):101–116, 2006.                                    Space Generation.        In Proceedings of the
                                                          Tenth International Conference on Application and
Thomas A. Henzinger, Ranjit Jhala, Rupak Majum-           Theory of Petri Nets, pages 1–22, 1989.
  dar, and Gregoire Sutre. Software Verification with
  BLAST. pages 235–239. Springer, 2003.                 Willem Visser, Klaus Havelund, Guillaume Brat,
                                                         Seungjoon Park, and Flavio Lerda.      Model
Gerard J. Holzmann. The Model Checker SPIN.              Checking Programs. Automated Software Engg.,
 IEEE Trans. Software Eng., 23(5):279–295, 1997.         10(2):203–232, April 2003.
Leslie Lamport. The Temporal Logic of Actions.          Westley Weimer, Stephanie Forrest, Claire
  ACM Trans. Program. Lang. Syst., 16(3):872–923,        Le Goues, and ThanhVu Nguyen. Automatic
  1994.                                                  Program Repair with Evolutionary Computation.
                                                         Commun. ACM, 53(5):109–116, May 2010.
Leslie Lamport. Specifying Systems, The TLA+
  Language and Tools for Hardware and Software          Mark Weiser. Program Slicing. In Proceedings of
  Engineers. Addison-Wesley, 2002.                       the 5th International Conference on Software En-
                                                         gineering, ICSE ’81, pages 439–449, Piscataway,
M. Lemerre, E. Ohayon, D. Chabrol, M. Jan, and           NJ, USA, 1981. IEEE Press.
  M-B. Jacques. Method and Tools for Mixed-
  Criticality Real-Time Applications within PharOS.
  In Proceedings of AMICS 2011: 1st International
  Workshop on Architectures and Applications for
  Mixed-Criticality Systems, 2011.
Zohar Manna and Amir Pnueli. The Temporal Logic
  of Reactive and Concurrent Systems. Springer-
  Verlag New York, Inc., New York, NY, USA, 1992.
Amira Methni, Matthieu Lemerre, Belgacem Ben He-
 dia, Serge Haddad, and Kamel Barkaoui. Spec-
 ifying and Verifying Concurrent C Programs with