=Paper=
{{Paper
|id=Vol-1531/paper7
|storemode=property
|title=Optimizing the Symbolic Execution of Communicating and Evolving State Machines
|pdfUrl=https://ceur-ws.org/Vol-1531/paper7.pdf
|volume=Vol-1531
|dblpUrl=https://dblp.org/rec/conf/models/Khalil15a
}}
==Optimizing the Symbolic Execution of Communicating and Evolving State Machines==
<pdf width="1500px">https://ceur-ws.org/Vol-1531/paper7.pdf</pdf>
<pre>
          Optimizing the Symbolic Execution of
        Communicating and Evolving State Machines
                                                           Amal Khalil
                                            School of Computing, Queen’s University
                                                   Kingston, Ontario, Canada
                                                  Email: khalil@cs.queensu.ca


   Abstract—This paper describes research investigating two com-        This research introduces two complementary optimization
plementary optimization techniques that leverage the similarities    techniques that leverage the similarities between two successive
between state machines versions to reduce the cost of symbolic       state machine versions to reduce the cost of symbolic execution
execution of the evolved version.
                                                                     of the evolved version.
         I. R ESEARCH P ROBLEM AND M OTIVATION                          This research is motivated by a number of facts. First,
   Model Driven Engineering (MDE) is a model-centric                 symbolic execution has been shown to be a very powerful
software engineering approach that aims at improving the             method for the analysis of programs and there are already
productivity and the quality of software artifacts by focusing       several commercial code analysis tools built based on it (e.g.,
on models as first-class artifacts in place of code. MDE has         CodeSonar [6]). Similarly, the technique has been adopted
been widely used for over a decade in many domains such              and applied in the context of state-based models (e.g., the
as automotive and telecommunication industries. Iterative-           IAR visualSTATE - Verificator [7]). Second, there is an
incremental development and model-based analysis are central         interest from our industrial partner to improve the model-level
to MDE in which artifacts typically undergo several iterations       analysis capabilities of the IBM Rhapsody tool. Third, research
and refinements during their lifetime that may require changes       on optimizing the symbolic execution of evolving programs
to their initial design versions. As these models evolve, it         has been recently addressed [8, 9], however to the best of
is necessary to assess their quality by repeating the analysis       knowledge we are the first to consider such optimizations for
and the verification of these models after every iteration or        the symbolic execution of evolving state machines.
refinement. This process, if not optimized, can be very tedious                          II. R ELATED W ORK
and time consuming.
                                                                        Although the majority of existing analysis methods for
   The IBM Rational Rhapsody framework [1] is one of the
                                                                     Statecharts-like models (e.g., UML state machines, UML-RT,
MDE commercial tools that is heavily used in practice (e.g.,
                                                                     STATEMATE and Simulink Stateflow) make use of some
in the automotive industry). Rhapsody Statecharts (also known
                                                                     existing formal verification tools, mainly model checkers such
as Harel’s Statecharts [2]) are a visual state-based formalism
                                                                     as SPIN [10, 11] or SMV [12], there are also approaches that
implemented in the IBM Rational Rhapsody framework to
                                                                     adopt the symbolic execution analysis technique of programs
describe the behavior of reactive systems. They extend Mealy
                                                                     and use it in the context of state-based models including:
Machines - a type of Finite State Machines (FSMs) that perform
their action only on firing transitions - with state entry and         • Modechart specifications (a variation of Harel Statecharts
exit actions and hierarchical composite states with orthogonal           that incorporates timing constraints in the models) for test
regions.                                                                 sequences generation [13];
                                                                       • Labelled Transition Systems (LTSs) for test case genera-
   Symbolic execution is a well-known analysis technique
that systematically explores all possible execution paths of             tion [14] and test case selection [15];
                                                                       • Input Output Symbolic Transition Systems (IOSTSs) [4]
behavioral software artifacts (e.g., programs [3] and state-
based models [4, 5]) using symbolic inputs such that we                  for test purpose definition;
                                                                       • STATEMATE          statecharts [16] and UML State
can derive precise characterizations of the circumstances in
which a specific path is taken. The output of the analysis is            Machines [17] for verifying temporal properties of
a symbolic execution tree (SET) which provides the basis for             the subject models;
                                                                       • Simulink/Stateflows [18] for analysis and test case genera-
various types of analysis and verification, including reachability
analysis, guard analysis, invariant checking and instant test case       tion of flight software using Java PathFinder and Symbolic
generation. One of the key challenges of symbolic execution              PathFinder;
                                                                       • UML-RT State Machines [5] to support a variety of
is scalability, especially when applied to big, complex artifacts
where the size of the output SET becomes very large. Repeating           analyses for this type of models.
the entire analysis even after small changes is not the best         In our work, we followed the approach of Zurowska and Dingel
solution.                                                            [5] with some variations to develop a symbolic execution
module for Rhapsody Statecharts. In contrast to their work          symbolic execution to implement what we call dependency-
[5], our intermediate machine representation of Rhapsody            based symbolic execution (DSE)1 . The dependence analysis
Statecharts takes the form of Mealy Machines instead of             allows us to identify the unaffected parts of an evolved artifact;
their Functional Finite State Machines (FFSMs). Additionally,       complete exploration of these parts is not necessary, and only a
our symbolic execution module is based on an off-the-shelf          single representative path needs to be found during exploration.
symbolic execution engine, KLEE [19], to symbolically execute       Consequently, the resulting SET is not complete, but it is
action code encountered in the Statecharts, whereas they use        sufficient to determine the impact of the change on any SET-
an in-house symbolic execution engine.                              based analysis.
   In his statement paper “Evolution, Adaptation, and the Quest        A well-known example of applications that can benefit from
for Incrementality”, Ghezzi [20] argues that supporting software    such optimization techniques is regression testing which is a
evolution requires building incremental methods and tools to        crucial software evolution task that aims at ensuring that no
speed up the maintenance process with the focus on the analysis     unintended behavior has been introduced to the system after
and the verification activities. An incremental approach in         the change. A naive approach to regression testing usually
such contexts would try to characterize exactly what has been       depends on re-running some existing test suite which has been
changed and reuse (as much as possible) the results of previous     generated to validate the previous version of the system. This
processing steps in the steps that must be rerun after the          process is expensive and time consuming and it is usually
change. The motivation for this is twofold: time efficiency and     supported with some optimization mechanisms including test
scalability. Given the iterative development approach suggested     case selection and prioritization. Since symbolic execution can
by MDD, we believe that this vision needs to be employed by         be used to generate these test suites, optimizing the symbolic
modern analysis and verification methods.                           execution of the new versions of an artifact can also be added
   Existing approaches to alleviate the scalability problem         to these optimization mechanisms.
and improve the efficiency of symbolic execution when re-
                                                                                         IV. R ESEARCH M ETHOD
applying on an evolving version of a program are discussed
in [9] and [8]. In [9], Yang et al. present memoized symbolic          To start, we decided to consider individual state machines
execution of source code - a technique that stores the results      only. Under this assumption, we have developed the approach
of symbolic execution from a previous run and reuses them as        described in Section IV-A. The extension of the approach
a starting point for the next run of the technique to avoid the     to collections of communicating state machines is subject to
re-execution of common paths between the new version of a           ongoing work. Our research methodology involved three phases:
program and its previous one, and to speed up the symbolic          the design phase; the implementation phase; and the validation
execution for the new version. The same idea has been applied       phase. In the sequel, we firstly provide a brief description
earlier by Lauterburg et al. in [21] but in the context of state-   of what has been accomplished toward the completion of
space exploration for evolving programs using model checking.       our proposed research for individual state machines; a more
In [8], Person et al. introduce DiSE (directed incremental          thorough description of this part of the research can be found
symbolic execution) - a technique that uses static analysis and     in a technical report [23]. Secondly, we explain our plan for
change impact analysis to determine the differences between         extending this work for communicating state machines.
program versions and the impact of these differences on other
locations in the program, and uses this information to direct the   A. Accomplished Work for Individual State Machines
symbolic execution to only explore those impacted locations. A        Research Design The first phase involves defining the
similar approach has been proposed by Yang et al. in [22] for       architecture of our standard symbolic execution of Rhapsody
regression model checking. In contrast to all the aforementioned    Statecharts and its optimizations. Figure 1 shows the main
approaches, which work for optimizing the analysis of evolving      components of the architecture implementing the standard
programs, our work tries to realize the same concept for            symbolic execution (SE) of Rhapsody Statecharts and the two
evolving state machine models.                                      proposed optimization techniques of an evolved Rhapsody
                                                                    Statechart: the memoization-based symbolic execution (MSE)
                 III. P ROPOSED S OLUTION
                                                                    and the dependency-based symbolic execution (DSE). The
   Inspired by the research work for optimizing the symbolic        components of the architecture are listed below with a brief
execution of evolving programs [8, 9], we propose two different     description of the purpose of each component.
optimization techniques that leverage the similarities between        •   SC2MLM - A transformation that transforms a Statechart
two successive state machine versions to reduce the cost of               model into our Mealy-like Machine (MLM) representation.
symbolic execution of the evolved version. The first technique,           The basic structure of our MLM formalism consists of a
called memoization-based symbolic execution (MSE), reuses                 set of global variables (sometimes called attributes), a set
the SET generated from a previous analysis and incrementally              of simple states with one of them marked as an initial state,
updates the relevant parts corresponding to the changes made              and a set of transitions between these states. Simple states
to the new version. The second technique integrates a change
impact analysis (i.e., dependency analysis) technique with the         1 We could also call it regression or partial symbolic execution.
Fig. 1: The architecture of our standard symbolic execution (SE) of Rhapsody Statecharts and its optimizations (MSE and DSE).


    in MLMs do not have entry/exit actions. Transitions are          actions of states can be represented by an update of their
    characterized in the same way as in Rhapsody Statecharts         incoming/outgoing transitions, respectively. Therefore, the
    by an event that triggers them, an optional guard and            output from this component is a set of updated transitions
    an action that occurs upon firing them. More advanced            in the modified version of the model T updated including
    features that are found in Rhapsody Statecharts such as          all transitions that have been added to, deleted from or
    composite states, concurrent states, states with entry/exit      updated in the modified version of the model.
    actions, condition connectors and junction connectors need     • MLM2MSET - The main component of our MSE tech-
    to be mapped to fit the structure of the MLMs formalism.         nique. The three inputs to this component are: 1) the
    Our current transformation supports the mapping of these         MLM representation of the modified version of the model
    specific features.                                               M LM modV er , 2) the set of transitions that have been
  • MLM2SET - Our standard symbolic execution module                 updated in the modified version of the model T updated ,
    that traverses an MLM model and symbolically executes            and 3) the SET to be reused (i.e., SET baseV er ). Two
    the action code encountered in each transition to build the      successive tasks are performed by this component. The
    model’s symbolic state space. We developed an interface to       first task is to load and explore the input SET baseV er in
    the KLEE symbolic execution engine [19] to symbolically          order to remove all the edges representing any transition
    execute the action code of each transition. The result from      belonging to the set of updated transitions T updated (note
    this interface is a set of variable assignments and path         that removing an edge leads to removing the entire subtree
    constraints that represent different feasible paths in the       rooted at the target node of the edge). The second task is
    action code.                                                     to re-explore the SET resulting from the previous task to
  • SC2MLM DiffMap - A mapping of the differences                    find all symbolic states representing states with outgoing
    between two Rhapsody Statecharts obtained from the               transitions belonging to T updated . For each of these
    Rhapsody DiffMerge tool into their MLM correspon-                symbolic states, we symbolically execute M LM modV er
    dences. Currently, we consider only the following types of       based on some parameters. These parameters determine
    model changes: adding/removing states; adding/removing           where the exploration starts (i.e., the symbolic state to
    transitions; updating the actions of states; and updating        begin the exploration at), which updated transitions to
    transitions. All these types of changes can be represented       consider when exploring the state representing the start
    as changes to transitions in our MLM representation. For         symbolic state for the first time, and the set of symbolic
    example, adding/removing states can be represented by            states that have been previously explored so far to be used
    an addition/deletion of transitions connecting these states      for the subsumption checking. The resulting SETs are
    with other states in the model; also updating the entry/exit     then merged with the SET resulting from the previous
     task. The output from the component is a memoization-               • The second plug-in implements the SC2MLM DiffMap
     based SET (MSET) of the modified version of the model                 component.
     M SET modV er that shares all what can be reused from the           • The third plug-in realizes our symbolic execution compo-
     old tree SET baseV er but also contains the modifications             nents: MLM2SET, MLM2MSET and MLM2DSET.
     resulting from the SE of the parts of the new version of            Research Validation To validate the correctness of the
     the model that are found changed.                                implementation and to measure the effectiveness and the
   • Dependency Analysis - A component for computing the
                                                                      performance of the proposed optimization techniques, we
     set of transitions that have an impact or are impacted by decide to run our evaluation on three industrial-sized models
     any of the updated transitions found in T updated . The two from the automotive domain. The first model, the Air Quality
     inputs to this component are: 1) the MLM representation System (AQS), is a proprietary model that we obtained from
     of the modified version of the model M LM modV er and our industrial partner that is responsible for air purification
     2) the set of transitions that are found to have been in the vehicle’s cabin. The second and the third models, the
     updated in the modified version of the model T updated . Lane Guide System (LGS) and the Adaptive Cruise Control
     In this component, we first explore the input MLM System (ACCS), are non-proprietary models designed at the
     model to compute all the dependences that exist between University of Waterloo [26]. The LGS is an automotive feature
     its transitions, then identify the transitions that have a used to assist drivers in avoiding unintentional lane departure
     dependency with any of the input updated transitions. by providing alerts when certain events occur. The ACCS is an
     The output is the union of the input updated transitions automotive feature used to automatically maintain the speed
     set and their dependences. We developed the algorithms of a vehicle set by the driver through the automatic operation
     implementing the definitions presented in [24, 25] for of the vehicle. The composite states in the three models are
     computing the dependencies in conventional Extended nested up to the third level. Additionally, the third one has
     Finite State Machines (EFSMs), which can also be applied a composite state with two orthogonal regions. The goal of
     to our MLMs.                                                     such an evaluation is to answer the following three research
   • MLM2DSET - The main component of our DSE tech-
                                                                      questions:
     nique. The two inputs to this component are: 1) the
     MLM representation of the modified version of the                  RQ1. How effective are our optimizations (i.e., how much
     model M LM     modV er
                             and 2) the set of transitions that are            do they reduce the resource requirements of the
     found to have been updated or impacted in the modified                    symbolic execution of a changed state machine model)
     version of the model T   updated
                                      +T  impacted
                                                    . The main task            compared to standard SE of a changed state machine
     performed by this component is similar to the standard                    model?
     SE technique except that the state-space exploration of            RQ2.   Does the SET generated from MSE match the one
     the input model M LM      modV er
                                       is guided by the input set              generated from standard SE?
     of updated and impacted transitions T     updated
                                                       +T  impacted
                                                                    .   RQ3.   What aspects influence the effectiveness of each
     Two modes of exploration are defined: full and partial.                   technique?
     A full exploration mode requires a complete exploration             For RQ1, we consider the following three correlated
     of all execution paths of an explored transition and is variables: 1) the time taken to run each technique, 2) the
     applied for all transitions that are found to have been number of symbolic states and 3) the number of execution
     updated or impacted. However, a partial exploration mode paths in the resulting SETs. For RQ2, we compare the total
     requires the execution of only one representative path of number of symbolic states in the SETs resulting from standard
     an explored transition and is applied to transitions that SE and MSE. We also perform manual inspection of a subset
     are found neither updated nor impacted. The output from of the two SETs to ensure their equivalence. For RQ3, we
     the component is a dependency-based SET (DSET) of the consider the following two aspects: 1) change impact and 2)
     modified version of the model DSET modV er that can model characteristics. To measure the impact of the change, we
     be smaller/equal in size to the standard SET of the same define a change impact metric (CIM) as the percentage of the
     model, depending on the amount of savings gained from maximal paths of the MLM model involving the change. The
     the partial exploration of the input list of updated/impacted lower the value of this metric is, the lower the impact of the
     transitions.                                                     change on the model and the higher the opportunity to benefit
                                                                      from a previous analysis results, and vise versa. For model
  Research Implementation The second phase is the develop-
                                                                      characteristics, we identify the following two features that
ment of the architecture designed in the first phase. Our current
                                                                      appear likely to have an impact on the effectiveness of DSE: 1)
implementation consists of three Eclipse plug-ins, summarized
                                                                      the number of transitions in the subject model with multi-path
as follows.
                                                                      guards or action code and 2) the number of transitions that have
   • The first plug-in implements the SC2MLM and the “De- a dependency with other transitions in the model. We speculate
     pendency Analysis” components and it has been developed that the first aspect has more influence on the effectiveness of
     in the context of the IBM Rhapsody RulesComposer MSE, whereas the second aspect impacts the effectiveness of
     Eclipse framework.                                               DSE.
B. Ongoing Work for Communicating State Machines                                 [6] G.         CodeSonar,        “GRAMMATECH                      CodeSonar,
                                                                                     http://www.grammatech.com/codesonar/.”
   The results of applying our proposed approaches on indi-
                                                                                 [7] I. visualSTATEl, “IAR visualSTATE, https://www.iar.com/iar-embedded-
vidual evolving state machines showed a significant reduction                        workbench/add-ons-and-integrations/visualstate/.”
in the resources required to symbolically execute a modified                     [8] S. Person, G. Yang, N. Rungta, and S. Khurshid, “Directed incremental
version of a state machine model compared with the standard                          symbolic execution,” in ACM SIGPLAN Notices, vol. 46, no. 6. ACM,
                                                                                     2011, pp. 504–515.
method. This motivates us to extend our approaches to handle a                   [9] G. Yang, C. S. Păsăreanu, and S. Khurshid, “Memoized symbolic
system of asynchronously communicating state machines. This                          execution,” in Proceedings of the 2012 International Symposium on
requires us to extend the functionalities of each of the compo-                      Software Testing and Analysis. ACM, 2012, pp. 144–154.
                                                                                [10] E. Mikk, Y. Lakhnech, M. Siegel, and G. J. Holzmann, “Implementing
nents in Figure 1 to handle a system of communicating state                          Statecharts in PROMELA/SPIN,” in 2nd IEEE Workshop on Industrial
machines. Examples of such functionalities include, but are not                      Strength Formal Specification Techniques, 1998. Proceedings. IEEE,
limited to, extending our current MLM formalism to represent                         1998, pp. 90–101.
                                                                                [11] D. Latella, I. Majzik, and M. Massink, “Automatic verification of a
a system of communicating MLMs (CMLMs); updating our                                 behavioural subset of UML statechart diagrams using the spin model-
current SC2MLM component to map the information related to                           checker,” Formal Aspects of Computing, vol. 11, no. 6, pp. 637–664,
the communication topology of a communicating state machine                          1999.
into their corresponding elements in our extended CMLMs                         [12] E. Clarke and W. Heinle, “Modular translation of Statecharts to SMV,”
                                                                                     Carnegie Mellon University, Pittsburgh, PA, Tech. Rep., 2000.
formalism; building an algorithm for composing the individual                   [13] N. H. Lee and S. D. Cha, “Generating test sequences using symbolic
MLMs corresponding to each individual state machine resulting                        execution for event-driven real-time systems,” Microprocessors and
from our SC2MLM component; and updating our dependency                               Microsystems, vol. 27, no. 10, pp. 523–531, 2003.
                                                                                [14] L. Frantzen, J. Tretmans, and T. A. Willemse, Test generation based on
analysis component to consider the communication dependency                          symbolic specifications. Springer, 2005.
between the communicating state machines.                                       [15] T. Jéron, “Symbolic model-based test selection,” Electronic Notes in
                                                                                     Theoretical Computer Science, vol. 240, pp. 167–184, 2009.
  V. C URRENT S TATUS AND E XPECTED C ONTRIBUTIONS                              [16] A. Thums, G. Schellhorn, F. Ortmeier, and W. Reif, “Interactive
                                                                                     verification of statecharts,” in Integration of Software Specification
   The part of the proposed framework considering individual                         Techniques for Applications in Engineering. Springer, 2004, pp. 355–
state machines has been already developed and evaluated [27].                        373.
                                                                                [17] M. Balser, S. Bäumler, A. Knapp, W. Reif, and A. Thums, “Interactive
Currently, we work on the part considering communicating                             verification of UML state machines,” in Formal methods and software
state machines.                                                                      engineering. Springer, 2004, pp. 434–448.
                                                                                [18] C. S. Pasareanu, J. Schumann, P. Mehlitz, M. Lowry, G. Karsai, H. Nine,
   The proposed research aims to improve the current state-                          and S. Neema, “Model based analysis and test generation for flight
of-the-art in the area of model-based analysis in an evolu-                          software,” in Third IEEE International Conference on Space Mission
tionary software development environment. Our contributions                          Challenges for Information Technology, 2009. SMC-IT 2009. IEEE,
                                                                                     2009, pp. 83–90.
specifically include (1) the development, formalization and
                                                                                [19] C. Cadar, D. Dunbar, and D. R. Engler, “Klee: Unassisted and automatic
proof-of-concept implementation of the proposed optimization                         generation of high-coverage tests for complex systems programs.” in
techniques mentioned in Section IV, (2) an evaluation that                           OSDI, vol. 8, 2008, pp. 209–224.
will provide the results showing the benefits of our research                   [20] C. Ghezzi, “Evolution, adaptation, and the quest for incrementality,”
                                                                                     in Large-Scale Complex IT Systems. Development, Operation and
methodology and (3) an actual example of research that can                           Management. Springer, 2012, pp. 369–379.
enrich the analysis capabilities of existing MDE tools.                         [21] S. Lauterburg, A. Sobeih, D. Marinov, and M. Viswanathan, “Incremental
                                                                                     state-space exploration for programs with dynamically allocated data,” in
                         ACKNOWLEGMENT                                               Proceedings of the 30th international conference on Software engineering.
                                                                                     ACM, 2008, pp. 291–300.
  This work is supervised by Dr. Juergen Dingel and was                         [22] G. Yang, M. B. Dwyer, and G. Rothermel, “Regression model checking,”
partially funded by NSERC (Canada), as part of the NECSIS                            in IEEE International Conference on Software Maintenance (ICSM 2009).
                                                                                     IEEE, 2009, pp. 115–124.
Automotive Partnership with General Motors, IBM Canada                          [23] A. Khalil and J. Dingel, “Incremental Symbolic Execution of Evolving
and Malina Software Corp.                                                            State Machines using Memoization and Dependence Analysis,” Queen’s
                                                                                     University, Tech. Rep. 2015-623, pages 1-42, 2015.
                             R EFERENCES                                        [24] K. Androutsopoulos, D. Clark, M. Harman, Z. Li, and L. Tratt,
                                                                                     “Control dependence for extended finite state machines,” in Fundamental
 [1] I.    Rational,    “Rhapsody        System     Designer,    http://www-         Approaches to Software Engineering. Springer, 2009, pp. 216–230.
     01.ibm.com/software/awdtools/rhapsody/.”                                   [25] V. Chimisliu and F. Wotawa, “Improving test case generation from uml
 [2] D. Harel, “Statecharts: A visual formalism for complex systems,” Science        statecharts by using control, data and communication dependencies,” in
     of Computer Programming, vol. 8, no. 3, pp. 231–274, 1987.                      13th International Conference on Quality Software (QSIC). IEEE, 2013,
 [3] J. C. King, “Symbolic execution and program testing,” Communications            pp. 125–134.
     of the ACM, vol. 19, no. 7, pp. 385–394, 1976.                             [26] A. L. J. Dominguez, “Detection of feature interactions in automotive
 [4] C. Gaston, P. Le Gall, N. Rapin, and A. Touil, “Symbolic execution              active safety features,” Ph.D. dissertation, University of Waterloo, 2012.
     techniques for test purpose definition,” in Testing of Communicating       [27] A. Khalil and J. Dingel, “Incremental Symbolic Execution of Evolving
     Systems. Springer, 2006, pp. 1–18.                                              State Machines,” in 18th International Conference on Model Driven
 [5] K. Zurowska and J. Dingel, “Symbolic execution of UML-RT state                  Engineering Languages and Systems (MODELS). ACM/IEEE, 2015, p.
     machines,” in Proceedings of the 27th Annual ACM Symposium on                   to appear.
     Applied Computing. ACM, 2012, pp. 1292–1299.

</pre>