=Paper=
{{Paper
|id=Vol-1531/paper7
|storemode=property
|title=Optimizing the Symbolic Execution of Communicating and Evolving State Machines
|pdfUrl=https://ceur-ws.org/Vol-1531/paper7.pdf
|volume=Vol-1531
|dblpUrl=https://dblp.org/rec/conf/models/Khalil15a
}}
==Optimizing the Symbolic Execution of Communicating and Evolving State Machines==
Optimizing the Symbolic Execution of
Communicating and Evolving State Machines
Amal Khalil
School of Computing, Queen’s University
Kingston, Ontario, Canada
Email: khalil@cs.queensu.ca
Abstract—This paper describes research investigating two com- This research introduces two complementary optimization
plementary optimization techniques that leverage the similarities techniques that leverage the similarities between two successive
between state machines versions to reduce the cost of symbolic state machine versions to reduce the cost of symbolic execution
execution of the evolved version.
of the evolved version.
I. R ESEARCH P ROBLEM AND M OTIVATION This research is motivated by a number of facts. First,
Model Driven Engineering (MDE) is a model-centric symbolic execution has been shown to be a very powerful
software engineering approach that aims at improving the method for the analysis of programs and there are already
productivity and the quality of software artifacts by focusing several commercial code analysis tools built based on it (e.g.,
on models as first-class artifacts in place of code. MDE has CodeSonar [6]). Similarly, the technique has been adopted
been widely used for over a decade in many domains such and applied in the context of state-based models (e.g., the
as automotive and telecommunication industries. Iterative- IAR visualSTATE - Verificator [7]). Second, there is an
incremental development and model-based analysis are central interest from our industrial partner to improve the model-level
to MDE in which artifacts typically undergo several iterations analysis capabilities of the IBM Rhapsody tool. Third, research
and refinements during their lifetime that may require changes on optimizing the symbolic execution of evolving programs
to their initial design versions. As these models evolve, it has been recently addressed [8, 9], however to the best of
is necessary to assess their quality by repeating the analysis knowledge we are the first to consider such optimizations for
and the verification of these models after every iteration or the symbolic execution of evolving state machines.
refinement. This process, if not optimized, can be very tedious II. R ELATED W ORK
and time consuming.
Although the majority of existing analysis methods for
The IBM Rational Rhapsody framework [1] is one of the
Statecharts-like models (e.g., UML state machines, UML-RT,
MDE commercial tools that is heavily used in practice (e.g.,
STATEMATE and Simulink Stateflow) make use of some
in the automotive industry). Rhapsody Statecharts (also known
existing formal verification tools, mainly model checkers such
as Harel’s Statecharts [2]) are a visual state-based formalism
as SPIN [10, 11] or SMV [12], there are also approaches that
implemented in the IBM Rational Rhapsody framework to
adopt the symbolic execution analysis technique of programs
describe the behavior of reactive systems. They extend Mealy
and use it in the context of state-based models including:
Machines - a type of Finite State Machines (FSMs) that perform
their action only on firing transitions - with state entry and • Modechart specifications (a variation of Harel Statecharts
exit actions and hierarchical composite states with orthogonal that incorporates timing constraints in the models) for test
regions. sequences generation [13];
• Labelled Transition Systems (LTSs) for test case genera-
Symbolic execution is a well-known analysis technique
that systematically explores all possible execution paths of tion [14] and test case selection [15];
• Input Output Symbolic Transition Systems (IOSTSs) [4]
behavioral software artifacts (e.g., programs [3] and state-
based models [4, 5]) using symbolic inputs such that we for test purpose definition;
• STATEMATE statecharts [16] and UML State
can derive precise characterizations of the circumstances in
which a specific path is taken. The output of the analysis is Machines [17] for verifying temporal properties of
a symbolic execution tree (SET) which provides the basis for the subject models;
• Simulink/Stateflows [18] for analysis and test case genera-
various types of analysis and verification, including reachability
analysis, guard analysis, invariant checking and instant test case tion of flight software using Java PathFinder and Symbolic
generation. One of the key challenges of symbolic execution PathFinder;
• UML-RT State Machines [5] to support a variety of
is scalability, especially when applied to big, complex artifacts
where the size of the output SET becomes very large. Repeating analyses for this type of models.
the entire analysis even after small changes is not the best In our work, we followed the approach of Zurowska and Dingel
solution. [5] with some variations to develop a symbolic execution
module for Rhapsody Statecharts. In contrast to their work symbolic execution to implement what we call dependency-
[5], our intermediate machine representation of Rhapsody based symbolic execution (DSE)1 . The dependence analysis
Statecharts takes the form of Mealy Machines instead of allows us to identify the unaffected parts of an evolved artifact;
their Functional Finite State Machines (FFSMs). Additionally, complete exploration of these parts is not necessary, and only a
our symbolic execution module is based on an off-the-shelf single representative path needs to be found during exploration.
symbolic execution engine, KLEE [19], to symbolically execute Consequently, the resulting SET is not complete, but it is
action code encountered in the Statecharts, whereas they use sufficient to determine the impact of the change on any SET-
an in-house symbolic execution engine. based analysis.
In his statement paper “Evolution, Adaptation, and the Quest A well-known example of applications that can benefit from
for Incrementality”, Ghezzi [20] argues that supporting software such optimization techniques is regression testing which is a
evolution requires building incremental methods and tools to crucial software evolution task that aims at ensuring that no
speed up the maintenance process with the focus on the analysis unintended behavior has been introduced to the system after
and the verification activities. An incremental approach in the change. A naive approach to regression testing usually
such contexts would try to characterize exactly what has been depends on re-running some existing test suite which has been
changed and reuse (as much as possible) the results of previous generated to validate the previous version of the system. This
processing steps in the steps that must be rerun after the process is expensive and time consuming and it is usually
change. The motivation for this is twofold: time efficiency and supported with some optimization mechanisms including test
scalability. Given the iterative development approach suggested case selection and prioritization. Since symbolic execution can
by MDD, we believe that this vision needs to be employed by be used to generate these test suites, optimizing the symbolic
modern analysis and verification methods. execution of the new versions of an artifact can also be added
Existing approaches to alleviate the scalability problem to these optimization mechanisms.
and improve the efficiency of symbolic execution when re-
IV. R ESEARCH M ETHOD
applying on an evolving version of a program are discussed
in [9] and [8]. In [9], Yang et al. present memoized symbolic To start, we decided to consider individual state machines
execution of source code - a technique that stores the results only. Under this assumption, we have developed the approach
of symbolic execution from a previous run and reuses them as described in Section IV-A. The extension of the approach
a starting point for the next run of the technique to avoid the to collections of communicating state machines is subject to
re-execution of common paths between the new version of a ongoing work. Our research methodology involved three phases:
program and its previous one, and to speed up the symbolic the design phase; the implementation phase; and the validation
execution for the new version. The same idea has been applied phase. In the sequel, we firstly provide a brief description
earlier by Lauterburg et al. in [21] but in the context of state- of what has been accomplished toward the completion of
space exploration for evolving programs using model checking. our proposed research for individual state machines; a more
In [8], Person et al. introduce DiSE (directed incremental thorough description of this part of the research can be found
symbolic execution) - a technique that uses static analysis and in a technical report [23]. Secondly, we explain our plan for
change impact analysis to determine the differences between extending this work for communicating state machines.
program versions and the impact of these differences on other
locations in the program, and uses this information to direct the A. Accomplished Work for Individual State Machines
symbolic execution to only explore those impacted locations. A Research Design The first phase involves defining the
similar approach has been proposed by Yang et al. in [22] for architecture of our standard symbolic execution of Rhapsody
regression model checking. In contrast to all the aforementioned Statecharts and its optimizations. Figure 1 shows the main
approaches, which work for optimizing the analysis of evolving components of the architecture implementing the standard
programs, our work tries to realize the same concept for symbolic execution (SE) of Rhapsody Statecharts and the two
evolving state machine models. proposed optimization techniques of an evolved Rhapsody
Statechart: the memoization-based symbolic execution (MSE)
III. P ROPOSED S OLUTION
and the dependency-based symbolic execution (DSE). The
Inspired by the research work for optimizing the symbolic components of the architecture are listed below with a brief
execution of evolving programs [8, 9], we propose two different description of the purpose of each component.
optimization techniques that leverage the similarities between • SC2MLM - A transformation that transforms a Statechart
two successive state machine versions to reduce the cost of model into our Mealy-like Machine (MLM) representation.
symbolic execution of the evolved version. The first technique, The basic structure of our MLM formalism consists of a
called memoization-based symbolic execution (MSE), reuses set of global variables (sometimes called attributes), a set
the SET generated from a previous analysis and incrementally of simple states with one of them marked as an initial state,
updates the relevant parts corresponding to the changes made and a set of transitions between these states. Simple states
to the new version. The second technique integrates a change
impact analysis (i.e., dependency analysis) technique with the 1 We could also call it regression or partial symbolic execution.
Fig. 1: The architecture of our standard symbolic execution (SE) of Rhapsody Statecharts and its optimizations (MSE and DSE).
in MLMs do not have entry/exit actions. Transitions are actions of states can be represented by an update of their
characterized in the same way as in Rhapsody Statecharts incoming/outgoing transitions, respectively. Therefore, the
by an event that triggers them, an optional guard and output from this component is a set of updated transitions
an action that occurs upon firing them. More advanced in the modified version of the model T updated including
features that are found in Rhapsody Statecharts such as all transitions that have been added to, deleted from or
composite states, concurrent states, states with entry/exit updated in the modified version of the model.
actions, condition connectors and junction connectors need • MLM2MSET - The main component of our MSE tech-
to be mapped to fit the structure of the MLMs formalism. nique. The three inputs to this component are: 1) the
Our current transformation supports the mapping of these MLM representation of the modified version of the model
specific features. M LM modV er , 2) the set of transitions that have been
• MLM2SET - Our standard symbolic execution module updated in the modified version of the model T updated ,
that traverses an MLM model and symbolically executes and 3) the SET to be reused (i.e., SET baseV er ). Two
the action code encountered in each transition to build the successive tasks are performed by this component. The
model’s symbolic state space. We developed an interface to first task is to load and explore the input SET baseV er in
the KLEE symbolic execution engine [19] to symbolically order to remove all the edges representing any transition
execute the action code of each transition. The result from belonging to the set of updated transitions T updated (note
this interface is a set of variable assignments and path that removing an edge leads to removing the entire subtree
constraints that represent different feasible paths in the rooted at the target node of the edge). The second task is
action code. to re-explore the SET resulting from the previous task to
• SC2MLM DiffMap - A mapping of the differences find all symbolic states representing states with outgoing
between two Rhapsody Statecharts obtained from the transitions belonging to T updated . For each of these
Rhapsody DiffMerge tool into their MLM correspon- symbolic states, we symbolically execute M LM modV er
dences. Currently, we consider only the following types of based on some parameters. These parameters determine
model changes: adding/removing states; adding/removing where the exploration starts (i.e., the symbolic state to
transitions; updating the actions of states; and updating begin the exploration at), which updated transitions to
transitions. All these types of changes can be represented consider when exploring the state representing the start
as changes to transitions in our MLM representation. For symbolic state for the first time, and the set of symbolic
example, adding/removing states can be represented by states that have been previously explored so far to be used
an addition/deletion of transitions connecting these states for the subsumption checking. The resulting SETs are
with other states in the model; also updating the entry/exit then merged with the SET resulting from the previous
task. The output from the component is a memoization- • The second plug-in implements the SC2MLM DiffMap
based SET (MSET) of the modified version of the model component.
M SET modV er that shares all what can be reused from the • The third plug-in realizes our symbolic execution compo-
old tree SET baseV er but also contains the modifications nents: MLM2SET, MLM2MSET and MLM2DSET.
resulting from the SE of the parts of the new version of Research Validation To validate the correctness of the
the model that are found changed. implementation and to measure the effectiveness and the
• Dependency Analysis - A component for computing the
performance of the proposed optimization techniques, we
set of transitions that have an impact or are impacted by decide to run our evaluation on three industrial-sized models
any of the updated transitions found in T updated . The two from the automotive domain. The first model, the Air Quality
inputs to this component are: 1) the MLM representation System (AQS), is a proprietary model that we obtained from
of the modified version of the model M LM modV er and our industrial partner that is responsible for air purification
2) the set of transitions that are found to have been in the vehicle’s cabin. The second and the third models, the
updated in the modified version of the model T updated . Lane Guide System (LGS) and the Adaptive Cruise Control
In this component, we first explore the input MLM System (ACCS), are non-proprietary models designed at the
model to compute all the dependences that exist between University of Waterloo [26]. The LGS is an automotive feature
its transitions, then identify the transitions that have a used to assist drivers in avoiding unintentional lane departure
dependency with any of the input updated transitions. by providing alerts when certain events occur. The ACCS is an
The output is the union of the input updated transitions automotive feature used to automatically maintain the speed
set and their dependences. We developed the algorithms of a vehicle set by the driver through the automatic operation
implementing the definitions presented in [24, 25] for of the vehicle. The composite states in the three models are
computing the dependencies in conventional Extended nested up to the third level. Additionally, the third one has
Finite State Machines (EFSMs), which can also be applied a composite state with two orthogonal regions. The goal of
to our MLMs. such an evaluation is to answer the following three research
• MLM2DSET - The main component of our DSE tech-
questions:
nique. The two inputs to this component are: 1) the
MLM representation of the modified version of the RQ1. How effective are our optimizations (i.e., how much
model M LM modV er
and 2) the set of transitions that are do they reduce the resource requirements of the
found to have been updated or impacted in the modified symbolic execution of a changed state machine model)
version of the model T updated
+T impacted
. The main task compared to standard SE of a changed state machine
performed by this component is similar to the standard model?
SE technique except that the state-space exploration of RQ2. Does the SET generated from MSE match the one
the input model M LM modV er
is guided by the input set generated from standard SE?
of updated and impacted transitions T updated
+T impacted
. RQ3. What aspects influence the effectiveness of each
Two modes of exploration are defined: full and partial. technique?
A full exploration mode requires a complete exploration For RQ1, we consider the following three correlated
of all execution paths of an explored transition and is variables: 1) the time taken to run each technique, 2) the
applied for all transitions that are found to have been number of symbolic states and 3) the number of execution
updated or impacted. However, a partial exploration mode paths in the resulting SETs. For RQ2, we compare the total
requires the execution of only one representative path of number of symbolic states in the SETs resulting from standard
an explored transition and is applied to transitions that SE and MSE. We also perform manual inspection of a subset
are found neither updated nor impacted. The output from of the two SETs to ensure their equivalence. For RQ3, we
the component is a dependency-based SET (DSET) of the consider the following two aspects: 1) change impact and 2)
modified version of the model DSET modV er that can model characteristics. To measure the impact of the change, we
be smaller/equal in size to the standard SET of the same define a change impact metric (CIM) as the percentage of the
model, depending on the amount of savings gained from maximal paths of the MLM model involving the change. The
the partial exploration of the input list of updated/impacted lower the value of this metric is, the lower the impact of the
transitions. change on the model and the higher the opportunity to benefit
from a previous analysis results, and vise versa. For model
Research Implementation The second phase is the develop-
characteristics, we identify the following two features that
ment of the architecture designed in the first phase. Our current
appear likely to have an impact on the effectiveness of DSE: 1)
implementation consists of three Eclipse plug-ins, summarized
the number of transitions in the subject model with multi-path
as follows.
guards or action code and 2) the number of transitions that have
• The first plug-in implements the SC2MLM and the “De- a dependency with other transitions in the model. We speculate
pendency Analysis” components and it has been developed that the first aspect has more influence on the effectiveness of
in the context of the IBM Rhapsody RulesComposer MSE, whereas the second aspect impacts the effectiveness of
Eclipse framework. DSE.
B. Ongoing Work for Communicating State Machines [6] G. CodeSonar, “GRAMMATECH CodeSonar,
http://www.grammatech.com/codesonar/.”
The results of applying our proposed approaches on indi-
[7] I. visualSTATEl, “IAR visualSTATE, https://www.iar.com/iar-embedded-
vidual evolving state machines showed a significant reduction workbench/add-ons-and-integrations/visualstate/.”
in the resources required to symbolically execute a modified [8] S. Person, G. Yang, N. Rungta, and S. Khurshid, “Directed incremental
version of a state machine model compared with the standard symbolic execution,” in ACM SIGPLAN Notices, vol. 46, no. 6. ACM,
2011, pp. 504–515.
method. This motivates us to extend our approaches to handle a [9] G. Yang, C. S. Păsăreanu, and S. Khurshid, “Memoized symbolic
system of asynchronously communicating state machines. This execution,” in Proceedings of the 2012 International Symposium on
requires us to extend the functionalities of each of the compo- Software Testing and Analysis. ACM, 2012, pp. 144–154.
[10] E. Mikk, Y. Lakhnech, M. Siegel, and G. J. Holzmann, “Implementing
nents in Figure 1 to handle a system of communicating state Statecharts in PROMELA/SPIN,” in 2nd IEEE Workshop on Industrial
machines. Examples of such functionalities include, but are not Strength Formal Specification Techniques, 1998. Proceedings. IEEE,
limited to, extending our current MLM formalism to represent 1998, pp. 90–101.
[11] D. Latella, I. Majzik, and M. Massink, “Automatic verification of a
a system of communicating MLMs (CMLMs); updating our behavioural subset of UML statechart diagrams using the spin model-
current SC2MLM component to map the information related to checker,” Formal Aspects of Computing, vol. 11, no. 6, pp. 637–664,
the communication topology of a communicating state machine 1999.
into their corresponding elements in our extended CMLMs [12] E. Clarke and W. Heinle, “Modular translation of Statecharts to SMV,”
Carnegie Mellon University, Pittsburgh, PA, Tech. Rep., 2000.
formalism; building an algorithm for composing the individual [13] N. H. Lee and S. D. Cha, “Generating test sequences using symbolic
MLMs corresponding to each individual state machine resulting execution for event-driven real-time systems,” Microprocessors and
from our SC2MLM component; and updating our dependency Microsystems, vol. 27, no. 10, pp. 523–531, 2003.
[14] L. Frantzen, J. Tretmans, and T. A. Willemse, Test generation based on
analysis component to consider the communication dependency symbolic specifications. Springer, 2005.
between the communicating state machines. [15] T. Jéron, “Symbolic model-based test selection,” Electronic Notes in
Theoretical Computer Science, vol. 240, pp. 167–184, 2009.
V. C URRENT S TATUS AND E XPECTED C ONTRIBUTIONS [16] A. Thums, G. Schellhorn, F. Ortmeier, and W. Reif, “Interactive
verification of statecharts,” in Integration of Software Specification
The part of the proposed framework considering individual Techniques for Applications in Engineering. Springer, 2004, pp. 355–
state machines has been already developed and evaluated [27]. 373.
[17] M. Balser, S. Bäumler, A. Knapp, W. Reif, and A. Thums, “Interactive
Currently, we work on the part considering communicating verification of UML state machines,” in Formal methods and software
state machines. engineering. Springer, 2004, pp. 434–448.
[18] C. S. Pasareanu, J. Schumann, P. Mehlitz, M. Lowry, G. Karsai, H. Nine,
The proposed research aims to improve the current state- and S. Neema, “Model based analysis and test generation for flight
of-the-art in the area of model-based analysis in an evolu- software,” in Third IEEE International Conference on Space Mission
tionary software development environment. Our contributions Challenges for Information Technology, 2009. SMC-IT 2009. IEEE,
2009, pp. 83–90.
specifically include (1) the development, formalization and
[19] C. Cadar, D. Dunbar, and D. R. Engler, “Klee: Unassisted and automatic
proof-of-concept implementation of the proposed optimization generation of high-coverage tests for complex systems programs.” in
techniques mentioned in Section IV, (2) an evaluation that OSDI, vol. 8, 2008, pp. 209–224.
will provide the results showing the benefits of our research [20] C. Ghezzi, “Evolution, adaptation, and the quest for incrementality,”
in Large-Scale Complex IT Systems. Development, Operation and
methodology and (3) an actual example of research that can Management. Springer, 2012, pp. 369–379.
enrich the analysis capabilities of existing MDE tools. [21] S. Lauterburg, A. Sobeih, D. Marinov, and M. Viswanathan, “Incremental
state-space exploration for programs with dynamically allocated data,” in
ACKNOWLEGMENT Proceedings of the 30th international conference on Software engineering.
ACM, 2008, pp. 291–300.
This work is supervised by Dr. Juergen Dingel and was [22] G. Yang, M. B. Dwyer, and G. Rothermel, “Regression model checking,”
partially funded by NSERC (Canada), as part of the NECSIS in IEEE International Conference on Software Maintenance (ICSM 2009).
IEEE, 2009, pp. 115–124.
Automotive Partnership with General Motors, IBM Canada [23] A. Khalil and J. Dingel, “Incremental Symbolic Execution of Evolving
and Malina Software Corp. State Machines using Memoization and Dependence Analysis,” Queen’s
University, Tech. Rep. 2015-623, pages 1-42, 2015.
R EFERENCES [24] K. Androutsopoulos, D. Clark, M. Harman, Z. Li, and L. Tratt,
“Control dependence for extended finite state machines,” in Fundamental
[1] I. Rational, “Rhapsody System Designer, http://www- Approaches to Software Engineering. Springer, 2009, pp. 216–230.
01.ibm.com/software/awdtools/rhapsody/.” [25] V. Chimisliu and F. Wotawa, “Improving test case generation from uml
[2] D. Harel, “Statecharts: A visual formalism for complex systems,” Science statecharts by using control, data and communication dependencies,” in
of Computer Programming, vol. 8, no. 3, pp. 231–274, 1987. 13th International Conference on Quality Software (QSIC). IEEE, 2013,
[3] J. C. King, “Symbolic execution and program testing,” Communications pp. 125–134.
of the ACM, vol. 19, no. 7, pp. 385–394, 1976. [26] A. L. J. Dominguez, “Detection of feature interactions in automotive
[4] C. Gaston, P. Le Gall, N. Rapin, and A. Touil, “Symbolic execution active safety features,” Ph.D. dissertation, University of Waterloo, 2012.
techniques for test purpose definition,” in Testing of Communicating [27] A. Khalil and J. Dingel, “Incremental Symbolic Execution of Evolving
Systems. Springer, 2006, pp. 1–18. State Machines,” in 18th International Conference on Model Driven
[5] K. Zurowska and J. Dingel, “Symbolic execution of UML-RT state Engineering Languages and Systems (MODELS). ACM/IEEE, 2015, p.
machines,” in Proceedings of the 27th Annual ACM Symposium on to appear.
Applied Computing. ACM, 2012, pp. 1292–1299.