Transformation of Finite State Automata
                  to Regular Expressions using Henshin

                                                      Daniel Strüber
                                            University of Koblenz and Landau
                                                strueber@uni-koblenz.de


                                                         Abstract
                       We present a solution to the State Elimination Case of the Transfor-
                       mation Tool Contest 2017, based on the Henshin model transformation
                       language. The main task is to convert a finite state automaton (FSA)
                       into a regular expression; two extensions include the simplification of
                       FSAs as well as the conversion of probabilistic FSAs. The distinguish-
                       ing feature of our solution is its largely declarative specification, based
                       on Henshin’s concepts of rules and composite units for specifying mod-
                       ifications and control flow. We present the results of the performance
                       and scalability evaluation from the provided benchmark suite. Similar
                       to the reference solution, our solution did not scale up to all test models
                       in the benchmark suite; yet, compared to this solution, it achieved a
                       speed-up by two orders of magnitude for some of the larger models.


1    Introduction.
Finite state automata (FSAs) are a modeling concept with many practical applications, including program
analysis, pattern matching, speech recognition and behavioral software modeling. In its basic form, a FSA
comprises a set of states and a set of labelled transitions between the states; some of the states are designated as
initial and final states. An important feature of FSAs is their fundamental relationship to regular expressions.
While converting a regular expression into a corresponding FSA is easy, the opposite task is more sophisticated;
current solutions suffer from nonlinear algorithmic complexity [GVP+ 17].
   The TTC 2017 state elimination case [GVP+ 17] aims to study how transformation tools may contribute to
more efficient solutions. The case description specifies three tasks—a main task and two extensions—dealing
with three kinds of FSAs: (i) basic FSAs with at least one initial and at least one final state, (ii) simple FSAs
with exactly one initial and exactly one final state, and (iii) probabilistic FSAs, in which transitions are annotated
with probabilities. Simple FSAs are a subset of basic ones, whereas probabilistic FSAs generalize basic ones.
The main task is to transform a simple FSA to a regular expression. Extension 1 involves transforming a basic
to a simple FSA, and extension 2 deals with transforming a probabilistic FSA to a stochastic regular expression.
   In this paper, we present a complete solution based on the Henshin model transformation language [ABJ+ 10].
Henshin is a graph-based transformation language providing support for the declarative specification of in-place
model transformations. The basic features of Henshin’s tool set are a suite of editors and an interpreter kernel;
more sophisticated features include code generation for parallel graph pattern matching and support for various
transformation analyses.

Copyright c by the paper’s authors. Copying permitted for private and academic purposes.
In: Antonio Garcia-Dominguez, Georg Hinkel, Filip Křikava (eds.): Proceedings of the 10th Transformation Tool Contest, Marburg,
Germany, 21-07-2017, published at http://ceur-ws.org
Figure 1: Solution for main task: state elimination in FSAs.
   A distinguishing feature of Henshin is its expressive visual syntax which aims to facilitate usability during
transformation development [SBG+ 17]. To provide a largely declarative solution, we use Henshin’s concepts for
the specification of control flow and model modifications. Control flow is specified using composite units that
orchestrate the execution of a set of sub-units and rules, for example, by applying them in sequential order or in
a counted loop. Model modifications are specified using rules, expressing basic match-and-change patterns.

2     Solution
In what follows, we present our solution in detail. We start with the main task and continue with the two
extension tasks. The solution is available via GitHub at https://github.com/dstrueber/stateelim-henshin.
The URL of the associated SHARE image is made available on the GitHub site.

2.1     Main Task: Converting FSAs to Regular Expressions
Our solution to the main task follows the state elimination algorithm presented in [GVP+ 17]. We implemented
this algorithm using 8 rules, which are orchestrated in a control flow of 6 composite units, as shown in Fig. 1.
As required by the specification, the produced result is a string representing the output regular expression. On
the way to this result, we manipulate state labels using string operations, thus enabling a compact specification.
    The overall goal is to remove all non-initial and non-final states from the FSA. To achieve this goal, our
starting point is the loop unit main which executes its inner unit eliminateNode as often as possible. The aim of
eliminateNode is twofold: first, it checks whether any state is to be removed and, if this is the case, identifies it.
This is done using the rule hasRemaining, a simple rule for matching a non-initial and non-final state. Second,
if such state is found, it is stored in the parameter cur (for current) and passed to the sequential unit eliminate.
The eliminate unit proceeds in the following steps: first it “fixes” the incoming and outgoing transitions of state
cur by replacing them, then it deletes cur, and finally, it unifies any redundant transitions arising in the process.
    To fix the transitions, conditional unit fixTransitions first checks if cur is a looped state, that is, a state with
a loop transition. Depending on the result, one of the rules fixTransitionsUnlooped and fixTransitionsLooped is
triggered, which only differ in their treatment of the loop. Both rules make use of rule-nesting: they have a
kernel rule—the “flat” states in the visual syntax—and a multi-rule—the “layered” states with asterisk signs.
When applying either rule to the input model, the kernel rule is matched first, and then, based on the identified
match, the multi-rule is applied with a for-all semantics, that is, as often as possible. Based on the provided
state cur, all pairs of incoming and outgoing transitions are identified, and a new transition is created for each
of these pairs. The label of the newly created transition is composed of the labels of the pair (plus in the loop
case, the loop label), which are stored and propagated using the variables a, b (and l ). Rule delete works in a
similar manner by deleting state cur via the kernel rule, and its adjacent transitions via separate multi-rules.
    As a result of the fixTransitions step, the model may temporarily contain pairs of states with multiple transi-
tions. These redundant transitions are now unified, using separate loop units for cases where the transitions are
non-loops and loops, respectively. Each of these units calls a rule which nondeterministically identifies redundant
pair of transitions, removes one of the transitions, and joins its label with the label of the remaining transition.
    The main unit terminates when there are no re-
                                                               SequentialUnit main           ConditionalUnit simplifyInitial       ConditionalUnit simplifyFinal
maining nodes to be eliminated. To escape from the
                                                                                                if                                     if
loop, a break rule is required, which specifies an im-                                             hasMultipleInitialStates                hasMultipleFinalStates
                                                                  simplifyInitial
possible pattern (here, a node that is initial and final                                                then                                    then       else
                                                                                                                     else
                                                                  simplifyFinal                                                   createUnifyingFinalState
at the same time), so it always evaluates to false. Af-                                     createUnifyingInitialState

ter the whole process, there is only the initial and the                                                              noop                                 noop

final state left, possibly with various transitions be-        Rule hasMultipleFinalStates                  Rule hasMultipleInitialStates                 Rule noop
tween them. To obtain the final regular expression in         «preserve»        «preserve»                «preserve»         «preserve»
                                                              :State            :State                    :State             :State
a concise manner, a short Java routine puts together             isFinal=true      isFinal=true              isInitial=true     isInitial=true
their labels according to Listing 4 in [GVP+ 17].
                                                                                    Rule createUnifyingInitialState                   Rule createUnifyingFinalState
                                                                                   «create»            «create»   «preserve»       «create»           «create» «preserve»
2.2     Extension 1: FSA simplification.                                           :State             states      :TransitionGraph :State                       :TransitionGraph
                                                                                                                                                       states
                                                                                      isInitial=true                                   isFinal=true
                                                                                                           «create*» «preserve*»                                           «preserve*»
                                                                                   «create*»              transitions              «create*» «create*» transitions
The conversion of a basic FSA into a simple one                                                                           states     target                                  states
                                                                                    source
                                                                                                  «create*»                                         «create*» «preserve*»
                                                                                   «create*»              «preserve*»              «create*»
involves two steps: first, if there are multiple initial                                          target :State                    :Transition       source :State
                                                                                   :Transition
states, these states become non-initial states with an                                                                                 label="eps"               isFinal=true->false
                                                                                      label="eps"            isInitial=true->false
incoming -transition from a newly created unique
                                                                                  Figure 2: Solution for extension 1: Simplifying a FSA.
initial state. Second, if there are multiple final states, these states become non-final states with an outgoing
-transition to a newly created final initial state. Our solution represents these steps via two sub-units simpli-
fyInitial and simplifyFinal, which are applied in sequential order using a sequential main unit.
   The simplifyInitial unit uses a rule hasMultipleInitialStates to check if the simplification treatment becomes
necessary, and, if, this is the case, performs the simplification using the rule createUnifyingInitialState. Specif-
ically, rule hasMultipleInitialStates checks whether two separate initial states exist in the input FSA. Rule
createUnifyingInitialState again uses the concept of rule-nesting to achieve a for-all semantics. The kernel rule
specifies the creation of an additional initial state. The multi-rule matches all earlier initialState, turns them
into non-initial states and creates an incoming -transition for each of them.
   The treatment of final states based on unit simplifyFinal is completely dual.

2.3   Extension 2: Converting probabilistic automata.
The solution for extension 2 extends the solution for the main task with small modifications concerning the treat-
ment of labels and probabilities. Since the original case description did not specify how the labels and probabilities
are to be computed (in fact, the probabilities in the computed regular expression are ignored in the correctness
evaluation), we followed the specification of the reference implementation (https://github.com/sinemgetir/
state-elimination-mt/). Details about the reference implementation’s specification were obtained in our com-
munication with the case authors.
   According to this specification, the conversion process includes a preprocessing of the input automaton where
the probabilities of the outgoing transitions for each state are recalculated, such that (i) loop and  transitions
are ignored, (ii) the probabilities of the other transitions are added up to derive “the new 100%”, and (iii) each
transition obtains its original probability divided by “the new 100%” as its new probability. Fig. 3 shows our
transformation for implementing this specification: Unit recalculateProbs specifies that its two contained rules
are applied in sequential order. Rule recalculateProbsTemps uses rule nesting to iterate over all outgoing non-
loop/empty/ transitions of all states, so that their probabilities are stored in variable a. The sum of all values
of a is stored in a newly created Trace object, using Henshin’s Aggregations helper class to compute the sum.
Rule recalculateProbsUpdate performs the same iteration as before to update the probabilities of the involved
transitions. Given the old probability b, the new value is b/a, where a is the aggregate percentage stored in the
Trace object. In this process, the Trace objects become obsolete and are consequently deleted.


                             Figure 3: Solution for extension 2: pre-processing step.
   The remaining conversion process is the same except for the label and probabilities handling, where we adhere
to the specification: “when we concatenate two labels we multiply their probabilities. When we ’or’ two labels,
we add their probabilities.”
   For example, in the rule fixTransitionsLooped, for the transi-
tion beingly newly created, the probability needs to account for
the probabilities of the original transitions, and the label needs
to represent the probability of the loop transition. To this end,
we modified this rule as shown in Fig. 3. During the matching
process, we now store the labels as well as the probabilities of
the matched transition in variables ({a, b, l} and {pa, pb, pl}, re-
spectively). In the multi-rule, we then use these variables in the
label and probability of each newly created transition. The label
includes the probability of the loop transition, the probabilities
is made up from the probabilites of the input transitions.
                                                                       Figure 4: Solution for extension 2: excerpt.
3   Evaluation
In this section, we apply the evaluation criteria from the case description to our solution. The solution was
executed on a Windows 10 system (Intel Core i7-5600U, 2.6 GHz; 8 GB of RAM, Java 8, 2 GB Xms). We ran
the experiment with a timeout duration of max. 1 hour per model, following the reference solution.
   Correctness. When applied to the provided test models, our solution produced correct results in all cases.
For the main task and extension 2, we used the provided benchmark framework for verifying our solution. We
slightly extended the framework so it supports the execution against user-specified timeout durations.
   For extension 1, unfortunately, we could not follow the evaluation process described in the case description:
The FSA data structure used by the reference implementation does not support input FSAs with more than one
initial state, rendering it infeasible as a baseline for the correctnes check. Instead, we performed a light-weight
validation by checking if the numbers of states, initial states, final states, and transitions changed as expected.
   Suitability. With our solution, we aimed at providing a primarily declarative solution. We achieved this
goal by specifying all parts of the state elimination algorithm (Listing 2 and 3 in [GVP+ 17]) using Henshin’s
declarative rule and control flow concepts. In addition, our solution includes two minor imperative parts, written
in Java: A driver to trigger the execution of the Henshin interpreter with the specification, and a part to convert
the output of state elimination to an expression (Listing 4 in [GVP+ 17]).
   Performance. Table 1 shows the performance measurements
                                                                        Model         Execution time (sec.)
in comparison to the available data for the reference solution. For
                                                                        (#states)     Reference   Henshin
the three smallest test models, we observed a slow-down. For all of
the remaining models, we observed an increasingly growing speed-        3 2 (26)             0.09         0.21
up, amounting to an order of two magnitudes in the case of the          4 2 (61)             0.14         0.25
largest input model 4 4. A complexity analysis to substantiate          3 3 (69)             0.49         0.29
this observation is left to future work.                                5 2 (141)            3.46         0.78
                                                                        3 4 (147)            4.37         0.77
   Scalability. The largest case where our solution produced a re-
                                                                        3 5 (273)           58.60         2.93
sult within one hour was 4 5, taking 10:12 minutes for 1933 states.
                                                                        4 3 (274)           57.78         2.32
Given more time, we can transform larger models as well, e.g., 5 4
                                                                        6 2 (335)          143.12         3.45
in 120:41 minutes for 4244 states.
                                                                        3 6 (459)          461.64        10.09
                                                                        4 4 (812)         4786.58        48.15
4   Outlook
                                                                          Table 1: Performance measurements.
Conceptually, the state elimination case is a highly interesting sce-
nario for our ongoing work on the variability of model transformations [SRA+ 16, SS16]. It features variability in
two dimensions: variability in the language of the input automata (plain and probabilistic FSAs), and variability
in the solution artifacts of the transformations (repeatedly, we handle a looped case differently from an unlooped
case). We intend to use this scenario as a case study for extending our work towards language-level variability.

References
[ABJ+ 10] Thorsten Arendt, Enrico Biermann, Stefan Jurack, Christian Krause, and Gabriele Taentzer. Henshin:
          advanced concepts and tools for in-place EMF model transformations. Model Driven Engineering
          Languages and Systems (MoDELS), pages 121–135, 2010.
[GVP+ 17] Sinem Getir, Duc Anh Vu, Francois Peverali, Daniel Strüber, and Timo Kehrer. State Elimination
          as Model Transformation Problem. 10th Transformation Tool Contest (TTC), 2017.
[SBG+ 17] Daniel Strüber, Kristopher Born, Kanwal Daud Gill, Raffaela Groner, Timo Kehrer, Manuel Ohrn-
          dorf, and Matthias Tichy. Henshin: A Usability-Focused Framework for EMF Model Transformation
          Development. In International Conference on Graph Transformation (ICGT), pages 196–208, 2017.
[SRA+ 16] Daniel Strüber, Julia Rubin, Thorsten Arendt, Marsha Chechik, Gabriele Taentzer, and Jennifer
          Plöger. RuleMerger: automatic construction of variability-based model transformation rules. In Int.
          Conf. on Fundamental Approaches to Software Engineering (FASE), pages 122–140, 2016.
[SS16]     Daniel Strüber and Stefan Schulz. A tool environment for managing families of model transformation
           rules. In International Conference on Graph Transformation (ICGT), pages 89–101, 2016.