Transformation of Finite State Automata to Regular Expressions using Henshin Daniel Strüber University of Koblenz and Landau strueber@uni-koblenz.de Abstract We present a solution to the State Elimination Case of the Transfor- mation Tool Contest 2017, based on the Henshin model transformation language. The main task is to convert a finite state automaton (FSA) into a regular expression; two extensions include the simplification of FSAs as well as the conversion of probabilistic FSAs. The distinguish- ing feature of our solution is its largely declarative specification, based on Henshin’s concepts of rules and composite units for specifying mod- ifications and control flow. We present the results of the performance and scalability evaluation from the provided benchmark suite. Similar to the reference solution, our solution did not scale up to all test models in the benchmark suite; yet, compared to this solution, it achieved a speed-up by two orders of magnitude for some of the larger models. 1 Introduction. Finite state automata (FSAs) are a modeling concept with many practical applications, including program analysis, pattern matching, speech recognition and behavioral software modeling. In its basic form, a FSA comprises a set of states and a set of labelled transitions between the states; some of the states are designated as initial and final states. An important feature of FSAs is their fundamental relationship to regular expressions. While converting a regular expression into a corresponding FSA is easy, the opposite task is more sophisticated; current solutions suffer from nonlinear algorithmic complexity [GVP+ 17]. The TTC 2017 state elimination case [GVP+ 17] aims to study how transformation tools may contribute to more efficient solutions. The case description specifies three tasks—a main task and two extensions—dealing with three kinds of FSAs: (i) basic FSAs with at least one initial and at least one final state, (ii) simple FSAs with exactly one initial and exactly one final state, and (iii) probabilistic FSAs, in which transitions are annotated with probabilities. Simple FSAs are a subset of basic ones, whereas probabilistic FSAs generalize basic ones. The main task is to transform a simple FSA to a regular expression. Extension 1 involves transforming a basic to a simple FSA, and extension 2 deals with transforming a probabilistic FSA to a stochastic regular expression. In this paper, we present a complete solution based on the Henshin model transformation language [ABJ+ 10]. Henshin is a graph-based transformation language providing support for the declarative specification of in-place model transformations. The basic features of Henshin’s tool set are a suite of editors and an interpreter kernel; more sophisticated features include code generation for parallel graph pattern matching and support for various transformation analyses. Copyright c by the paper’s authors. Copying permitted for private and academic purposes. In: Antonio Garcia-Dominguez, Georg Hinkel, Filip Křikava (eds.): Proceedings of the 10th Transformation Tool Contest, Marburg, Germany, 21-07-2017, published at http://ceur-ws.org Figure 1: Solution for main task: state elimination in FSAs. A distinguishing feature of Henshin is its expressive visual syntax which aims to facilitate usability during transformation development [SBG+ 17]. To provide a largely declarative solution, we use Henshin’s concepts for the specification of control flow and model modifications. Control flow is specified using composite units that orchestrate the execution of a set of sub-units and rules, for example, by applying them in sequential order or in a counted loop. Model modifications are specified using rules, expressing basic match-and-change patterns. 2 Solution In what follows, we present our solution in detail. We start with the main task and continue with the two extension tasks. The solution is available via GitHub at https://github.com/dstrueber/stateelim-henshin. The URL of the associated SHARE image is made available on the GitHub site. 2.1 Main Task: Converting FSAs to Regular Expressions Our solution to the main task follows the state elimination algorithm presented in [GVP+ 17]. We implemented this algorithm using 8 rules, which are orchestrated in a control flow of 6 composite units, as shown in Fig. 1. As required by the specification, the produced result is a string representing the output regular expression. On the way to this result, we manipulate state labels using string operations, thus enabling a compact specification. The overall goal is to remove all non-initial and non-final states from the FSA. To achieve this goal, our starting point is the loop unit main which executes its inner unit eliminateNode as often as possible. The aim of eliminateNode is twofold: first, it checks whether any state is to be removed and, if this is the case, identifies it. This is done using the rule hasRemaining, a simple rule for matching a non-initial and non-final state. Second, if such state is found, it is stored in the parameter cur (for current) and passed to the sequential unit eliminate. The eliminate unit proceeds in the following steps: first it “fixes” the incoming and outgoing transitions of state cur by replacing them, then it deletes cur, and finally, it unifies any redundant transitions arising in the process. To fix the transitions, conditional unit fixTransitions first checks if cur is a looped state, that is, a state with a loop transition. Depending on the result, one of the rules fixTransitionsUnlooped and fixTransitionsLooped is triggered, which only differ in their treatment of the loop. Both rules make use of rule-nesting: they have a kernel rule—the “flat” states in the visual syntax—and a multi-rule—the “layered” states with asterisk signs. When applying either rule to the input model, the kernel rule is matched first, and then, based on the identified match, the multi-rule is applied with a for-all semantics, that is, as often as possible. Based on the provided state cur, all pairs of incoming and outgoing transitions are identified, and a new transition is created for each of these pairs. The label of the newly created transition is composed of the labels of the pair (plus in the loop case, the loop label), which are stored and propagated using the variables a, b (and l ). Rule delete works in a similar manner by deleting state cur via the kernel rule, and its adjacent transitions via separate multi-rules. As a result of the fixTransitions step, the model may temporarily contain pairs of states with multiple transi- tions. These redundant transitions are now unified, using separate loop units for cases where the transitions are non-loops and loops, respectively. Each of these units calls a rule which nondeterministically identifies redundant pair of transitions, removes one of the transitions, and joins its label with the label of the remaining transition. The main unit terminates when there are no re- SequentialUnit main ConditionalUnit simplifyInitial ConditionalUnit simplifyFinal maining nodes to be eliminated. To escape from the if if loop, a break rule is required, which specifies an im- hasMultipleInitialStates hasMultipleFinalStates simplifyInitial possible pattern (here, a node that is initial and final then then else else simplifyFinal createUnifyingFinalState at the same time), so it always evaluates to false. Af- createUnifyingInitialState ter the whole process, there is only the initial and the noop noop final state left, possibly with various transitions be- Rule hasMultipleFinalStates Rule hasMultipleInitialStates Rule noop tween them. To obtain the final regular expression in «preserve» «preserve» «preserve» «preserve» :State :State :State :State a concise manner, a short Java routine puts together isFinal=true isFinal=true isInitial=true isInitial=true their labels according to Listing 4 in [GVP+ 17]. Rule createUnifyingInitialState Rule createUnifyingFinalState «create» «create» «preserve» «create» «create» «preserve» 2.2 Extension 1: FSA simplification. :State states :TransitionGraph :State :TransitionGraph states isInitial=true isFinal=true «create*» «preserve*» «preserve*» «create*» transitions «create*» «create*» transitions The conversion of a basic FSA into a simple one states target states source «create*» «create*» «preserve*» «create*» «preserve*» «create*» involves two steps: first, if there are multiple initial target :State :Transition source :State :Transition states, these states become non-initial states with an label="eps" isFinal=true->false label="eps" isInitial=true->false incoming -transition from a newly created unique Figure 2: Solution for extension 1: Simplifying a FSA. initial state. Second, if there are multiple final states, these states become non-final states with an outgoing -transition to a newly created final initial state. Our solution represents these steps via two sub-units simpli- fyInitial and simplifyFinal, which are applied in sequential order using a sequential main unit. The simplifyInitial unit uses a rule hasMultipleInitialStates to check if the simplification treatment becomes necessary, and, if, this is the case, performs the simplification using the rule createUnifyingInitialState. Specif- ically, rule hasMultipleInitialStates checks whether two separate initial states exist in the input FSA. Rule createUnifyingInitialState again uses the concept of rule-nesting to achieve a for-all semantics. The kernel rule specifies the creation of an additional initial state. The multi-rule matches all earlier initialState, turns them into non-initial states and creates an incoming -transition for each of them. The treatment of final states based on unit simplifyFinal is completely dual. 2.3 Extension 2: Converting probabilistic automata. The solution for extension 2 extends the solution for the main task with small modifications concerning the treat- ment of labels and probabilities. Since the original case description did not specify how the labels and probabilities are to be computed (in fact, the probabilities in the computed regular expression are ignored in the correctness evaluation), we followed the specification of the reference implementation (https://github.com/sinemgetir/ state-elimination-mt/). Details about the reference implementation’s specification were obtained in our com- munication with the case authors. According to this specification, the conversion process includes a preprocessing of the input automaton where the probabilities of the outgoing transitions for each state are recalculated, such that (i) loop and  transitions are ignored, (ii) the probabilities of the other transitions are added up to derive “the new 100%”, and (iii) each transition obtains its original probability divided by “the new 100%” as its new probability. Fig. 3 shows our transformation for implementing this specification: Unit recalculateProbs specifies that its two contained rules are applied in sequential order. Rule recalculateProbsTemps uses rule nesting to iterate over all outgoing non- loop/empty/ transitions of all states, so that their probabilities are stored in variable a. The sum of all values of a is stored in a newly created Trace object, using Henshin’s Aggregations helper class to compute the sum. Rule recalculateProbsUpdate performs the same iteration as before to update the probabilities of the involved transitions. Given the old probability b, the new value is b/a, where a is the aggregate percentage stored in the Trace object. In this process, the Trace objects become obsolete and are consequently deleted. Figure 3: Solution for extension 2: pre-processing step. The remaining conversion process is the same except for the label and probabilities handling, where we adhere to the specification: “when we concatenate two labels we multiply their probabilities. When we ’or’ two labels, we add their probabilities.” For example, in the rule fixTransitionsLooped, for the transi- tion beingly newly created, the probability needs to account for the probabilities of the original transitions, and the label needs to represent the probability of the loop transition. To this end, we modified this rule as shown in Fig. 3. During the matching process, we now store the labels as well as the probabilities of the matched transition in variables ({a, b, l} and {pa, pb, pl}, re- spectively). In the multi-rule, we then use these variables in the label and probability of each newly created transition. The label includes the probability of the loop transition, the probabilities is made up from the probabilites of the input transitions. Figure 4: Solution for extension 2: excerpt. 3 Evaluation In this section, we apply the evaluation criteria from the case description to our solution. The solution was executed on a Windows 10 system (Intel Core i7-5600U, 2.6 GHz; 8 GB of RAM, Java 8, 2 GB Xms). We ran the experiment with a timeout duration of max. 1 hour per model, following the reference solution. Correctness. When applied to the provided test models, our solution produced correct results in all cases. For the main task and extension 2, we used the provided benchmark framework for verifying our solution. We slightly extended the framework so it supports the execution against user-specified timeout durations. For extension 1, unfortunately, we could not follow the evaluation process described in the case description: The FSA data structure used by the reference implementation does not support input FSAs with more than one initial state, rendering it infeasible as a baseline for the correctnes check. Instead, we performed a light-weight validation by checking if the numbers of states, initial states, final states, and transitions changed as expected. Suitability. With our solution, we aimed at providing a primarily declarative solution. We achieved this goal by specifying all parts of the state elimination algorithm (Listing 2 and 3 in [GVP+ 17]) using Henshin’s declarative rule and control flow concepts. In addition, our solution includes two minor imperative parts, written in Java: A driver to trigger the execution of the Henshin interpreter with the specification, and a part to convert the output of state elimination to an expression (Listing 4 in [GVP+ 17]). Performance. Table 1 shows the performance measurements Model Execution time (sec.) in comparison to the available data for the reference solution. For (#states) Reference Henshin the three smallest test models, we observed a slow-down. For all of the remaining models, we observed an increasingly growing speed- 3 2 (26) 0.09 0.21 up, amounting to an order of two magnitudes in the case of the 4 2 (61) 0.14 0.25 largest input model 4 4. A complexity analysis to substantiate 3 3 (69) 0.49 0.29 this observation is left to future work. 5 2 (141) 3.46 0.78 3 4 (147) 4.37 0.77 Scalability. The largest case where our solution produced a re- 3 5 (273) 58.60 2.93 sult within one hour was 4 5, taking 10:12 minutes for 1933 states. 4 3 (274) 57.78 2.32 Given more time, we can transform larger models as well, e.g., 5 4 6 2 (335) 143.12 3.45 in 120:41 minutes for 4244 states. 3 6 (459) 461.64 10.09 4 4 (812) 4786.58 48.15 4 Outlook Table 1: Performance measurements. Conceptually, the state elimination case is a highly interesting sce- nario for our ongoing work on the variability of model transformations [SRA+ 16, SS16]. It features variability in two dimensions: variability in the language of the input automata (plain and probabilistic FSAs), and variability in the solution artifacts of the transformations (repeatedly, we handle a looped case differently from an unlooped case). We intend to use this scenario as a case study for extending our work towards language-level variability. References [ABJ+ 10] Thorsten Arendt, Enrico Biermann, Stefan Jurack, Christian Krause, and Gabriele Taentzer. Henshin: advanced concepts and tools for in-place EMF model transformations. Model Driven Engineering Languages and Systems (MoDELS), pages 121–135, 2010. [GVP+ 17] Sinem Getir, Duc Anh Vu, Francois Peverali, Daniel Strüber, and Timo Kehrer. State Elimination as Model Transformation Problem. 10th Transformation Tool Contest (TTC), 2017. [SBG+ 17] Daniel Strüber, Kristopher Born, Kanwal Daud Gill, Raffaela Groner, Timo Kehrer, Manuel Ohrn- dorf, and Matthias Tichy. Henshin: A Usability-Focused Framework for EMF Model Transformation Development. In International Conference on Graph Transformation (ICGT), pages 196–208, 2017. [SRA+ 16] Daniel Strüber, Julia Rubin, Thorsten Arendt, Marsha Chechik, Gabriele Taentzer, and Jennifer Plöger. RuleMerger: automatic construction of variability-based model transformation rules. In Int. Conf. on Fundamental Approaches to Software Engineering (FASE), pages 122–140, 2016. [SS16] Daniel Strüber and Stefan Schulz. A tool environment for managing families of model transformation rules. In International Conference on Graph Transformation (ICGT), pages 89–101, 2016.