Introduction

Pending Evolution of Grammars

0 Software Analysis & Transformation Team, Centrum Wiskunde & Informatica , Amsterdam , The Netherlands

2 9

The classic approach to grammar manipulation is based on instant processing of grammar edits, which limits the kinds of grammar evolution scenarios that can be expressed with it. Treating transformation preconditions as guards poses limitations on concurrent changes of the same grammar, on reuse of evolution scripts, on expressing optionally executed steps, on batch processing and optimization of them, etc. We propose an alternative paradigm of evolution, where a transformation can be scheduled for later execution based on its precondition. This kind of extreme evolution can be useful for expressing scenarios that are impossible to fully automate within the classic or the negotiated transformation paradigms.

Introduction

A colloquial expression `consider it done' means that the subject of the conversation is either indeed already done, or will be done in the very near future | in either case, the receiver of such a message can rest assured that the subject will take place if it has not already, and is expected to act as if it has indeed happened. The technique of pending evolution that we introduce in this paper, is similar to that expression, and the bene ts of it are not unlike the subtle di erences between considering something done and it having been done.

As it turns out, the pending evolution scheme allows us to e ciently model scenarios of grammar evolution, deployment and maintenance that are impossible to model within the traditional grammar transformation paradigm, which is brie y explained in x2. The method is introduced in x3. Since the most pro ts hide deep in the details, we spend the rest of the paper (x4) by motivating the use of pending evolution for grammars instead of classical evolution scripts, by concrete examples. x5 concludes the paper by summarizing its contributions and discussing related work. GrammarLab is a codename for a grammar manipulation project that is currently being migrated from the Software Language Processing Suite1 initiative to its own repository2. It is centered around the concept of a grammar in a broad sense [5], which can be extracted by abstracting away the idiosyncratic details that we see on class diagrams, in algebraic data type de nitions, object grammars, concrete syntax speci cations, database schemata and exposed library interfaces | all these are `grammars in a broad sense', since they model commitment to grammatical structure.

Beside extraction, GrammarLab is good in dealing with programmable grammar transformations | a disciplined method of grammar evolution, where every change is expressed as a call to a transformation operator with a well-de ned semantics; as well as grammar mutations | large scale strategies for changing one simple thing in a priori unknown number of places. GrammarLab also includes library for grammar analysis and metrics, but they are less relevant for this paper. 3

Pending evolution

In GrammarLab, the evolution of a grammar is speci ed by a sequence of steps, each referring to a transformation operator or a mutation, with proper parameterization: e.g., rst we rename a nonterminal, then we factor its de nition and then extract a part of it into a new nonterminal. Each of the possible transformation operators and mutations in the library, have preconditions that determine their applicability and postconditions that demonstrate their successful execution. Whenever a postcondition of a step or a precondition of the next step fails, the transformation sequence is interrupted and an error occurs instead.

When a negotiated transformation paradigm [13] is explored, failure of a preor a postcondition means the start of a negotiation: e.g., if a rename fails, the engine can propose alternative name pairs that would enable its execution. A clever strategy for negotiations can drastically increase applicability and reuse of a transformation script, while still allow for full automation.

The pending evolution paradigm that we propose here, can hold any transformation step pending until its precondition becomes enabled. As will become apparent from the following examples in x4, it is possible to: ( 1 ) push the pending steps all the way to the end of the evolution sequence and then disregard them; or ( 2 ) leave them forever pending and always ready to be applied any number of times necessary; or ( 3 ) relax the constraints about the order of steps; or ( 4 ) collect and log the information about all the possibly non-sequential failures in a system; or even intentionally decide that particular steps must be taken by defer their actual execution until later. Only the simplest local cases can be expressed in terms of negotiations. On the other hand, only simplest negotiations can be expressed as pending changes. In short, negotiated transformations enables exibility with the outcome of one step, while pending evolution enables exibility with the order of multiple steps. 2 V. Zaytsev. GrammarLab, 2013. http://grammarware.github.io/lab.

Scenarios

The paradigm of pending evolution probably has much wider applicability, but here we sketch at least four user stories for it, inspired by the problems in the grammarware technological space that can be addressed and solved there. 4.1

Optional execution

In the classic grammar transformation engine of GrammarLab, any grammar transformation step that changes nothing in the grammar (we call them `vacuous transformations'), is considered erroneous, since in most reasonable contexts | correction, adaptation, evolution, etc | a change that changes nothing, is meaningless. However, with some negotiated transformation schemes [13], one could nd it sensible to ignore the fact that a transformation step brought no actual changes, if considered in a broader context. In particular, consider the following scenarios: { Suppose we have a repository of grammars, such as the Grammar Zoo3 [15].

The repository is highly heterogeneous and contains `grammars in a broad sense' extracted from parser speci cations, compiler sources, readable documentation, privately created webpages, community contributed wikis, generated and manually built artifacts. However, one of the steps known from grammar research [8] to increase the quality of a grammar, is resolving all `bottom' nonterminals | the ones that are used within the grammar but never de ned (a grammar with no bottom nonterminals is called a `level 3 grammar' by Lammel and Verhoef in [8]). While some de nitions are simply missing from the grammar due to development mistakes, quite commonly these are lexical, or character-level, de nitions, containing the rules about how an identi er name or a numeric literal should look in a language being de ned. A big fraction of these, as becomes apparent after mining Grammar Zoo, have meaningful names such as `string', `identi er', `integer', `id', etc, and can be matched to a small library of prede ned production rules such as `a string is a symmetrically quoted sequence of one or more characters' or `an integer value is an optional sign followed by one or more digits, the rst of which is not zero'. This can be automated and ran over the whole repository, which can of such substantial size that prevents its manual veri cation4 | however, it would be desirable for the framework to introduce the missing de nitions only if they are truly missing, and allow individual grammars to retain their speci c views of what a string or a boolean looks like. Hence, we allow the introduce operator5 to be left pending, and disregard it at the end of the transformation application. 3 Grammar Zoo, http://slps.github.io/zoo 4 Grammar Zoo contains 569 grammars at the day of paper submission. 5 Introduce and other grammar transformation operators are documented at http: //github.com/grammarware/slps/wiki/introduce and similar URIs. { Consider another scenario. In grammarware technological space, there are two most common styles of production rules, that we will traditionally call horizontal and vertical. A horizontal de nition says that the nonterminal N is `either X or Y or Z', while the vertical one makes three statements that `N is X', as well as `N is Y', as well as `N is Z'. These can be formally proven to be equivalent. There are many exceptions, but most language documents prefer horizontal de nitions (e.g., Java Language Speci cation [3]), while language workbenches tend toward vertical ones (e.g., The Meta-Environment [4]) or make no distinction between them (e.g., Rascal [6]). Some transformation operators also expect their arguments to be either horizontal or vertical, which leads to the evolution scenarios speci ed in such a way where some of the operator calls are preceded by the calls of horizontal or vertical operators, while others are not. Obviously, this excessive versatility hinders maintainability and changeability of the transformations. It would be better to write these transformation steps as assertions. For instance, we can specify that the de nition must by vertical before we deyaccify it, and this step would be optional, requiring no action if the original de nition is already vertical. 4.2

Error handling

In GrammarLab, transformations are stopped whenever an error occurs, and an error message is displayed. Within the negotiated paradigm [13], it is possible to negotiate for another outcome. One of the rather complex strategies for achieving that, is the one that skips over the failing transformation step and proceeds with the rest of the script, and then displays all the error messages at the end of the computation. To demonstrate the usefulness of this approach, consider the following detailed scenarios.

{ In the context of grammar recovery, suppose that we want to extract several grammars in bulk | they are written in the same style, in the same metalanguage, with some a priori unknown di erences between them (perhaps they are di erent versions or dialects of the same language). After carefully considering one of them, a grammar engineer develops a post-extraction transformation script that makes the grammar maximally connected, adds missing de nitions, xes misspelt nonterminal names and corrects other problems. Naturally, we want to reuse the same transformation script for recovering the rest of these grammars. However, in the traditional setup, most of the automated reuse cases will fail because some of the extracted grammars will have some misspellings already xed, others will lack the part that concerns the xes, etc. Advanced error handling (or ignoring) can help greatly with scalability in this case, by skipping over inapplicable xes, applicability classi cation, etc. { Imagine another scenario concerning maintenance of grammar transformation scripts. Suppose that we have several grammars that are being converged together in multiple steps | e.g., the case study converging six Java grammars found in di erent editions of the Java Language Speci cation book, consisted of 1611 transformation steps arranged in 70 di erent scripts [10]. When an error is spotted in one of the existing steps, or when another step needs to be added in the middle of the transformation chain, or when the order of existing steps needs to be adjusted, it becomes a very labor-intensive task since every failure stops the transformation computation | having the luxury of recovering after a failure noticeably increases debugging capabilities. 4.3

Pending xes

Both the traditional programmed and the negotiated transformation models delegate the decision about the transformation order to the original script: any transformation step takes place after the one that precedes it in the speci cation and is followed by the one after it. However, there are situations when we can develop certain transformation scenarios and leave them pending so that they can be executed when (if) the times comes and they become applicable. In general this is useful in case of preserving any kind of normal form properties, but we provide two detailed cases taken from practice: { Recall the di erence between horizontal and vertical de nitions that we have explained in the previous section. Suppose that our grammar uses vertical de nitions exclusively | this is easy to achieve by grammar transformations or mutations, and easy to validate with a metaprogramming formula or by micropatterns. However, if we would like to specify that the dominance of vertical de nitions is not incidental and that we would like to preserve it, it is not possible to express this constraint within the straightforward grammar programming approach. With pending evolution, we could leave the verticalizing mutation pending. Then, if someone introduces a new nonterminal to the grammar, and that nonterminal is found to be horizontal, the mutation becomes enabled, is executed and recharged for next use. { Grammar recovery is a process of extracting a grammar from an existing software artifact that may not be of perfect quality. Automated grammar recovery methodology [14] is based on a collection of heuristics that are partly con gurable and partly inferred from the notation speci cation. One of such heuristics is splitting composite terminals: for instance, if a terminal like `);' is found, it can be broken into two consecutive terminals: `)' and `;' | simply because the resulting atomic terminals are more helpful for other heuristics (like matching parentheses). A grammar mutation that breaks up composite terminals, can be programmed and left pending, such that under any circumstances that would bring such terminals to the grammar (such as importing another grammar, introducing a new nonterminal de nition, folding/unfolding, projecting symbols, etc), it becomes enabled and is immediately red to split such terminals as desired.

These scenarios are su ciently di erent from the ones in the previous section not only in motivation, but also in realization, since we speak of pending mutations (which are large scale transformations) and recharging them after transformation. 4.4

Intentional pending

Since we have discussed preserving the grammar already being in the normal form, another scenario deserves mentioning where the grammar is normalized | or rather, when such a normalizing mutation is left pending. Below there are two use cases for this situation, but any normalization could possibly spawn another one.

{ Suppose that we have a grammar written in a speci c notation (usually a dialect of EBNF). Suppose also that a notation evolves, and the grammar is required to coevolve in order to preserve conformance to the metalanguage. This scenario is called `metalinguistic evolution' [12] and has been studied su ciently to be applied in an automated fashion. One of such applications involves a grammar being exported to a particular notation, which it might not perfectly t. For instance, the grammar may use an explicit repetition (usually denoted with `*') or other metaconstructs which are lacking from the notation. Another typical case is that the target metalanguage insists on a particular naming convention for the nonterminal (e.g., all must be written in capitals). In that case, the grammar needs to coevolve with the `change' of notation from its original one to the one that it is being exported to. However, this coevolution is essentially a part of the exporting process, and as such must always take place after all the other evolution steps. Hence, it can be left pending until the very end of the transformation script, and be executed last, removing the use of excessive metalanguage elements, changing the naming convention and adjusting grammar before the actual export mapping. { Grammars in a broad sense can be observed in very di erent environments and extracted from artifacts hailing from di erent technological spaces: XML schemata, Ecore models, class diagrams, parser speci cations, data types, etc. Even when these de ne one intended language, they are di erent in many ways. A technique called grammar convergence [9] is used to reverse engineer the real relationships between such grammars: based on expertwritten transformation scripts, it can show which grammars de ne the same language, which de ne languages that are subsets or supersets of one another, and which are incomparable. It is also possible to automate the creation of such scripts, but the inference algorithm performs best when grammars are in so called `abstract normal form'. Many constraints of the abstract normal form contradict the practice of grammar engineering, so it would be most desirable to continue working with the non-normalized grammar and then perform the pending normalization right before the guided grammar convergence algorithm is applied. Then, the obtained result can be traced back to the original grammar by reversing the bidirectional transformation chain produced by the normalizer.

Concluding remarks

There are some techniques similar to pending evolution in the inconsistency management, most notably with concurrent transformation schemes. Such inconsistencies can be represented as separate rst-class entities [2] and incorporated directly to the resulting model [7], which enables e cient handling of inconsistency detection and resolutions as graph transformation rules [11] in a much less extreme way than the one proposed in this paper. The fact that these approaches of inconsistency modeling and resolution are not entirely covered by negotiated grammar transformation, has inspired us to look for common schemes of advanced change impact propagation, importing ideas from modelware to grammarware and adapting them to the domain.

To summarize, we have proposed the following use cases for the technique of pending grammar evolution: { optional execution (x4.1) optionally complementing the grammar with missing de nitions using optional transformations as assertions { error handling (x4.2) reusing transformations for bulk extraction debugging transformations { pending xes (x4.3) persistent commitment to a normal form pending recovery heuristics { intentional pending (x4.4) pre-export processing pre-convergence normalization

Pending evolution for grammars (either in a broad sense [5] or in the classic sense [1]) has never been considered before. Investigating the impact and opportunities for pending evolution schemes in other elds like program transformation remains future work. In transaction handling domains both of great strictness (such as database management and mainframe job processing) and persistent inconsistency (such as managing wiki contents with a bot) one will be able to nd techniques somewhat similar to the one we have proposed here. 5. P. Klint, R. Lammel, and C. Verhoef. Toward an Engineering Discipline for Grammarware. ACM Transactions on Software Engineering Methodology (TOSEM), 14( 3 ):331{380, 2005. 6. P. Klint, T. van der Storm, and J. Vinju. EASY Meta-programming with Rascal.

In J. M. Fernandes, R. Lammel, J. Visser, and J. Saraiva, editors, Post-proceedings of the Third International Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE 2009), volume 6491 of LNCS, pages 222{289, Berlin, Heidelberg, Jan. 2011. Springer-Verlag. 7. M. Kogel, H. Naughton, J. Helming, and M. Herrmannsdorfer. Collaborative Model Merging. In Companion of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, SPLASH '10, pages 27{34, New York, NY, USA, 2010. ACM. 8. R. Lammel and C. Verhoef. Semi-automatic Grammar Recovery. Software|

Practice & Experience, 31(15):1395{1438, Dec. 2001. 9. R. Lammel and V. Zaytsev. An Introduction to Grammar Convergence. In M. Leuschel and H. Wehrheim, editors, Proceedings of the Seventh International Conference on Integrated Formal Methods (iFM 2009), volume 5423 of LNCS, pages 246{260, Berlin, Heidelberg, Feb. 2009. Springer-Verlag. 10. R. Lammel and V. Zaytsev. Recovering Grammar Relationships for the Java Language Speci cation. Software Quality Journal (SQJ), 19( 2 ):333{378, Mar. 2011. 11. T. Mens, R. Van Der Straeten, and M. D'Hondt. Detecting and Resolving Model Inconsistencies Using Transformation Dependency Analysis. In O. Nierstrasz, J. Whittle, D. Harel, and G. Reggio, editors, Model Driven Engineering Languages and Systems (MoDELS'06), volume 4199 of LNCS, pages 200{214. Springer, 2006. 12. V. Zaytsev. Language Evolution, Metasyntactically. Electronic Communications of the European Association of Software Science and Technology (EC-EASST), 49, 2012. 13. V. Zaytsev. Negotiated Grammar Transformation. In J. De Lara, D. Di Ruscio, and A. Pierantonio, editors, Post-proceedings of the Extreme Modeling Workshop (XM 2012). ACM Digital Library, Nov. 2012. In print, currently available at http:// www.di.univaq.it/diruscio/sites/XM2012/xm2012_submission_11.pdf. An extended version is currently under major revision to the Special issue on Extreme Modeling of The Journal of Object Technology (JOT). 14. V. Zaytsev. Notation-Parametric Grammar Recovery. In A. Sloane and S. Andova, editors, Post-proceedings of the 12th International Workshop on Language Descriptions, Tools, and Applications (LDTA 2012). ACM Digital Library, June 2012. 15. V. Zaytsev. Grammar Zoo: A Repository of Experimental Grammarware. Under major revision for the Fifth Special issue on Experimental Software and Toolkits of Science of Computer Programming (SCP EST5), 2013.

A. V.

Aho ,

Sethi , and

J. D.

Ullman . Compilers: Principles, Techniques and Tools. Addison-Wesley, 1985 .

Cicchetti , D.

Di Ruscio, and

Pierantonio . A Metamodel Independent Approach to Di erence Representation . Journal of Object Technology , 6 ( 9 ): 165 { 185 , Oct . 2007 . TOOLS EUROPE 2007 | Objects, Models, Components, Patterns.

Gosling ,

Joy ,

G. L.

Steele , and G. Bracha. The Java Language Speci cation . Addison-Wesley, third edition , 2005 .

Heering ,

P. R. H.

Hendriks ,

Klint , and

Rekers. The Syntax De nition Formalism SDF | Reference Manual . ACM SIGPLAN Notices , 24 ( 11 ): 43 { 75 , 1989 .