Survey on Template-based Code Generation

Lechanceux Luhunu, DIRO, University of Montreal, Montreal, QC, Canada. Email: lechanceux.luhunu.kavuya@umontreal.ca
Eugene Syriani, DIRO, University of Montreal, Montreal, QC, Canada. Email: syriani@iro.umontreal.ca

Abstract: Among the various model-to-text transformation paradigms, template-based code generation (TBCG) is the most popular in MDE. Given the diversity of tools and approaches, it is necessary to classify and compare existing TBCG techniques to provide appropriate support to developers. We conduct a systematic mapping study of the literature to better understand the trends and characteristics of TBCG techniques over the past 16 years. We also evaluate the expressiveness, performance and scalability of the associated tools based on a range of models that implement critical patterns.

Index Terms: model-driven engineering, code generation, systematic mapping study, performance study, expressiveness evaluation

I. MOTIVATION AND GOALS

A critical step in model-driven engineering (MDE) is the automatic synthesis of a textual artifact from models. This model transformation is very useful to generate application code, to serialize the model in persistent storage, and to generate documentation or reports. Among the various model-to-text transformation paradigms, template-based code generation (TBCG) is the most popular in MDE. TBCG is a synthesis technique that produces code from high-level specifications, called templates. A template is an abstract and generalized representation of the textual output it describes. It has a static part, text fragments that appear in the output "as is", and a dynamic part embedded with splices of meta-code that encode the generation logic. TBCG is a popular technique in MDE, as both emphasize abstraction and automation. Given the diversity of tools and approaches, it is necessary to classify and compare existing TBCG techniques and tools to provide appropriate support to developers.

In this work, we conduct a systematic mapping study (SMS) of the literature in order to understand the trends, identify the characteristics of TBCG, assess the popularity of existing tools, and determine the influence that MDE has had on TBCG over the past 16 years. Based on this SMS, we compare the nine most popular TBCG tools found in the literature. We perform a qualitative evaluation of their expressiveness based on typical metamodel patterns that influence the implementation of the templates. The expressiveness of a tool is the set of language constructs that can be used to complete a particular task natively. This is important since, to the best of our knowledge, there are no available metrics to assess code generation templates. We also evaluate the performance and scalability of these tools based on a range of models that conform to a metamodel composed of the combination of these patterns.

II. TRENDS OF TBCG

We first present the results of the SMS we conducted on TBCG.

A. Systematic Mapping Study

We followed the process defined in [1] to portray the literature on TBCG. The protocol we followed is described in [2]. The research questions guiding the SMS are:
(RQ1) What are the trends in template-based code generation?
(RQ2) What are the characteristics of TBCG approaches?
(RQ3) To what extent are TBCG tools being used?
(RQ4) What is the place of MDE in TBCG?
We collected 5,131 papers published between 2000 and 2016 from online databases that matched the keywords we searched for. After screening all these papers, we obtained a final corpus of 481 papers. We then classified each paper according to a classification scheme (available in [3]) that helps answer our research questions.

B. Evolution of TBCG

[Fig. 1: Evolution of papers in the corpus (number of papers per year, 2000 to 2016)]

Fig. 1 reports the number of papers per year, averaging around 28. This large sample of papers suggests that TBCG has received sufficient attention from the research community. The community has maintained a production rate in line with the average of the last 11 years, especially with a constant rate of appearance in journal articles (24%). The only exceptions were a significant boost in 2013 and a dip in 2015. The most popular venues are MODELS, SoSyM, and ECMFA. However, we noticed a decrease of publications in MDE venues, indicating that TBCG is now applied in development projects rather than being a critical research problem to solve. Conference papers as well as venues outside MDE and software engineering had a significant impact on the evolution of TBCG. Given that TBCG seems to have reached a steady publication rate since 2005, we can expect contributions from the research community to continue in that trend.

C. Characteristics of TBCG

Output-based templates have been the most popular style from the beginning (72%). In this style, the template is syntactically based on the actual target output, such as in [4], which uses Xpand. Nevertheless, there have been some attempts to propose other template styles, like the rule-based style in [5], but they did not catch on (4%). Because of its simplicity of use, the predefined style is probably still popular in practice, e.g., in CASE tools, but less so in research papers (24%).

TBCG has been used to synthesize a variety of application code or documents. As expected, the study shows that high-level language inputs (general purpose 48% or domain-specific 22% modeling languages) have prevailed over any other type (schema 20% or programming languages 10%). Specifically for MDE approaches to TBCG, the input to transform is moving from general purpose to domain-specific models.

The study confirms that the community uses TBCG to generate mainly source code (81%), rather than structured data, e.g., XML (16%), or natural language documents (3%). This trend is set to continue since the automation of computerized tasks is continuing to gain ground in all fields. TBCG has been implemented in many domains, software engineering (55%) and embedded systems (13%) being the most popular, but also unexpectedly in unrelated domains, such as bio-medicine and finance.

The study revealed a total of 77 different tools for TBCG. Many studies implemented code generation with a custom-made tool that was never or seldom reused. This indicates that the development of new tools is still very active. Model-based tools are the most popular (49%). Since the research community has favored the output-based template style, this has particularly influenced the implementation of the tools. This template style allows for more fine-grained customization of the synthesis logic, which seems to be what users have favored. This particular aspect is also influencing the expansion of TBCG into industry. Well-known tools like Acceleo, Xpand and Velocity are moving from being simple research material to effective development resources in industry.

D. Role of MDE

The burst of papers in 2005 coincides with the transition from the UML to the MODELS conference. MDE venues have increased the average number of publications by a factor of four. There are many advantages to code generation, such as reduced development effort, domain and application concepts that are easier to write and understand, and a less error-prone process [6]. These are, in fact, the pillar principles of MDE and domain-specific modeling [7]. Thus, it is not surprising to see that many code generation tools, though not all, came out of the MDE community. As TBCG became commonplace in general, the research in this area is now mostly conducted by the MDE community. Furthermore, MDE has brought very popular tools that have met with great success, and they are also contributing to the expansion of TBCG across industry. It is important to mention that the MDE community publishes in specific venues like MODELS, SoSyM, or ECMFA, unlike other research communities where the venues are very diversified. These three are the top ranked venues in terms of number of TBCG papers published. Overall, this analysis indicates that the advent of MDE has been driving TBCG research.

III. TOOL EXPRESSIVENESS

We evaluate and compare the nine most popular tools found in the SMS with respect to metamodel patterns that drive the implementation of the dynamic part of the template. The complete evaluation methodology is described in [8].

A. Metamodel Patterns for TBCG

To evaluate the expressiveness of TBCG tools, we identify a minimal set of four common structures found in metamodels that influence TBCG. This is the result of analyzing a plethora of metamodels that were used for TBCG from repositories [9], [10], known metamodel patterns [11], and industrial experiences [12]. We evaluated the tools on the common running example of invoice production, for which the metamodel is depicted in Fig. 2.

[Fig. 2: Invoice metamodel]

1) Navigation: this pattern occurs when there is a navigable relation between two classes. A template uses it to access the data of a target class related to the class of the current context.
2) Variable dependency: this pattern is like the navigation pattern, but applies when the template must output a value that depends on variables present in other classes.
3) Polymorphism: this pattern takes advantage of an inheritance relationship to reuse parts of the template. The template implements the output for the super class and only what varies for the subclass(es).
4) Recursion: this pattern consists of a recursive self-relation of a class. The template can be reapplied on objects of the same type in a transparent way.
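To make the four patterns concrete, the following is a minimal sketch in plain Python rather than in any of the evaluated template languages. It only illustrates the idea behind each pattern. The class and attribute names (Invoice, Item, Product, Service, Category, unit_price, quantity, hours) and the sample data are assumptions made for illustration, since the actual content of the invoice metamodel of Fig. 2 is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    name: str
    parent: Optional["Category"] = None  # recursive self-relation (hypothetical)


@dataclass
class Item:
    """Plays the role of the abstract item type (names are assumptions)."""
    label: str
    unit_price: float
    category: Category

    def amount(self) -> float:
        return self.unit_price

    def render(self) -> str:
        # Polymorphism: output shared by every subclass.
        return f"{self.label} [{self.category.name}]"


@dataclass
class Product(Item):
    quantity: int = 1

    def amount(self) -> float:
        return self.unit_price * self.quantity

    def render(self) -> str:
        # Only what varies for this subclass is added to the shared output.
        return f"{super().render()} x{self.quantity}"


@dataclass
class Service(Item):
    hours: float = 1.0

    def amount(self) -> float:
        return self.unit_price * self.hours

    def render(self) -> str:
        return f"{super().render()} {self.hours}h"


@dataclass
class Invoice:
    date: str
    items: List[Item] = field(default_factory=list)


def depth(category: Category) -> int:
    # Recursion: reapply the same rule along the self-relation.
    return 0 if category.parent is None else 1 + depth(category.parent)


def render_invoice(invoice: Invoice) -> str:
    # Navigation: dot access to an attribute of the current context.
    lines = [f"Invoice dated {invoice.date}"]
    for item in invoice.items:  # navigation through a relation
        lines.append(
            f"  {item.render()} (depth {depth(item.category)}): {item.amount():.2f}"
        )
    # Variable dependency: the total depends on values held by other classes.
    total = sum(item.amount() for item in invoice.items)
    lines.append(f"Total: {total:.2f}")
    return "\n".join(lines)


# Illustrative data only, not taken from the study.
root = Category("Goods")
child = Category("Hardware", parent=root)
invoice = Invoice(
    "2017-01-15",
    [Product("Cable", 2.50, child, quantity=4), Service("Setup", 40.0, root, hours=1.5)],
)
print(render_invoice(invoice))
```

In this sketch the static part of the output is the literal text inside the format strings, and the dynamic part is the surrounding Python logic. A template language that lacks one of these constructs (for instance, functions for recursion) would need a workaround outside the template to achieve the same result.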
B. Template expressiveness

Table I summarizes the qualitative evaluation of the expressiveness of each TBCG tool, showing whether it successfully implemented each pattern or not.

TABLE I: Summary of the qualitative evaluation of the tools expressiveness

Type         Tool            Navigation  Variable dependency  Polymorphism  Recursion
Model-based  Acceleo         yes         yes                  yes           yes
Model-based  Xpand           yes         no                   yes           no
Model-based  EGL             yes         yes                  yes           yes
Model-based  Xtend2          yes         yes                  yes           yes
Code-based   JET             yes         yes                  no            no
Code-based   Velocity        yes         yes                  no            no
Code-based   T4              yes         yes                  no            yes
Code-based   StringTemplate  yes         no                   no            no
Code-based   XSLT            yes         yes                  no            no

All tools successfully implement the trivial navigation pattern. For example, to access the meta-data date from the invoice object, all tools use the dot operator. In XSLT, navigating through a composition relation is accomplished with the xsl:value-of expression. It also requires a different strategy when the relation is an association.

We implemented the variable dependency pattern to calculate and output the total of the invoice. Acceleo and XSLT have powerful built-in mathematical functions, especially for collection types. EGL, JET, Velocity, T4, and Xtend2 rely on the use of global variables and statement blocks. It was not possible to implement this pattern "natively" with StringTemplate (ST) and Xpand. We resorted to extending the template with a Java program to handle the calculations.

We used the polymorphism pattern to process the priced items of an invoice, which are subtypes of the abstract item type. In Acceleo, Xpand, and Xtend2, it is mandatory to write a template block for the super class even though its content is not printed in the output. In EGL, the content of the superclass template definition block is output, along with the content of the one for the subclass. In JET, Velocity, T4, XSLT, and ST, no template code can be defined for abstract classes. Thus, the developer must replicate the common template code for all possible subclasses.

We implemented the recursion pattern to obtain the depth level of the invoice Category from the hierarchy of categories present in the model. We were only able to implement it in EGL, Acceleo, Xtend2, and T4, thanks to the use of functions or typed definition blocks. The dedicated language of T4 allows calling C# functions defined in the template and thus implementing recursion. Although Xpand supports typed definition blocks, they only take a single argument, which is a type of element in the input metamodel. Thus, it is not possible to accumulate a value in a variable. XSLT does not implement this pattern either, because there is no trace between the argument that is passed to the function and the variable passed in the initial invocation. It is not possible to implement this pattern in JET, ST, and Velocity due to the absence of typed definition blocks or functions.

IV. PERFORMANCE EVALUATION

To compare the performance of the tools, we generated 10 models conforming to the metamodel of Fig. 2, with a size varying from 10 to 10^5 classes. There are 3 instances of navigation, 7 to 10^5 variable dependencies, 6 to 10^5 polymorphisms, and 1 to 10^2 recursions. Fig. 3 shows that the execution time increases with the size of the model for all tools. The complete evaluation methodology is described in [8].

[Fig. 3: Tool performance, no recursion (execution time in ms against model size, both on logarithmic scales, for JET, Velocity, ST, Xtend2, T4, XSLT, Xpand, EGL, and Acceleo)]

Overall, JET is the fastest tool, completing the whole experiment. This is expected since JET instantly generates the corresponding Java class from the template as the developer is writing the template. Therefore, the execution time here corresponds to executing the generated Java code that produces the output. Excluding the special case of JET, Velocity and ST are the fastest. T4 is as efficient as JET for smaller models. However, for larger models, it becomes slower than Velocity and ST. Xtend2 outperforms T4 for these models, making it the fastest model-based tool. Xpand and XSLT come next. The slowest tools are EGL followed by Acceleo.

The execution of Velocity templates scales remarkably well, slowing down by only a factor of 15 for models with 10^5 elements compared to smaller models with 10^3 elements. It is followed by JET, Xtend2, ST, and XSLT with around a factor of 25. For the remaining tools, the size of the model has a significant effect on their performance. T4 and Acceleo have the worst scale factor.

Enabling the recursion pattern gives a similar trend for the four tools concerned. It did not significantly influence the performance of Acceleo and T4, but Xtend2 performed 10% slower than in Fig. 3. However, EGL performed 10% faster because the dedicated language EOL supports caching [13].

V. CONCLUSION

The community has used TBCG in diverse ways over the past 16 years, and research and development is still very active. TBCG has been greatly influenced by MDE. Both model-based and code-based tools are becoming effective development resources in industry. The former are the most capable tools since most of them successfully implemented all the metamodel patterns. However, the latter performed much faster. Although JET is the fastest tool, Xtend2 offers the best compromise between expressiveness and performance.

REFERENCES

[1] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson, "Systematic Mapping Studies in Software Engineering," in Evaluation and Assessment in Software Engineering, ser. EASE'08, vol. 17. British Computer Society, 2008, pp. 68–77.
[2] E. Syriani, L. Luhunu, and H. Sahraoui, "Systematic Mapping Study of Template-based Code Generation," Tech. Rep. arXiv:1703.06353, 2017.
[3] http://www-ens.iro.umontreal.ca/~luhunukl/survey/classification.html.
[4] W. Dahman and J. Grabowski, "UML-based specification and generation of executable web services," in System Analysis and Modeling, ser. LNCS, vol. 6598. Springer, 2010, pp. 91–107.
[5] Z. Hemel, L. C. Kats, D. M. Groenewegen, and E. Visser, "Code generation by model transformation: a case study in transformation modularity," Software & Systems Modeling, vol. 9, no. 3, pp. 375–402, 2010.
[6] R. Balzer, "A 15 Year Perspective on Automatic Programming," Transactions on Software Engineering, vol. 11, no. 11, pp. 1257–1268, 1985.
[7] S. Kelly and J.-P. Tolvanen, Domain-Specific Modeling: Enabling Full Code Generation. John Wiley & Sons, 2008.
[8] L. Luhunu and E. Syriani, "Comparison of the expressiveness and performance of template-based code generation tools," in Software Language Engineering, ser. LNCS. Springer, 2017.
[9] http://web.emn.fr/x-info/atlanmod/index.php?title=Zoos.
[10] http://www.remodd.org.
[11] H. Cho and J. Gray, "Design Patterns for Metamodels," in Domain-Specific Modeling workshop, ser. SPLASH '11 Workshops. ACM, 2011, pp. 25–32.
[12] V. Sousa, E. Syriani, and M. Paquin, "Feedback on How MDE Tools are Used Prior to Academic Collaboration," in Symposium On Applied Computing, 2017.
[13] D. Kolovos, L. Rose, R. F. Paige, and A. García Domínguez, The Epsilon Book. Eclipse, 2010.