Testing Code Generators: a Case Study on Applying USE, EFinder and Tracts in Practice

Zijun Chen 1, Wilbert Alberts 2 and Ivan Kurtev 1,3
1 Technical University of Eindhoven, De Zaale, Eindhoven 5612 AZ, the Netherlands
2 ASML, De Run 6501, Veldhoven 5504 DR, the Netherlands
3 Altran Netherlands, Limburglaan 24, Eindhoven 5652 AA, the Netherlands
EMAIL: zijunchen.work@gmail.com (Z. Chen); wilbert.alberts@asml.com (W. Alberts); i.kurtev@tue.nl (I. Kurtev)

Abstract
A commonly found application of model transformations is the implementation of code generators, where typically a chain of model-to-model and model-to-text transformations is used. A number of approaches for testing such transformations have been proposed in the literature, but there is still not much experience in applying them in large, non-trivial industrial projects. In this paper we present the results of a case study aimed at improving the testing process of an industrial code generator. We focused on two aspects: automatic support for generating an efficient suite of input test models, and alleviating the test oracle problem by using lightweight transformation specifications based on the tracts approach. In both aspects OCL is heavily used as the main specification language. Our experiments involved three tools: USE, EFinder, and TractsTool. We observed that these tools and the corresponding approaches can be useful in practical testing of code generators, but improvements in two main directions are needed. On the one hand, the proposed approaches should be better adapted to the ecosystems and working assumptions in the industry. On the other hand, practitioners should reconsider some of their practices in order to fully benefit from recent academic achievements.

Keywords
Model transformation testing, code generators, classifying terms, USE, EFinder, tracts

1. Introduction

Model transformations are key operations in Model Driven Engineering. One of the main scenarios in which they are used is code generation, where one or more models are transformed to executable code in a programming language. Usually, a code generator is implemented as a chain of model-to-model (M2M) and model-to-text (M2T) transformations. Like any other software artifact, these transformations need to be specified, implemented, and checked for correctness. One particular way of checking is testing. In the last decade a number of approaches for transformation testing have been proposed in the literature that address various aspects of the testing process. However, there is still little knowledge on how these approaches perform in an industrial context when applied to non-trivial transformations.

In this paper we present initial results from a case study in which the testing process of an existing industrial code generator is analyzed, points for improvement are identified, and various techniques are applied and evaluated. More concretely, we focused on two aspects: generation of an efficient suite of input test models, and using a form of transformation specification in order to support the development of test oracles. In this context, an efficient suite is understood as a set of models that do not have duplicated characteristics. More concretely, the space of possible models is partitioned and only a single model from each partition is selected. In order to achieve this we applied the classifying terms approach [1].
Furthermore, deciding if the result of a particular test is correct is not always a trivial task. This is known as the oracle problem, a challenge inherent to any testing process and in particular to model transformation testing. In order to address this challenge we experimented with the TractsTool and its corresponding technique of specifying model transformations as tracts [2].

The results of this case study brought us to two main conclusions. The techniques we investigated have the potential to solve the intended problems (generation of an efficient test suite and the oracle problem) and to improve the testing process. However, there are still some mismatches between the intended usage of the tools on the one hand and the technologies and practices used in industry on the other hand.

This paper is organized as follows. Section 2 presents the case study, a code generator for a domain-specific language (DSL) for data modeling. Section 3 analyzes the testing process of this generator and motivates the choice of tools and techniques that can improve it. Section 4 presents results from applying the selected tools. Sections 5 and 6 provide further discussion and conclude the paper.

2. Case Study: Generation of Data Repositories from Domain Data Models

The case study uses an existing DSL and its code generator developed by ASML and Altran Netherlands. The generator produces code used in the TWINSCAN lithography systems developed by ASML. It takes a data model expressed in the DSL and generates a C++ implementation of a repository service. At runtime, clients of the repository service are able to store, retrieve, modify and delete instances of the data entity types defined in the input data model. The architecture of the generator is shown in Figure 1.

Figure 1: Architecture of the code generator

The generator consists of two transformation steps. The first one is a model-to-model transformation written in QVT Operational (QVTo, https://www.omg.org/spec/QVT/). The input of this step is a data model with defined data types and a generator model. The generator model decorates the domain data model by providing information about the actual C++ implementation. This includes a mapping of the user-defined primitive types to C++ types, information about the deployment of the repository in memory (for example, storage in heap memory versus shared memory), how the data is provided to the clients (direct access versus clone-based access) and other implementation-specific information. The M2M transformation merges the content of the two input models into a single model that is used in the second step. All input and output models conform to their corresponding metamodels (not shown in the figure). The second step is a model-to-text transformation written in Xtend that generates the actual code. This M2T transformation contains the main logic of the generator.

The language for domain data models resembles UML class diagrams and is relatively simple at first glance. An example model is shown in Figure 2.

Figure 2: An example domain data model

The example contains two entity types named A and B. Instances of these types can be created and stored in the repository at runtime. An entity type has a multiplicity that indicates the allowed number of instances in the repository at any given time.
For example, entity type A has multiplicity {0..3}, meaning that at most 3 instances of A can be stored in the repository. Furthermore, an entity type has usage restrictions shown as icons in the lower part of the type: the helmet indicates whether instances of this type can be created; the pencil indicates whether instances can be changed while in the repository (a crossed-out pencil denotes immutable entities); and the trash bin indicates whether instances can be deleted. Entity attributes and associations are also allowed. The trash bin on the source end of the example association mandates that if an instance of B is deleted, all instances of A that have a link to it have to be deleted as well. If these instances of A are further related to other instances by similar associations, their deletion will trigger further deletions, a process known as cascaded deletion. This feature, along with the possibly intricate interplay of multiplicities and access restrictions, makes the implementation of the repository service non-trivial.

It should be noted that the description given here is a simplification: the language is part of a larger family of DSLs that supports the specification of control logic and interaction with data processing algorithms. Providing more details in this direction is beyond the scope of this paper.

The current testing process of the generator is illustrated in Figure 3.

Figure 3: Testing of the code generator

The test-related artifacts are shown with dashed lines. The M2M transformation step is tested with a set of manually created input models (data and generator models). The correctness of an output model is decided by an oracle function that compares the result to a manually created expected model. Since the M2M transformation step is considered less complex, only a small set of test cases is used.

The main purpose of testing the M2T transformation is to demonstrate the functional correctness of the generated C++ code. This is done with manually created functional tests executed over the generated code. The test cases generally depend on the input test models. It is crucial that the test models cover all 'interesting' combinations of model elements in the input generator and data models. For example, the generator models should cover the scenarios for memory storage and provisioning, while the data models should cover the combinations of various multiplicity ranges and access control. This easily leads to a significant number of possible combinations.

The presented code generator is being used in practice and has evolved over a number of years. Experience shows that most defects are located in the M2T transformation and are often caused by an untested combination of model elements and generation options. Since the generated code is used in production software, the qualification of the generator (by means of testing) is a crucial step in the project.

3. Analysis of Testing Process and Selection of Testing Techniques

In this section we analyze the current status and challenges of the described testing process and motivate the choice of techniques that can improve it.

3.1. Lack of Coverage Information

Test coverage is an important indicator of the quality of a test suite and a testing process.
In this case study coverage takes different forms: (i) metamodel coverage, which shows which metaelements are instantiated at least once in the input test models; (ii) transformation coverage, which reflects the execution of the M2M transformation code; (iii) coverage of the Xtend code of the M2T transformation; (iv) coverage of the generated C++ code. Currently, there is no explicit measuring process in place that takes all these aspects into account. Overall, integrated tools that measure and visualize the coverage of the entire chain (metamodel coverage, transformation coverage, and generated code coverage) are not readily available, although most of this can be achieved by integrating existing tools. We consider this more of an engineering challenge than a research one, but we recognize its importance in practice. The development of a dashboard that shows an overview of the coverage for the entire transformation chain is ongoing work.

3.2. Manual Creation of Input Test Models

Currently, the input test models and the C++ tests are created manually. This is a time-consuming and error-prone task, and it was identified as one of the main candidates for improvement. A number of approaches proposed in the literature automate the generation of instance models that conform to a metamodel. In the context of model transformations it is important to achieve an effective and efficient test suite. Effectiveness is often interpreted in terms of achieving a certain degree of coverage of the transformation specification or implementation. Techniques for model generation that support this goal are not applicable in our case, mainly because we use QVTo for the implementation (see the next subsections). Furthermore, the engineers indicated that the most important aspect in the development of input test models is the ability to enumerate certain properties of these models (e.g. the presence of certain instances or attribute values) and leave the model completion and property combinations to a tool. Another concern is the efficiency of the test suite: making sure that the input model space is partitioned properly and that only a single specimen from a given partition is used. All these considerations motivated the choice of the classifying terms approach to assist the process of test model creation. The development of the C++ functional tests was not addressed in this case study; it is briefly discussed in the future work section.

3.3. Lack of Transformation Specifications

In our case study the requirements for the transformations are given in textual form and also on the basis of examples. A number of approaches rely on the availability of a transformation specification in a formal language or in a DSL, for example Pamomo [2][3][4]. Such specifications can solve the oracle problem and also support the automatic generation of an effective test suite that guarantees a certain specification coverage. These benefits are counterbalanced by the requirement that the developers master yet another language and tool and provide a complete specification of the transformation. The time and resources available in our investigation did not permit experimentation with tools like Pamomo. Instead, we chose the more lightweight tracts approach [2], supported by the TractsTool, where a form of specification is given as a contract based on OCL expressions; a minimal sketch of such a contract is shown at the end of this subsection. This approach also nicely integrates with the already chosen classifying terms technique.
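To give a concrete impression of this style of specification, the following is a minimal sketch of a source-target tract constraint written in OCL. The names used here (EntityType for the input data model, MergedEntity for the output of the M2M step, and their attributes) are hypothetical illustrations rather than elements of the actual metamodels:

    -- Sketch of a source-target tract constraint (hypothetical names):
    -- every entity type in the input data model must appear exactly once
    -- in the merged output model, with its multiplicity bounds preserved.
    context EntityType
    inv entityPreservedByMerge:
      MergedEntity.allInstances()->one(m |
        m.name = self.name and
        m.lowerBound = self.lowerBound and
        m.upperBound = self.upperBound)

A constraint of this kind checks a property of every pair of input and output models produced by the transformation, without requiring a complete reference model for each test case.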
3.4. Achieving an Effective Test Suite

Some approaches for measuring the quality of a test suite use mutation testing techniques [5][6]; others are driven by achieving maximum transformation coverage [7][8]. Unfortunately, they are not directly applicable to our M2M transformation written in QVTo, since the proposed techniques work mainly for ATL. ATL is very popular in academia; however, QVTo, as an OMG standard, remains attractive for many industrial applications. We believe that most of the techniques available for ATL can be transferred to the QVTo language.

3.5. Testing Model-to-Text Transformations

In [9] the problem of testing M2T transformations is reduced to testing M2M transformations by treating the produced text as a model conforming to a very generic metamodel for textual artifacts (folders, files, lines of text, etc.). In principle this approach can be applied in our case to ensure that certain structural properties are present in the generated code. Ideally, these properties should entail the functional correctness of the code. In our case study, however, one of the main requirements is to produce evidence that the generated C++ code is correct by means of functional tests. In addition, we need an indication of the expected performance of the generated repository. For these reasons we did not select the mentioned approach for further experimentation.

4. Application of the Selected Techniques and Tools

We first present our experience in developing and using classifying terms for generating test models. Then the observations from applying the TractsTool are given.

4.1. Using Classifying Terms for Generation of Input Test Models

The classifying terms were identified based on features of the input models considered important by the developers. Some of them reflected situations that were not properly tested in earlier versions of the code generator and led to defects later. Here are some examples of classifying terms for the domain data models, given as text:

- Number of entity types with a lower bound multiplicity of 0. The characteristic values are 0 and at least 1.
- Number of associations with deletion at the source end.
- Lower bound of the association target end multiplicity. The characteristic values are 0, a fixed number and infinity.
- Upper bound of the association target end multiplicity. The characteristic values are the same as above.

Examples of classifying terms for the generator models:

- Memory deployment of the generated repository (heap versus shared memory).
- Communication mode (intra- versus interprocess).
- Visibility level of the generated software module in the scope of the global software architecture.

Each of these terms induces a number of partitions based on the term's characteristic values. In our case study the product of the terms was formed and the corresponding OCL expressions were specified as prescribed by the classifying terms approach; an OCL sketch of one such term is given below. The two input metamodels, their associated validity constraints (in OCL) and the OCL expressions for the classifying terms were used to automatically generate input test models.

The classifying terms approach was initially applied with the USE tool [10]. Unfortunately, this tool was not directly applicable to our case because it lacked import and export features for Ecore models and metamodels (at the time this research was carried out).
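As an illustration of how such terms are formulated, the first classifying term for the domain data models listed above could be expressed as the following OCL query. This is a sketch: EntityType and lowerBound are hypothetical simplifications of the actual metamodel elements:

    -- Classifying term (sketch): the number of entity types with a lower
    -- bound multiplicity of 0, collapsed to the two characteristic values
    -- 0 and "at least 1" by capping the count with min(1).
    EntityType.allInstances()
      ->select(e | e.lowerBound = 0)
      ->size().min(1)

Because the term evaluates to either 0 or 1, the model finder is asked for one witness model per characteristic value rather than one per exact count; the product of all terms then yields one model per combination of characteristic values.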
Because of these missing features, we switched to another tool, EFinder [11], which is built on top of USE and adds the required bridge to the EMF/Ecore ecosystem, along with a useful abstraction over various OCL dialects. Before obtaining the desired set of input test models, a number of challenges had to be faced, some caused by limitations of EFinder and others by the way OCL is used in our DSLs. They are described in detail below.

- Usage of Java operations in the metamodels. The metamodel of the domain data language contains a few Java operations that implement navigations over models. These operations are used in the OCL well-formedness constraints (a feature supported by the Eclipse OCL implementation). This mix of Java and OCL hinders the application of tools like USE and EFinder. In our case, it was possible to ignore the constraints that use Java. For the general case we recommend using OCL helpers to implement such navigation operations.

- Underspecified well-formedness constraints. We detected a situation in which USE and EFinder created undesired models that nevertheless conform to the metamodel and the OCL constraints. This is a sign that the combination of the DSL metamodel and the well-formedness constraints is incomplete. The problem is related to the composition relations in the metamodel. In the domain data DSL, every model element except the root model container has to be contained by another element. The composition relations generally follow the pattern shown on the left-hand side of Figure 4. This metamodel fragment does not require that instances of Part are always contained by a Container. When EFinder is invoked with such a metamodel, it sometimes creates models like the one shown on the right-hand side of Figure 4. In practice, such models cannot be created by the user because the grammar and the graphical editor do not allow it. However, tools like USE and EFinder use the metamodel and the OCL constraints directly. The solution is either to modify the metamodel by making the reference bi-directional or to add a suitable OCL constraint; a sketch of such a constraint is given after this list.

Figure 4: Composition relation that admits objects without a container

- No distinction between errors and warnings. If an OCL constraint fails during validation, an error is reported; in other words, OCL does not assign degrees of severity to failing constraints. In our case study, however, there is a distinction between errors and warnings: in case of failure, some constraints produce validation errors and others just warnings. A check of the first kind is wrapped in an operation called asError, and a check of the second kind in an operation called asWarning, which returns null in case of failure. In order to use EFinder we excluded these operations. We believe that the distinction between errors and warnings in OCL deserves more systematic attention. Furthermore, a more practical implementation should let the user include or exclude particular OCL constraints of an OCL document before invoking EFinder.

- Support for multiple input models. EFinder supports multiple metamodels as input, but the generated output is contained in a single model. In our case this single model needs to be split into one data model and one generator model. To handle this we implemented a post-processing step that splits the generated output into two models.
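Returning to the underspecification issue around Figure 4, a possible formulation of the suggested OCL constraint is sketched below. The names Container, Part and parts follow the figure and are illustrative only:

    -- Every Part must be owned by exactly one Container; this rules out
    -- the free-floating Part instances that a model finder would
    -- otherwise be allowed to produce.
    context Part
    inv alwaysContained:
      Container.allInstances()->one(c | c.parts->includes(self))

Such an invariant makes the containment intention explicit for tools that, unlike the grammar and the graphical editor, work directly on the metamodel and the OCL constraints.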
Clearly, EFinder (and USE) solves the fundamental problem of finding a model that satisfies a given set of constraints; a practical solution additionally requires an adaptation layer that handles cases in which the flat result needs to be decomposed.

After overcoming these obstacles, it was possible to use EFinder for the generation of test models. The classifying terms were created by the first author. From this perspective, it remains unclear how easy it is for the developers to come up with the OCL formulation of the classifying terms. Although the engineers are generally well experienced with OCL, our anecdotal evidence shows that developers think in terms of small model patterns that have to be present in the input models, possibly extended with OCL constraints. Potentially, the classifying terms could be synthesized from such more visual representations. Interesting work in this direction is presented in [12][13].

4.2. Using TractsTool and Tracts to Alleviate the Oracle Problem

The TractsTool was used in the testing of the M2M transformation in order to eliminate the current test oracle based on model comparison. Instead, a tract was defined that specifies the desired properties of the generated models as OCL expressions. This relieves the developers of the task of creating a complete reference model. Generally, we see good potential in the tracts approach for alleviating the oracle problem. Of course, the quality and completeness of the tracts are crucial in this approach. A major concern is that the tool is no longer actively developed and still needs improvements before it can be applied in an industrial setting.

5. Discussion and Future Work

In this section we provide a reflection at a more global level by comparing the approaches and tools found in the academic literature with the needs identified in industry. We observe a degree of mismatch between the technological assumptions made in the research tools and the current state of industrial ecosystems. This is first of all evident at the level of the M2M transformation languages used. A significant amount of work is based on ATL, whereas in practice QVTo is often used because it is a standard. Furthermore, many practical model transformations are written in a GPL like Xtend, Java or Python. In general, we need a better picture of how transformations are implemented in practice. The work of Burgueño et al. [14] is a promising step in this direction.

We also observe insufficient awareness among practitioners of the benefits of using tools like USE and EFinder that are based on strong theoretical foundations (SAT and SMT solving). This is probably due to the tools' early phase of development and degree of immaturity. Raising such awareness among developers should also lead to a more disciplined use of OCL and metamodeling in order to fully benefit from these tools.

Testing an M2T transformation that produces executable code is challenging, and reducing it to the problem of testing an M2M transformation is often not sufficient in practice. A possible approach is to test the entire generator (or transformation chain) as a black box, especially when it is implemented in a GPL. Techniques for test model generation like classifying terms are still applicable, but they do not alleviate the oracle problem. We plan to investigate the applicability of model-based testing when the input language has dynamic semantics. Another interesting research direction is metamorphic testing, which was recently applied to M2M transformations [15], possibly extended with ideas from metamorphic compiler testing.
The main challenge here is the identification of a suitable metamorphic relation, which may vary per input DSL. We also consider investigating the benefits of using explicit transformation specifications, which would allow the application of tools like Pamomo.

A final discussion point concerns the validity of our observations. Our conclusions are derived from a single case study; however, we consider it rather representative, since the reported challenges have also been observed in other projects executed at Altran Netherlands. Still, the case study reflects the practices of a single company with projects mainly from the area of high-tech embedded systems.

6. Conclusions

We presented initial results from a case study on testing an industrial code generator that uses a combination of M2M and M2T transformations. A number of challenges were identified and two techniques proposed in the literature were applied: the classifying terms and tracts approaches. The case study revealed that they have the potential to improve the current testing practices with respect to the problems of input test generation and oracle function definition. We also observed that the academic tools used still need additional engineering work in order to bring them to the quality required for industrial application. However, the underlying fundamentals are solid and useful. The presented analysis of the transformation process illustrates some of the industrial needs in the area of model transformations and is a starting point for further research.

In this paper the study was mostly focused on M2M transformations. The M2T transformation component is still not sufficiently addressed. The required functional testing of the generated code involves a lot of manual work, mainly because of a missing reference point that specifies the intended code behavior in a more abstract way.

7. References

[1] Martin Gogolla, Antonio Vallecillo, Loli Burgueño and Frank Hilken. "Employing classifying terms for testing model transformations". ACM/IEEE 18th International Conference on Model Driven Engineering Languages and Systems (MODELS) (2015): 312-321.
[2] Martin Gogolla and Antonio Vallecillo. "Tractable model transformation testing". European Conference on Modelling Foundations and Applications (2011): 221-235.
[3] Esther Guerra, Juan de Lara, Manuel Wimmer, Gerti Kappel, Angelika Kusel, Werner Retschitzegger, Johannes Schönböck and Wieland Schwinger. "Automated verification of model transformations based on visual contracts". Automated Software Engineering (2015).
[4] Esther Guerra and Mathias Soeken. "Specification-driven model transformation testing". Software and Systems Modeling (2015): 623-644.
[5] Esther Guerra, Jesús Sánchez Cuadrado and Juan de Lara. "Towards effective mutation testing for ATL". ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems (MODELS) (2019): 78-88.
[6] Javier Troya, Alexander Bergmayr, Loli Burgueño and Manuel Wimmer. "Towards systematic mutations for and with ATL model transformations". IEEE 8th International Conference on Software Testing, Verification and Validation Workshops (ICSTW) (2015): 1-10.
[7] Carlos A. González and Jordi Cabot. "ATLTest: a white-box test generation approach for ATL transformations". International Conference on Model Driven Engineering Languages and Systems (2012): 449-464.
[8] Carlos A. González and Jordi Cabot. "Test data generation for model transformations combining partition and constraint analysis". International Conference on Theory and Practice of Model Transformations (2014): 25-41.
[9] Manuel Wimmer and Loli Burgueño. "Testing M2T/T2M transformations". International Conference on Model Driven Engineering Languages and Systems (2013): 203-219.
[10] Martin Gogolla. "Model development in the tool USE: explorative, consolidating and analytic steps for UML and OCL models". ICDCIT (2021): 24-43.
[11] Jesús Sánchez Cuadrado and Martin Gogolla. "Model finding in the EMF ecosystem". Journal of Object Technology (2020), 19(2): 1-21.
[12] Kristóf Marussy, Oszkár Semeráth, Aren Babikian and Dániel Varró. "A specification language for consistent model generation based on partial models". Journal of Object Technology (2020), 19(3): 3:1-22.
[13] Kristóf Marussy, Oszkár Semeráth and Dániel Varró. "Automated generation of consistent graph models with multiplicity reasoning". IEEE Transactions on Software Engineering, doi: 10.1109/TSE.2020.3025732.
[14] Loli Burgueño, Jordi Cabot and Sébastien Gérard. "The future of model transformation languages: an open community discussion". Journal of Object Technology (2019), 18(3): 7:1-11.
[15] Javier Troya, Sergio Segura and Antonio Ruiz-Cortés. "Automated inference of likely metamorphic relations for model transformations". Journal of Systems and Software (2018), 136: 188-208.