Benchmarx Reloaded: A Practical Benchmark Framework for Bidirectional Transformations

Anthony Anjorin (Paderborn University, Germany), anthony.anjorin@upb.de
Zinovy Diskin (McMaster University, Canada), diskinz@mcmaster.ca
Frédéric Jouault (ESEO, France), frederic.jouault@eseo.fr
Hsiang-Shang Ko (NII, Japan), hsiang-shang@nii.ac.jp
Erhan Leblebici (TU Darmstadt, Germany), erhan.leblebici@es.tu-darmstadt.de
Bernhard Westfechtel (Univ. of Bayreuth, Germany), bernhard.westfechtel@uni-bayreuth.de

Abstract

Bidirectional transformation (bx) approaches provide a systematic way of specifying, restoring, and maintaining the consistency of related models. The current diversity of bx approaches is certainly beneficial, but it also poses challenges, especially when it comes to comparing the different approaches and the corresponding bx tools that implement them. Although a benchmark for bx (referred to as a benchmarx) has been identified in the community as an important and currently still missing contribution, only a rather abstract description and characterisation of what a benchmarx should be has been published to date. In this paper, therefore, we focus on providing a practical and pragmatic framework, on which future concrete benchmarx can be built. To demonstrate its feasibility, we present a first non-trivial benchmarx based on a well-known example, and use it to compare and evaluate three bx tools, chosen to cover the broad spectrum of bx approaches.

1 Introduction and Motivation

Bidirectional transformations (bx) are mechanisms for specifying and maintaining consistency between two (in general multiple) related sources of information. Bx is currently an active research area as the challenge of "consistency management" in software systems is still largely open.
Inconsistencies arise as different actors (tools, devices, or users) operate independently on shared information, and being able to maintain consistency will become even more important with the increasing complexity of software systems and usage workflows. Bx approaches are quite diverse and are based on different foundations from various communities including programming languages, constraint solving, and graph transformation. As a consequence, although sharing common goals, bx tools differ in many aspects such as typical application scenarios, ways of specifying and restoring consistency, traceability support, and performance (to name but a few).

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: R. Eramo, M. Johnson (eds.): Proceedings of the Sixth International Workshop on Bidirectional Transformations (Bx 2017), Uppsala, Sweden, April 29, 2017, published at http://ceur-ws.org

This diversity is certainly advantageous from a research community point of view as it provides a broad perspective on bx, but it also makes discerning the exact capabilities and limitations of individual approaches challenging. The need for a bx benchmark (referred to as a benchmarx in the following) has been discussed in detail by Anjorin et al. [3], together with an abstract description of such a benchmarx. A benchmarx should enable direct comparisons of bx tools based on a maintained set of examples and executable test cases. While benchmarking is a common and well-established practice for conducting evaluation and comparison in general, a benchmarx exhibits additional characteristic properties and challenges that are unique to bx. Although Anjorin et al. [3] discuss what should be provided by a benchmarx in an attempt to encourage concrete benchmarx efforts, there has been no practical realisation of a benchmarx until now.
Providing a series of benchmarx has been a recurring topic discussed at different bx events (e.g., [1, 5]) involving representatives from all bx sub-communities. A bx example repository [4] has been established and is being actively maintained (predominantly via examples introduced in bx-related talks and publications), but there is still no non-trivial executable test suite for any of the examples that can be run, adapted, and interpreted for the numerous bx tools that exist. We believe this is because many practical challenges have not been addressed, including how to structure the test suite to simplify interpretation of test results for diverse bx tools with different capabilities. In this paper, we close this gap by providing a pragmatic and practical benchmarx framework that can be used to establish further benchmarx implementations. Our contribution is thereby twofold: We propose a benchmarx template based on an analysis of bx tool workflows and a classification according to supported features. This classification into features is used to derive a concrete test suite structure that can be used not only to identify a meaningful subset of tests for a given bx tool, but also to evaluate test coverage. Furthermore, we provide an open-source reference implementation1 of our framework, together with a concrete benchmarx using a simple but non-trivial example to evaluate and compare solutions with three very different bx tools.

The rest of the paper is structured as follows: A well-known bx example is introduced in Sect. 2 and used consistently throughout the paper to explain basic terminology, illustrate our benchmarx design, and demonstrate its usage. The architecture of our benchmarx implementation is provided in Sect. 3. Solutions for our running example (one functional programming-based, one rule-based, and one constraint-based) are presented in Sect. 4. In Sect.
5, these solutions are integrated into our benchmarx implementation, evaluated, and discussed to demonstrate how to handle such a heterogeneous choice of bx tools. These results are a first demonstration of a tool comparison conducted with our benchmarx implementation, and should serve as encouragement for (more detailed) future tool comparisons. Finally, Sect. 6 concludes the paper and provides an outlook on future work.

2 Our Benchmarx Example: Families to Persons

To give concrete examples throughout the paper, we shall make consistent use of our chosen example for the benchmarx: "Families to Persons", a well-known demonstrative transformation in the model transformation community. The example is part of the ATL2 transformation zoo3 and was created as part of the "Usine Logicielle" project. The original authors are Freddy Allilaire, Frédéric Jouault, and Jean Bézivin, all members of the ATLAS research team at the time it was created (December 2006) and published (2007). Figure 1 depicts the two data structures involved, which we shall refer to in the following as the "source" and "target" of the bidirectional transformation between them. The motivational story is that two administration offices of some city wish to maintain their respective data in a consistent state. Both source and target offices keep track of individuals registered in the city, but at different abstraction levels. As depicted in Fig. 1a as a metamodel, the source data structure has the core concept of a family (Family), consisting of family members (FamilyMember). Both concepts are named entities, and family members play different roles in a family. Note that the associations are all composites (denoted with the black diamonds), implying that a family can only be in one family register, and that a family member can only be in one family at a given point in time. Furthermore, due to the multiplicities 0..1, a family can have at most one mother and at most one father.
This is of course a simplification, ruling out realistic situations in which, e.g., a man is both a father and a son in different families, or a family has two mothers or two fathers. In contrast, the target data structure (Fig. 1b) only has the primary concept of a person (Person), who can be either male or female, and has a birthday in addition to a name. In the target domain, a person's name is formed by concatenating surname and first name, separated by a comma and a space.

1 Available from https://github.com/eMoflon/benchmarx
2 The Atlas Transformation Language: https://www.eclipse.org/atl/
3 Available from https://www.eclipse.org/atl/atlTransformations/#Families2Persons

[Figure: (a) the source metamodel: a FamilyRegister contains 0..* families; a Family (name: String) has family members of type FamilyMember (name: String) in the roles mother (0..1), father (0..1), sons (0..*), and daughters (0..*). (b) The target metamodel: a PersonRegister contains 0..* persons; a Person (name: String, birthday: Date) is specialised into Male and Female.]
Figure 1: Source and target data structures

In summary, source and target models contain not only shared information on which they must agree (family members are persons), but also information only contained in one of the models (birthdays, family roles). A families model is consistent with a persons model if a bijective mapping between family members and persons can be established such that (i) mothers and daughters (fathers and sons) are paired with females (males), and (ii) the name of every person p is "f.name, m.name", where m is the member (in family f) paired with p. This example, of which many variants exist including FamilyToPersons [2] in the bx example repository [4], was chosen for the following reasons, which can be taken into account when searching for a new benchmarx example: It is already well-known and well-accepted by a substantial community, indicating that the example is accessible to a wide range of people.
It is also simple and invokes an intuitive expectation of consistency (e.g., that there should be a bijection between all family members in the source and all persons in the target). It can nonetheless be used to demonstrate a fair number of synchronisation challenges including information loss, non-determinism, flexible alignment, and update policies. Finally, the data structures involved are simple trees, simplifying their interpretation in other technological spaces.

3 Proposed Template for a Benchmarx

A major challenge in benchmarking bx tools is that different tools may use different input data. Addressing this challenge requires a unifying design space, in which different tool architectures can be placed and respectively classified. In Sect. 3.1 we shall define such a space and distinguish seven basic bx tool architectures according to the type of their input data. In Sect. 3.2, we then define a respective design space of test cases, and discuss how a tool's architecture influences the interpretation of test results.

3.1 Design space of bx tools

To establish a common foundation, we first present a vocabulary of basic notions, constructs, and terms based on previous work by Diskin et al. [7], which we shall use for discussing consistency restoration via change propagation. Let us refer to the two domains to be kept synchronised as the source and the target model spaces. Source models are denoted by A, A′, etc., target models by B, B′, etc. Transformations from the source to the target are called forward (fwd), from the target to the source backward (bwd).

3.1.1 Objects: models, deltas, and corrs

The objects that bx tools operate on are models and relationships/links between them. Relationships between models from different model spaces are called correspondence links, or just corrs. We shall denote corrs by double bidirectional arrows, e.g., R : A ⇔ B, and depict them in our diagrams horizontally.
Normally, a corr is a set of links r(a, b) between elements (a in A, b in B), which are compatible with the models' structure. Relationships between models from the same model space are called deltas, denoted by directed arrows, e.g., ∆: A → A′, where we typically consider A′ to be an updated version of A. To visually distinguish corrs from deltas, we shall depict deltas in our diagrams vertically. Deltas can be structural or operational. A structural delta (s-delta) is a collection of links between elements of the original and the updated model, which is compatible with structure, e.g., is type preserving. In fact, an s-delta is a (usually simple) vertical corr between models of the same space. We denote s-deltas by double unidirectional arrows, e.g., ∆: A ⇒ A′. An operational delta (o-delta) is an operational specification δ of changes to be performed on a model to update it. Examples of such operational deltas include edit logs, in-place transformations, or transformation rules. A given o-delta δ can be applicable to some models but not to others. If it is applicable to a model A, its application results in (i) a new model A′ denoted by δ@A (δ applied to A), and (ii) an s-delta ∆: A ⇒ A′ relating A and A′, which we denote by δ@⇒A. It is important to distinguish s-deltas from o-deltas as different o-deltas may result in the same model and even the same s-delta, i.e., δ1@A = δ2@A (and δ1@⇒A = δ2@⇒A) for δ1 ≠ δ2. For instance, when the order in which updates are performed is important, then a tool must accept o-deltas as input. If an o-delta δ is applicable to A, we write δ: A −@→.

Example (E1). Our source model space consists of Family Registers, and the target space of Person Registers.
Consider a source model A consisting of a single family register with the Simpson family as its single family, consisting of family members Homer (as father) and Marge (as mother), and a target model B consisting of a single person register with a single female person Simpson, Marge and a single male person Simpson, Homer (with their birthdays). This pair of source and target models can be related by a corr R : A ⇔ B including links connecting the family register with the person register, Homer with Simpson, Homer, and Marge with Simpson, Marge. As a source delta ∆: A → A′, consider adding a new family member Bart as a son in the Simpson family to produce a new source model A′. This source delta can either be given as an s-delta ∆: A ⇒ A′ consisting of links between elements in A and A′ including, e.g., a link connecting Homer in A with Homer in A′, or as an o-delta δ: A −@→, e.g., some code creating a new family member Bart and adding it as a son in the Simpson family. This code can be applied to A to produce δ@A = A′.

3.1.2 Basic operations employed by bx tools

Our next goal is to classify bx tools based on their (expected) capabilities. This is directly related to the input that a tool accepts and should exploit. Based on existing bx taxonomies [6, 8], as well as current bx tools under active development,4 the following basic operations (visualised to the left of Fig. 2) can be used to provide a schematic overview of different bx tool architectures:

Alignment (vAln, hAln). There are two alignment operations: vertical and horizontal alignment, denoted by vAln and hAln, respectively.5 The former takes two models A, A′ from the same model space, and produces an s-delta between them: vAln(A, A′): A ⇒ A′. The latter takes two models A, B from different model spaces, and produces a corr hAln(A, B): A ⇔ B.
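To make the two alignment operations concrete, here is a minimal Java sketch that reduces models to sets of element names and represents both s-deltas and corrs as sets of links. The name-based matching conventions, and all class and method names, are illustrative assumptions of ours, not part of the benchmarx API:

```java
import java.util.*;

// Sketch of vAln and hAln over "models" reduced to sets of element names.
// An s-delta and a corr are both just sets of links (pairs of names).
public class Alignment {
    // vAln(A, A'): A => A' -- vertical alignment by name equality:
    // same-named elements are assumed to be the same object.
    public static Set<Map.Entry<String, String>> vAln(Set<String> a, Set<String> aPrime) {
        Set<Map.Entry<String, String>> delta = new HashSet<>();
        for (String x : a)
            if (aPrime.contains(x))
                delta.add(Map.entry(x, x));
        return delta;
    }

    // hAln(A, B): A <=> B -- horizontal alignment by convention: a member
    // "Last.First" corresponds to the person named "Last, First".
    public static Set<Map.Entry<String, String>> hAln(Set<String> members, Set<String> persons) {
        Set<Map.Entry<String, String>> corr = new HashSet<>();
        for (String m : members) {
            String[] parts = m.split("\\.");            // e.g. "Simpson.Homer"
            String candidate = parts[0] + ", " + parts[1];
            if (persons.contains(candidate))
                corr.add(Map.entry(m, candidate));
        }
        return corr;
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("Simpson.Homer", "Simpson.Marge");
        Set<String> aPrime = Set.of("Simpson.Homer", "Simpson.Marge", "Simpson.Bart");
        System.out.println(vAln(a, aPrime).size());     // 2: Bart is new, hence unlinked

        Set<String> b = Set.of("Simpson, Homer", "Simpson, Marge");
        System.out.println(hAln(a, b).size());          // 2 corr links
    }
}
```

As the example (E2) below notes, such convention-based alignment only works under additional uniqueness assumptions that do not hold in general.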
There is also an operation ∗ of corr composition (re-alignment) that takes a corr R : A ⇔ B and either a source s-delta ∆: A ⇒ A′ to produce a new corr ∆ ∗ R : A′ ⇔ B, or dually, a target s-delta E: B ⇒ B′ to produce R ∗ E : A ⇔ B′. Consistency is modelled by a predicate (a Boolean-valued function) check on the set of all corrs. For any corr R : A ⇔ B, either check(R) = 1, i.e., the corr R is consistent, or check(R) = 0, i.e., R is inconsistent.

Consistency Restoration (fCR, bCR). The operation of forward consistency restoration fCR takes a corr R : A′ ⇔ B and restores consistency by changing the target model B and keeping A′ intact. It thus produces an s-delta fCRv(R) = E: B ⇒ B′, and a new consistent corr fCRh(R) = R′: A′ ⇔ B′. Backward consistency restoration bCR operates dually by updating the source model and keeping the target model intact, i.e., it produces bCRv(R) = ∆: A ⇒ A′, and a consistent corr bCRh(R) = R′: A′ ⇔ B′.

Update Propagation (fUP, bUP). There are also two operations of o-delta propagation. Forward propagation fUP takes a corr R : A ⇔ B and an A-applicable o-delta δ: A −@→, and produces a B-applicable o-delta ε: B −@→. Backward propagation bUP operates analogously with bUP(R, ε) = δ.

Example (E2). Consider the source delta ∆: A → A′ from (E1), resulting in a new source model A′ with Bart as a new family member. Given only A and A′, vertical alignment vAln can be used to produce an s-delta vAln(A, A′): A ⇒ A′. This could be achieved by making certain (arguably) reasonable assumptions, e.g., that families and family members with the same name in A and A′ are actually the same objects and should not be deleted and re-created by the resulting s-delta.

4 http://bx-community.wikidot.com/relatedtools
5 Remember that corrs and deltas are typically depicted horizontally and vertically, respectively.
Given the source and target models A and B from (E1), horizontal alignment hAln can be applied to establish a corr hAln(A, B): A ⇔ B connecting them. This requires further assumptions, e.g., that family names, names of family members in a family, and names of persons in the person register are unique. These assumptions for vAln and hAln do not hold in general for our example. Consider now the s-delta ∆: A ⇒ A′ (adding Bart as a son) and the old corr R : A ⇔ B. The operation of re-alignment ∗ can be applied to produce the corr R∗ = ∆ ∗ R : A′ ⇔ B. In this case, as nothing is deleted, the links in R can simply be transferred to R∗, using ∆ to identify the corresponding elements in A′. The predicate check for our running example can be derived from the constraints stated in Sect. 2. For models A and B from example (E1), it must be possible to extract the pairs (mH, pH) and (mM, pM) from a consistent corr R : A ⇔ B, where mH, mM are the family members with names Homer and Marge, and pH, pM the persons with names Simpson, Homer and Simpson, Marge, respectively. The new, re-aligned corr R∗ is not consistent according to check as a bijection cannot be derived (there is no corresponding person for Bart in the person register). To restore consistency, the operation of forward consistency restoration fCR can be used to produce a target s-delta E: B ⇒ B′, and a new corr R′: A′ ⇔ B′. In this case, this could be achieved by creating a new person Simpson, Bart in the person register to produce the new target model B′, and adding a new link, connecting the family member Bart with the person Simpson, Bart, to result in the new, now consistent corr R′. Analogously, the o-delta δ: A −@→ adding Bart as a son to the Simpson family can be forward propagated to the appropriate o-delta fUP(R, δ) = ε: B −@→, adding Simpson, Bart to the person register.
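The check predicate used in this example can also be written out concretely. The following minimal Java sketch flattens models into records and represents a corr as a map from members to persons, checking conditions (i) and (ii) from Sect. 2; all class and method names are hypothetical, and some subset checks are omitted for brevity:

```java
import java.util.*;

// A minimal sketch of the consistency predicate check(R) for
// Families-to-Persons. A corr is a map from family members to persons.
public class Check {
    // A family member "first" playing a role in family "family".
    record Member(String family, String first, boolean isFemale) {}
    // A person with full name "Last, First" and a gender.
    record Person(String fullName, boolean isFemale) {}

    // check(R): the corr must be a bijection between members and persons
    // that preserves gender/roles (i) and names (ii).
    public static boolean check(Map<Member, Person> corr,
                                Set<Member> members, Set<Person> persons) {
        if (corr.size() != members.size() || corr.size() != persons.size())
            return false;                                         // not total
        if (new HashSet<>(corr.values()).size() != corr.size())
            return false;                                         // not injective
        for (Map.Entry<Member, Person> link : corr.entrySet()) {
            Member m = link.getKey();
            Person p = link.getValue();
            if (m.isFemale() != p.isFemale()) return false;       // (i) roles vs. gender
            if (!p.fullName().equals(m.family() + ", " + m.first()))
                return false;                                     // (ii) name format
        }
        return true;
    }

    public static void main(String[] args) {
        Member homer = new Member("Simpson", "Homer", false);
        Person pH = new Person("Simpson, Homer", false);
        System.out.println(check(Map.of(homer, pH), Set.of(homer), Set.of(pH)));
    }
}
```

On the models of (E1), this predicate holds for the corr R; after adding Bart to the source only (the re-aligned corr R∗), it fails, because no person is paired with Bart.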
3.1.3 A zoo of bx tool architectures

Based on these basic concepts and operations, seven bx tool architectures are depicted to the right of Fig. 2. We have restricted ourselves to realistic architectures for which we have found actual implemented bx tools (outlined in blue, red, and green), or for which such architectures are at least suggested in the literature (often with prototypical implementations). Various additional hybrids and variants are of course possible including, e.g., a state-based tool that also takes A and performs both vertical and horizontal alignment. For each entry in the table, the row and column represent the input that the tool accepts. For presentation purposes, we assume that both models A and A′ can be extracted from a delta ∆: A → A′, and both A and B from a corr R : A ⇔ B.

[Figure: for each of the seven architectures (batch, state-based, state/corr-based, s-delta/state-based, s-delta/corr-based, o-delta/state-based, o-delta/corr-based), the accepted input, the involved computation steps in terms of vAln, hAln, ∗, fCR, and fUP, and the produced output.]
Figure 2: Basic operations (left) and input-based classification (right) of bx tools

A batch bx tool takes as input only the source model A′ (the "empty" model is represented as []) and produces as output B′. The involved computational tasks indicate that batch tools cannot exhibit good or efficient synchronisation behaviour in general: A batch tool can only assume trivial alignment of A′ with the empty model [], and is forced to reconstruct B′ from scratch.
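At its core, the classification in Fig. 2 is a difference in accepted input signatures. The forward direction of some of the architectures can be sketched as Java interfaces; all type and interface names here are ours, hypothetical, and not part of any tool's API:

```java
// Input signatures (forward direction) of several architectures from Fig. 2.
// A, B are model types; SDelta, ODelta and Corr are the delta/corr types
// from Sect. 3.1.1, left abstract here.
public class Architectures {
    interface A {}       interface B {}
    interface SDelta {}  interface ODelta {}  interface Corr {}

    interface BatchTool      { B fwd(A aPrime); }              // in: A'
    interface StateBasedTool { B fwd(A aPrime, B b); }         // in: A', B
    interface StateCorrTool  { B fwd(A aPrime, Corr r); }      // in: A', R
    interface SDeltaCorrTool { Result fwd(SDelta d, Corr r); } // in: D, R
    interface ODeltaCorrTool { ODelta fwd(ODelta d, Corr r); } // in: d, R

    // An s-delta/corr-based tool returns both the target s-delta E
    // and the new consistent corr R'.
    record Result(SDelta e, Corr rPrime) {}
}
```

The progression from top to bottom mirrors the text: each richer input (old target, corr, delta) spares the tool one alignment computation.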
State-based tools take both the new source model A′ and the old target model B as input, producing an updated target model B′ as output. The required computations now make more sense, as horizontal alignment can be applied, and B used as a starting point from which to produce a consistent B′. This means that a state-based tool can handle information loss in the target domain. To avoid horizontal alignment, state/corr-based tools take A′ and a corr R : A ⇔ B. Such tools can handle cases where it is impossible to compute hAln based on conventions. Similarly, if conventions can be used to compute horizontal alignment, but vertical alignment is problematic, s-delta/state-based and o-delta/state-based tools can be used. Tools that are s-delta/corr-based are able to exploit available correspondence information in both horizontal and vertical dimensions (Fig. 2). In such tools, alignment is completely replaced with re-alignment. Finally, an o-delta/corr-based tool takes an o-delta directly as input and can thus handle cases where the order of changes is important.

Taken from the domain of (software) product line engineering, a feature model [9] consisting of features and constraints over features can be used to succinctly capture the space of valid variants or products. Figure 3 depicts a feature model for bx tool architectures. Every bx tool architecture must be a valid "product" of the product line described by this feature model. The nodes of the tree are the "features" that a given bx tool architecture can possess. Features can be optional or mandatory, child features can only be chosen together with their parent feature, and children of the same parent can be either exclusive or non-exclusive alternatives. Features are abstract (grey fill) if they can be implied from the set of concrete features (white fill) in a given product. The feature model depicted in Fig. 3 yields exactly the seven bx tool architectures described in Fig.
2 in terms of involved computation steps.

[Figure: a feature model whose features include bx tool architecture (root), batch, incremental, alignment-based (alignment computed internally, e.g., state-based), delta/corr-based (external alignment is required and used), delta-based, corr-based, operational, and structural, with exclusive (xor) alternatives and abstract vs. concrete features as per the legend.]
Figure 3: Bx tool architecture variability as a feature model

To further illustrate the architecture of the bx tools most relevant for this paper, workflow diagrams for state-based, state/corr-based, and s-delta/corr-based bx tools are depicted in Fig. 4, Fig. 5, and Fig. 6. Apart from visualising the computation flow, an update policy is indicated as additional input. The task of consistency restoration typically involves decisions: think of setting Simpson, Bart's birthday in the forward direction, or deciding if a female person corresponds to a mother or daughter in the backward direction. An update policy is used to handle these choices, either at design time (e.g., hard-coded preferences and conventions concerning default values), or at runtime (e.g., user interaction). In the workflow diagrams, the symbol ⊗ is used to either suppress output, or restrict it to only a part of what is available (e.g., only B′ from E: B ⇒ B′).

Example (E3). In our example, explicit corrs are required as input to handle non-unique person names. Explicit deltas are required as input in our example to handle, e.g., renaming vs. create/delete operations. In both these cases, computing alignment internally based on conventions cannot work in general. As an example of how the order of updates, i.e., taking o-deltas directly as input, can be important, consider the fixed update policy of preferring to create mothers rather than daughters in the backward transformation.
As families can have only one mother, the order in which two females with the same family name are added to the person register, say Simpson, Marge and Simpson, Maggie, is important to avoid creating Maggie as mother and Marge as daughter of a new family in the register.

[Figure: input A′ and B; hAln establishes R: A′ ⇔ B; fCR, guided by the update policy (at design or runtime) and the direction, produces E: B ⇒ B′ and R′: A′ ⇔ B′; only B′ is output, the rest is suppressed (⊗).]
Figure 4: Forward workflow for state-based bx tools

[Figure: input A′, A, and R: A ⇔ B; vAln computes D: A ⇒ A′; re-alignment (∗) yields R∗: A′ ⇔ B; fCR, guided by the update policy and direction, produces E: B ⇒ B′ and R′: A′ ⇔ B′.]
Figure 5: Forward workflow for state/corr-based bx tools

[Figure: input D: A ⇒ A′ and R: A ⇔ B; re-alignment (∗) yields R∗: A′ ⇔ B; fCR, guided by the update policy and direction, produces E: B ⇒ B′ and R′: A′ ⇔ B′.]
Figure 6: Forward workflow for s-delta/corr-based bx tools

3.2 Design space of test cases

The current6 test suite for the FamiliesToPersons benchmarx consists of 52 test cases, and will be constantly revised as further implementations are established or extended. As such a test suite grows in size, adding test cases systematically is important to avoid redundancy and to have some measure of the current coverage and of which types of tests are still missing. Based on our classification of bx tool architectures, this section identifies a corresponding design space for actual test cases, providing a feature-based classification of benchmarx test cases in Sect. 3.2.1, a basic test case structure in Sect. 3.2.2, and a classification of test results in Sect. 3.2.3.

3.2.1 Feature-based classification of test cases

As an extension of the feature model for bx tool architectures from Sect. 3.1.3, Fig. 7 depicts a feature model for benchmarx test cases. Every benchmarx test case must state the required bx tool architecture (cf. Fig.
3), its direction to be one of fwd (forward), bwd (backward), or round trip (a mixture), the combination of different change types applied in the test, and the required update policy to successfully pass the test. The set of possible change types, currently del (deletion), add (addition), and attribute (attribute manipulation), can be extended in the future to accommodate more expressive frameworks. Note that a test case can require an update policy that is a mixture of fixed, i.e., design-time preferences and conventions, and runtime configuration.

[Figure: a feature model with root "benchmarx test case" and mandatory child features bx tool architecture, direction, update policy, and change type; direction is one of fwd, bwd, or round trip (xor); change types are del, add, and attribute (or); the update policy is runtime or fixed.]
Figure 7: Test case variability as a feature model

3.2.2 Test cases as synchronisation dialogues

Although it would certainly be interesting to directly assert the resulting corr and output delta of a synchronisation run, our investigations with bx tools have revealed that these data structures are typically tool-specific and complicate the task of test case specification. To cope with this in a pragmatic manner, we propose to design each test case as a synchronisation dialogue, always starting from the same agreed upon consistent state, from which a sequence of deltas is propagated. Only the resulting output models are to be directly asserted by comparing them with expected versions. A benchmarx test case is depicted schematically to the left of Fig. 8, with a concrete test case for our example to the right, following the proposed structure and representing an instantiation with JavaDoc, Java, and JUnit.

6 As of February 28, 2017.

[Figure: the schematic structure of a test case (left): (1) documentation, stating what is being tested, the expected behaviour, and a classification as a list of features; (2) initiate synchronisation dialogue; (3) synchronisation dialogue to establish the precondition; (4) assert precondition; (5) runtime configuration and update propagation; (6) assert postcondition. The concrete test case (right):]

    /**
     * Test for changing a person's full name (where another person with
     * the same name exists).
     * Expected: first name of the corresponding member and their family
     * name must be changed. As no fitting family exists, a new family must be
     * created and the member moved to this new family (as the father of this family).
     * Features: bwd, attribute, corr-based, structural, runtime
     */
    @Test
    public void testFullNameChangeOfNonUniquePerson() {
        tool.initiateSynchronisationDialogue();
        util.configure().makeDecision(Decisions.PREFER_CREATING_PARENT_TO_CHILD, true)
            .makeDecision(Decisions.PREFER_EXISTING_FAMILY_TO_NEW, true);
        tool.performAndPropagateTargetEdit(helperPerson::createSimpsonsWithTwoBarts);
        ...
        util.assertPrecondition("Pre_MemberNameChangeOther", "Pre_PersonNameChangeOther");
        //----------------
        util.configure().makeDecision(Decisions.PREFER_CREATING_PARENT_TO_CHILD, true);
        tool.performAndPropagateTargetEdit(helperPerson::fullNameChangeOfOtherBart);
        //----------------
        util.assertPostcondition("MemberFullNameChangeOther", "PersonFullNameChangeOther");
    }

Figure 8: A benchmarx test case as a synchronisation dialogue

Every test case should be documented (cf. Label 1 in Figure 8) stating (i) what is being tested, (ii) the expected behaviour, and (iii) a list of the concrete features of the test case taken from Fig. 7 to clarify at a glance if a given bx tool can pass the test or not. Every test starts with an initialisation command invoked on the bx tool under test (Label 2), giving it the chance to establish the agreed upon starting point (e.g., for the FamiliesToPersons benchmarx this comprises a single empty family register and a corresponding single empty persons register), and to create all necessary internal auxiliary data structures. The next part of a test case (Label 3) is a series of propagation steps, used to establish the precondition of the test.
Although this creates a dependency on other tests (asserting exactly this precondition), it simplifies handling the required correspondence model, as the bx tool can build up the necessary internal state that must go along with the precondition. This means that the old consistent corr is "passed" implicitly to the bx tool via a series of preparatory propagation steps. The precondition is finally asserted (Label 4), representing a well-defined starting point. If the test requires a runtime update policy, this is configured just before propagating the actual input delta (Label 5). The last part of a test case (Label 6) is an assertion of the postcondition, checking if the final source and target models are as expected. In the concrete exemplary test case, a number of persons are created in the person register and then backward propagated to establish a consistent family register that is asserted as a precondition. As part of the actual test, a person named Simpson, Bart is then renamed in the person register; this change is backward propagated with the update policy of preferring to create parents (if possible) rather than children. The interested reader is referred to the actual test cases7 for all further details.

3.2.3 Interpretation of test results: Failure, limitation, or (un)expected pass

Due to the heterogeneity of bx tools, it is important to be able to easily and quickly distinguish between:

• A test that fails because it requires features that the tool does not support, referred to as a limitation.
• A test that fails even though the tool should pass it (based on its classification), referred to as a failure.

Limitations confirm the tool's set of (un)supported features, while failures indicate potential bugs in the bx tool or in the implementation of the benchmarx with it. Similarly, one should clearly distinguish between:

• A test that the tool should (based on its classification) pass and passes, referred to as an expected pass.
• A test that the tool passes even though this is unexpected in general, referred to as an unexpected pass.
Expected passes confirm that the tool is classified correctly and behaves as expected, while unexpected passes indicate either that the test case can be improved, as it is unable to reveal the missing features of the tool, or that the bx tool has not been classified correctly.

4 The FamiliesToPersons (F2P) Benchmarx: Version 1.0

In this section, we defend our choice of three bx tools, present three implementations of the F2P benchmarx with these tools, and discuss initial results and insights. To demonstrate how to integrate diverse bx tools, we have chosen three bx tools with the aim of covering all the primary groups of bx approaches: BiGUL,8 eMoflon,9 and medini QVT10. Bx approaches can be very broadly classified into:

Functional programming (FP) based approaches: The basic idea here is to program a single function (forward or backward) from which the other function (backward or forward) can be inferred, such that the pair obeys a set of round-tripping laws. Referring to forward propagation as get and backward propagation as put, as is usual in an asymmetric setting, this idea can be realised either by using get to infer put (get-based approaches) or by using put to infer get (putback-based approaches). A final strategy is to provide a library of (get, put) pairs that already obey the round-tripping laws, together with a set of combinators that allow composition (combinator-based approaches). In all cases, the underlying consistency relation is not explicitly specified. We have chosen BiGUL as a representative of this group of bx approaches.

Grammar-based approaches: While FP-based approaches allow fine-grained control over all details of consistency restoration, grammar-based approaches provide a high-level, rule-based language with which the language of all consistent pairs of source and target models can be generated.
Such a specification is under-specified in a controlled manner: the consistency relation (which here equates to language membership) is fully determined, but not all details of consistency restoration are fixed. Although this technique is not restricted to graphs and can be transferred to strings, lists, or trees, Triple Graph Grammars (TGGs) [10] are the primary representative of this group and have been implemented by various tools. We have chosen eMoflon as a representative of this group.

7 Available from https://github.com/eMoflon/benchmarx
8 www.prg.nii.ac.jp/project/bigul/
9 www.emoflon.org
10 http://projects.ikv.de/qvt

Constraint-based approaches: Even more high-level than grammar-based approaches, constraint-based approaches leave all details of consistency restoration open and only require a specification of the consistency relation. Consistency restoration is often realised via a constraint solver, guided by a global objective function characterising "preferred" strategies. We have chosen medini QVT11 as a constraint-based bx tool.

Besides the fundamental differences regarding the underlying bx approach, the tools also differ with respect to their architectures: we decided to use BiGUL to implement a state-based solution (Fig. 4) with a design-time update policy. In contrast, we used the TGG tool eMoflon to implement an s-delta/corr-based solution (Fig. 6) with support for a runtime update policy, while medini QVT was used to implement a state/corr-based solution (Fig. 5) with a design-time update policy.

While other cases could possibly be supported by the same tools, e.g., by encoding explicit delta and correspondence information in "states" for BiGUL, or by calculating the "best fitting" correspondence model and implementing an additional delta discovery procedure for eMoflon, the choices above represent the "standard" case for these tools, i.e., the sweet spot for which minimal effort is required.
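The round-tripping laws mentioned for the FP-based group can be stated concretely. The following sketch is illustrative only: the Lens interface and keyLens are invented for this example and are neither BiGUL's nor the benchmarx API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class LensDemo {
    // An asymmetric lens: get extracts a view from a source,
    // put writes an updated view back into the source.
    interface Lens<S, V> {
        V get(S source);
        S put(S source, V view);
    }

    // Example lens: focus on the key of an immutable (key, value) pair.
    static Lens<Map.Entry<Integer, String>, Integer> keyLens() {
        return new Lens<Map.Entry<Integer, String>, Integer>() {
            public Integer get(Map.Entry<Integer, String> s) { return s.getKey(); }
            public Map.Entry<Integer, String> put(Map.Entry<Integer, String> s, Integer v) {
                return new SimpleEntry<>(v, s.getValue());
            }
        };
    }

    // GetPut law: putting back what was got must not change the source.
    static <S, V> boolean getPut(Lens<S, V> l, S s) {
        return l.put(s, l.get(s)).equals(s);
    }

    // PutGet law: what was put must be exactly what is got afterwards.
    static <S, V> boolean putGet(Lens<S, V> l, S s, V v) {
        return l.get(l.put(s, v)).equals(v);
    }

    public static void main(String[] args) {
        Map.Entry<Integer, String> s = new SimpleEntry<>(1, "x");
        System.out.println(getPut(keyLens(), s));     // true
        System.out.println(putGet(keyLens(), s, 2));  // true
    }
}
```

Get-based approaches derive put from a programmed get; putback-based approaches such as BiGUL take put as primary and derive get from it.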
We now give an overview of the three solutions to the F2P benchmarx.

4.1 A solution to the F2P benchmarx with BiGUL

A BiGUL program describes how consistency is established for two models with asymmetric information, in the sense that one model contains more information than the other. When programming in BiGUL, the programmer assumes that two (possibly inconsistent) models are given, and describes how to put all data in the less informative model into proper places in the more informative model, thereby producing an updated version of the latter model that is consistent with the former. For symmetric scenarios such as the F2P benchmarx, symmetric consistency restoration can be achieved by combining two asymmetric consistency restorers that synchronise the two models with a common intermediate model. This intermediate model can either incorporate all the information of both models, or contain only the information shared by both models. Our BiGUL implementation of the F2P benchmarx adopts the latter approach: a family register is consistent with a person register exactly when the two registers are consistent with the same intermediate model, which is a collection of shared people (names and genders) with the Haskell type MediumR = [((String, String), Bool)]. The overall structure of the BiGUL implementation can thus be thought of as two programs putting MediumR into FamilyRegister (Figure 1a) and PersonRegister (Figure 1b), respectively. If one register is changed, say the family register, the collection of people it represents will be computed and put into the person register; the updated person register will represent the same collection of people, thus re-establishing consistency.
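Under drastically simplified model types (invented for this sketch; the actual solution works on the benchmarx metamodels and programs both directions as BiGUL puts), the propagation scheme through the shared medium can be sketched as follows:

```java
import java.util.ArrayList;
import java.util.List;

public class MediumDemo {
    // An entry of the shared medium MediumR: (family name, first name, isMale).
    record Person(String family, String first, boolean male) {}

    // A simplified person-register entry, which additionally carries a birthday
    // (information present only on the person side).
    record RegEntry(String family, String first, boolean male, String birthday) {}

    // put-direction of the right synchroniser: rebuild the person register from
    // the medium, retaining the birthday of every person that already existed.
    static List<RegEntry> putPersons(List<RegEntry> old, List<Person> medium) {
        List<RegEntry> updated = new ArrayList<>();
        for (Person p : medium) {
            String birthday = "0001-01-01";  // default for new persons (assumption)
            for (RegEntry e : old) {
                if (e.family().equals(p.family()) && e.first().equals(p.first())) {
                    birthday = e.birthday();  // existing person: keep birthday
                    break;
                }
            }
            updated.add(new RegEntry(p.family(), p.first(), p.male(), birthday));
        }
        return updated;
    }

    public static void main(String[] args) {
        // The family register was edited: Lisa was added next to Bart,
        // so the medium computed from it now contains both.
        List<Person> medium = List.of(
            new Person("Simpson", "Bart", true),
            new Person("Simpson", "Lisa", false));
        List<RegEntry> oldRegister = List.of(
            new RegEntry("Simpson", "Bart", true, "1980-04-01"));
        // Propagation keeps Bart's birthday and creates Lisa afresh.
        System.out.println(putPersons(oldRegister, medium));
    }
}
```

If the person register is edited instead, the medium is recomputed from it and put into the family register by the other asymmetric synchroniser; choosing a medium that contains only the shared information is what makes the two directions symmetric.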
To better organise the implementation, we introduce another intermediate model MediumL, whose structure is closer to FamilyRegister: a model of type MediumL = [(String, [(String, Bool)])] is a collection of families, each of which consists of a family name and a list of first names and genders of its members. A more accurate depiction of the structure of the implementation is thus:

    FamilyRegister <--syncL-- MediumL <--syncM-- MediumR --syncR--> PersonRegister

where the arrows indicate the directions of the constituent put programs. Besides straightforward format conversion (e.g., converting True and False to Male and Female), each of the three put programs is in charge of a part of the consistency restoration policy: (i) syncR aligns a collection of people with a person register, and describes how to update (retain, create, or delete) the birthdays in the person register; (ii) syncM aligns a collection of people with a collection of families, and describes how families are created or deleted, and how people are assigned to families; (iii) syncL describes how members in each family are assigned their roles. As an example of a BiGUL specification, the code for syncM is depicted in Listing 1. The three "normal" branches deal with the cases where the collections of people and families are consistent, while the last "adaptive" branch describes how the collection of families can be updated to become consistent with the collection of people, applying the fixed update policy of matching each person with an entry with the same name and gender in an existing family, or creating a new entry in the first family with a matching family name. If no matching family can be found, a new family is created.

11 Even though "QVT" is in its name, we regard medini QVT as simply a constraint-based bx tool that just happens to have a QVT-R inspired textual concrete syntax (like many others such as Echo: http://haslab.github.io/echo/ or JTL: http://jtl.di.univaq.it).
We do not regard medini QVT as an implementation of the QVT-R standard, as its semantics diverge significantly.

syncM :: BiGUL MediumL MediumR
syncM = Case
  [ $(normalSV [p| [] |] [p| [] |] [p| [] |])
      ==> $(update [p| _ |] [p| [] |] [d| |])
  , $(normalSV [p| ((familyName, []):_) |] [p| _ |] [p| (_, []):_ |])
      ==> $(update [p| _:rest |] [p| rest |] [d| rest = syncM |])
  , $(normal [| \((familyName, (firstName, gender):_):_) vs ->
                  ((familyName, firstName), gender) `elem` vs |]
             [p| (_, _:_):_ |])
      ==> $(rearrS [| \((familyName, (firstName, gender):ns):ss) ->
                        (((familyName, firstName), gender), (familyName, ns):ss) |]) $
          (Replace `Prod` syncM) `Compose` extract
  , $(adaptive [| \_ _ -> otherwise |])
      ==> adapt
  ]
  where
    extract :: (Ord a, Show a) => BiGUL (a, [a]) [a]
    extract = Case
      [ $(normal [| \(s, _) (v:_) -> v == s |]
                 [| \(s, ss) -> null ss || head ss >= s |])
          ==> $(update [p| (x, xs) |] [p| x:xs |] [d| x = Replace; xs = Replace |])
      , $(normal [| \(s, _:_) (v:_) -> v < s |] [| \(s, s':_) -> s' < s |])
          ==> $(rearrS [| \(s, s':ss) -> (s', (s, ss)) |]) $
              $(rearrV [| \(v:vs) -> (v, vs) |]) $
              Replace `Prod` extract
      , $(adaptive [| \(s, []) (v:_) -> v < s |])
          ==> \(s, _) _ -> (s, [undefined])
      ]

    adapt :: MediumL -> MediumR -> MediumL
    adapt [] vs = map ((fst . fst . head) &&& map (snd *** id))
                      (groupBy ((==) `on` (fst . fst)) vs)
    adapt ((familyName, ns):ss) vs =
      let ns' :: [(String, Bool)]
          ns' = concat (map snd (filter ((== familyName) . fst) ss))
          vs' :: [(String, Bool)]
          vs' = map (snd *** id) (filter ((== familyName) . fst . fst) vs)
                  \\ (ns' \\ ns)
      in (familyName, vs') : adapt ss (vs \\ (map ((familyName,) *** id) vs'))

Listing 1: An excerpt from the solution in BiGUL

4.2 A solution to the F2P benchmarx with eMoflon

The recommended way of "thinking in TGGs" is to describe how two consistent models are to evolve simultaneously, independently of how the TGG is to be ultimately operationalised, i.e., used to derive a directed (forward/backward) propagation of changes in one model to the other. Considering our running example, an obvious step in a simultaneous construction is the creation of a family member (father, mother, daughter, or son) in the family register, and the corresponding creation of a person (male or female) in the person register, indicating that at least four different cases must be handled. Another, orthogonal dimension is whether a family member is created in an existing family or in a new family, i.e., a new family could be created together with its very first member. There are thus eight (4 × 2) different cases required to handle the construction of family members and corresponding persons simultaneously. An excerpt from the TGG rules designed with this understanding of simultaneous construction in mind (and specified with eMoflon) is depicted in Figure 9. We omit the trivial construction step of the required root containers (creating an empty family register together with an empty person register), and focus on the different cases for creating family members and corresponding persons. The TGG rules in Figure 9 are depicted using eMoflon's textual syntax (eMoflon also provides a read-only visualisation in the common visual syntax of graph grammars). In all rules, created elements are green and provided with a ++-markup, while context elements (which must already be present in order to apply a rule) are black.
To the left, an abstract rule FamilyMember2Person (note the keyword #abstract) is given which creates a family, its first family member, a person, and a correspondence link between the family member and the person. Two decisions are left open: the exact role of the family member in the family, and the concrete type (male or female) of the person. Two rules in the middle (MotherToFemale and FatherToMale) refine this abstract rule (note the #extends keyword) and settle these open decisions for mothers and fathers (analogous rules creating sons and daughters are present but not explicitly depicted). In the right part of Figure 9, two further rules (MotherOfExistingFamilyToFemale and FatherOfExistingFamilyToMale) refine the previous ones by requiring the family as existing context (as opposed to creating it). In both rules, an additional Negative Application Condition (NAC), depicted in blue with a !-markup, forbids the existence of a mother or father (depending on what is created) in the family, in order to prevent generating families with multiple mothers or fathers. Analogous rules creating sons and daughters in existing families are again present but not explicitly depicted. These rules for sons and daughters do not, however, have the NAC, as multiple sons and daughters are allowed in families.
#abstract #rule FamilyMember2Person
#source {
  fr : FamilyRegister {
    ++ -families->f
  }
  ++ f : Family
  ++ fm : FamilyMember
}
#target {
  pr : PersonRegister {
    ++ -persons->p
  }
  ++ p : Person
}
#correspondence {
  fr2pr : FamiliesToPersonsCorr {
    #src->fr
    #trg->pr
  }
  ++ fm2p : FamilyMemberToPerson {
    #src->fm
    #trg->p
  }
}
#attributeConditions {
  concat(", ", f.name, fm.name, p.name)
}

#rule MotherToFemale #extends FamilyMember2Person
#source {
  ++ f : Family {
    ++ -mother->fm
  }
  ++ fm : FamilyMember
}
#target {
  ++ p : Female
}

#rule FatherToMale #extends FamilyMember2Person
#source {
  ++ f : Family {
    ++ -father->fm
  }
  ++ fm : FamilyMember
}
#target {
  ++ p : Male
}

#rule MotherOfExistingFamilyToFemale #extends MotherToFemale
#source {
  fr : FamilyRegister {
    -families->f
  }
  f : Family {
    -mother->existingMother
  }
  !existingMother : FamilyMember
}

#rule FatherOfExistingFamilyToMale #extends FatherToMale
#source {
  fr : FamilyRegister {
    -families->f
  }
  f : Family {
    -father->existingFather
  }
  !existingFather : FamilyMember
}

Figure 9: An excerpt from the eMoflon solution used for the benchmarx

4.3 A solution to the F2P benchmarx with medini QVT

Medini QVT follows a constraint-based approach to bx. A transformation is defined in terms of a set of relations which must hold between the participating models, and is executed in the direction of one of them. Consistency restoration ensures that for each relation and each instance of a source domain (source pattern), there is a corresponding instance of the target domain such that all constraints on the domain instances are satisfied; the same must hold in the opposite direction. An excerpt of the solution to the F2P benchmarx with medini QVT is depicted in Listing 2. It involves five relations: one relation between the root elements, and one relation for each role of a family member. In the forward direction, each family member is mapped to a person of the same gender with the same full name.
Without further provisions, each person would be transformed twice in the backward direction: once as a parent and once as a child. As this violates the consistency constraint demanding a bijective mapping between family members and persons, the application of one of these relations is suppressed in the backward direction by querying the binding state of a variable of the target domain. Because the when clause is evaluated before the target domain is instantiated, the variable female is still unbound in the forward direction; the condition female.oclIsUndefined() is therefore satisfied only in that direction, blocking the relation in the backward direction.

transformation families2persons(famDB : Families, perDB : Persons) {
  top relation FamilyRegister2PersonRegister {
    enforce domain famDB familyRegister : Families::FamilyRegister {};
    enforce domain perDB personRegister : Persons::PersonRegister {};
    where {
      Father2Male(familyRegister, personRegister);
      Son2Male(familyRegister, personRegister);
      Mother2Female(familyRegister, personRegister);
      Daughter2Female(familyRegister, personRegister);
    }
  }

  relation Mother2Female {
    familyName, firstName, fullName : String;
    enforce domain famDB familyRegister : Families::FamilyRegister {
      families = family : Families::Family {
        name = familyName,
        mother = mother : Families::FamilyMember {
          name = firstName
        }
      }
    };
    enforce domain perDB personRegister : Persons::PersonRegister {
      persons = female : Persons::Female {
        name = fullName
      }
    };
    where {
      fullName = familyName + ', ' + firstName;
      firstName = firstName(fullName);
      familyName = familyName(fullName);
    }
  }

  relation Daughter2Female {
    ... -- Like 'Mother2Female', replace 'mother' with 'daughters'
    when {
      female.oclIsUndefined(); -- Prevents application in backward direction
    }
    ... -- See 'Mother2Female'
  }
  ...
}

Listing 2: An excerpt from the solution in medini QVT

5 A first tool comparison based on the F2P benchmarx

In the following, we compare BiGUL, eMoflon, and medini QVT based on the F2P benchmarx. Our goal is to impart a first impression of what can be achieved with our proposed benchmarx infrastructure and its reference implementation, and to inspire future tool comparisons. We chose to investigate the following research questions (referred to as RQs from now on) with our tool comparison, covering the different capabilities of bx tools (RQ-1), performance (RQ-2.a,b,c), and required implementation effort (RQ-3):

RQ-1: Which subset of bx test cases (cf. Section 3) can be handled by each of the three tools?
RQ-2.a: How do the tools scale in an initial batch transformation from scratch?
RQ-2.b: Does propagation time scale with the size of the update or with the size of all models?
RQ-2.c: Do the tools exhibit symmetric runtime behaviour in the forward and backward directions?
RQ-3: How do the tools differ with respect to the size of the required transformation specification?

To investigate RQ-1, we executed 52 tests differing in their features such as direction, change type, and required update policy.12 Figure 10 depicts a representative excerpt showing the results for eight test cases. As discussed in Sect. 3.2, we distinguish between expected passes, failures, unexpected passes, and limitations for each result with a given bx tool.

12 The entire set of test cases and the results are available as part of our reference implementation (cf. Section 1).

 #  Direction  Update   Change     Test Case Description                    BiGUL  eMoflon  medini QVT
               Policy   Type
 1  fwd        fixed    add        create new families                      PASS   PASS     PASS
 2  roundtrip  fixed    attribute  change birthdays                         PASS   PASS     PASS
                                   (nothing should happen)
 3  fwd        fixed    attribute  rename family members                    pass   PASS     PASS
 4  roundtrip  runtime  attribute  rename person who should start a         fail   PASS     fail
                                   new family (although a family with
                                   the same name exists)
 5  fwd        fixed    del        delete family members whose names        pass   PASS     PASS
                                   are not unique in persons
 6  roundtrip  runtime  add        create a male for whom a father exists   fail   PASS     fail
 7  roundtrip  fixed    del        delete person                            PASS   PASS     FAIL
 8  roundtrip  fixed    del        delete person who was the first          PASS   FAIL     FAIL
                                   member in the family

 PASS: expected pass   FAIL: failure   pass: unexpected pass   fail: limitation

Figure 10: An excerpt from the F2P test results

Our test results show that all tools are able to propagate simple updates such as the creation of new families (#1 in Figure 10) or birthday changes of persons (#2), which should not trigger any changes in families. Renaming family members (#3) is propagated by all tools as a renaming of persons (as the test requires). BiGUL, supporting neither deltas nor corrs, cannot necessarily be expected to pass this test, i.e., a completely new person could have been created with the new name. We therefore identify this result as an unexpected pass. In a similar test where renaming a person should start a new family (#4), eMoflon passes the test, while medini QVT and BiGUL fail as expected (limitation) due to a missing mechanism for configuring the required update policy. When deleting a family member whose name is not unique (#5), all tools pass the test (by deleting the respective person). BiGUL could potentially fail here (unexpected pass) due to missing corrs, which are necessary to determine the exact mapping of persons to the non-unique family members. Creating a male person who should become a son (#6) is only propagated by eMoflon as required, while medini QVT and BiGUL fail, again due to a non-configurable update policy. Deleting a unique person (#7) is propagated correctly by eMoflon and BiGUL (by deleting the respective family member), whereas medini QVT fails in this test by performing a backward transformation from scratch and thus creating new families without sons or daughters.
Finally, eMoflon also fails in a similar test if the deleted person was the first member of their family (#8), due to a suboptimal strategy for handling deletion updates. This result is thus a failure for eMoflon. Of all 52 test cases, eMoflon has 51 expected passes and one failure; medini QVT has 11 expected passes, 15 unexpected passes, four failures, and 22 limitations; and finally, BiGUL has 12 expected passes, 19 unexpected passes, and 21 limitations. We thus conclude the following for RQ-1: Assuming a bx scenario whose consistency specification is feasible with all three tools, they still differ in the set of their supported bx test cases. eMoflon strives to cover all cases with its configurable update policy as well as its s-delta and corr-based architecture. It can, however, still fail in some cases by deleting more elements than absolutely necessary for consistency restoration. The BiGUL implementation supports a rather small subset of the tests, namely those for which configuration of the update policy, explicit deltas, and correspondences are not necessary. Nevertheless, BiGUL does not fail in any test case where it is expected to pass, indicating that it allows adequate control of the consistency restoration process. Medini QVT addresses a similar subset of the tests as BiGUL, additionally supporting correspondences, but often fails for round-tripping tests as it always (re-)transforms all persons from scratch into families without sons or daughters. As a final remark, the numerous unexpected passes indicate that more non-trivial tests are necessary to better reveal the limitations of the tools.

To investigate RQ-2.a,b,c, we generated source models of increasing size (consisting of uniquely named families with three children), performed a forward transformation from scratch, and subsequently propagated an update adding a new daughter to one of the families. The same experiment was conducted analogously in the backward direction.
The plots in Figure 11 depict our measurement results. The y-axis shows the median runtime (in seconds) of five repetitions executed on a standard computer (Mac OS X, 1.7 GHz Intel Core i7, 8 GB RAM) with Eclipse Neon, Java 8, and 4 GB of available memory for the eMoflon and medini QVT solutions, and GHC 8.0.1 for BiGUL. The x-axis shows the total number of source and target model elements, ranging from about a thousand to a million elements (objects and links, but not attributes). No measurement results are available for cases where a tool either runs out of memory or requires more than 5 minutes.

[Two log-scale plots, Forward Propagation and Backward Propagation: propagation time in seconds (0.001–1000) over the number of source and target model elements (nodes and edges), with batch and incremental (-Incr) curves for BiGUL, eMoflon, and medini QVT.]

Figure 11: Runtime measurements for fwd (left) and bwd (right) direction

We answer our performance-related research questions in the following, based on the measurement results:

RQ-2.a: The initial transformation with BiGUL scales linearly with model size and remains under 20 seconds for all cases up to a million elements. Forward transformation with eMoflon remains under 30 seconds with linear scalability for up to about 900K model elements, but faces scalability issues for the largest case with about a million elements, requiring 96s before running out of memory.
In the backward direction, eMoflon reaches its limits already at about 60K elements. Medini QVT suffers from scalability issues in both directions and reaches its limits at 221K elements, requiring more than 150 seconds in the forward as well as the backward direction.

RQ-2.b: Representing a state-based solution, BiGUL's update propagation time scales with model size (and not with update size), requiring almost the same runtime as the initial transformation in all cases. Update propagation with eMoflon takes under one second for all cases up to 900K elements, showing that the update size is the decisive factor there. Nevertheless, a slow but steady linear growth in propagation time can be observed with increasing model size, until eMoflon starts running out of memory at more than 900K elements. Update propagation with medini QVT takes longer than the initial transformation, probably because correspondence information is handled as additional constraints. This indicates a critical dependency on model size for solutions relying on a costly vertical alignment.

RQ-2.c: BiGUL and medini QVT exhibit symmetric behaviour with respect to their scalability. In contrast, the backward transformation with eMoflon is much slower than its forward transformation. This can be partly attributed to the price of enabling a flexible runtime update policy, which requires maintaining all possible decisions (male to father or son, female to mother or daughter, in existing or new families).

To investigate RQ-3, we measured the size of the F2P solutions developed with each individual tool. The most compact solution (in terms of pure typing effort) is the eMoflon solution, consisting of 192 lines of code (256 words, 3195 characters) distributed over 12 files.
The decomposition into multiple files (one TGG rule per file) is suggested but optional; if a single file is used, the solution is even more compact (as no imports are required): 168 lines (208 words, 1995 characters). The medini QVT solution consists of 144 lines (374 words, 4163 characters) in a single file, while the BiGUL solution consists of 181 lines (710 words, 6629 characters), again in a single file.

This first tool comparison based on our proposed benchmarx framework indicates the heterogeneity of the bx tool landscape: efficient batch transformation is provided by BiGUL and, to some extent, eMoflon, but they swap places when propagating updates. While BiGUL is reliable for a limited set of test cases, eMoflon and, to some extent, medini QVT strive to support a broader spectrum but can fail in some cases due to limited support for fine-grained control over consistency restoration. Specifications tend to be more compact in a declarative bx approach (eMoflon or medini QVT) than in a programming-based approach (BiGUL).

6 Conclusion and Future Work

The need for a benchmark for bidirectional transformations has long been recognised in the bx community. The benchmarx framework presented in this paper provides an infrastructure that takes the heterogeneity of bx tools into account. The functional tool interface is based on a synchronisation dialogue; the representation of data is tool-specific rather than prescribed by the framework. The framework supports a zoo of tool architectures, including batch, incremental, state-based, delta-based, and correspondence-based tools. As a pilot application, we selected the well-known Families2Persons case and implemented it with three tools of significantly differing architectures (BiGUL, eMoflon, and medini QVT). The F2P benchmarx provides a rich set of test cases, organised systematically with the help of a feature model.
The solutions developed with the tools mentioned above were evaluated with respect to both functionality and performance. This evaluation was performed primarily to obtain feedback concerning the benchmarx infrastructure, the design of the F2P benchmark, and the organisation of test cases according to the proposed feature model; we did not strive to optimise the solutions.

To disseminate, evaluate, and improve the benchmarx infrastructure, we are planning to elaborate the Families2Persons benchmark into a tool transformation case. Since this transformation problem is well known and may be implemented with acceptable effort, we hope that this case will be solved with a wide range of bx tools, ideally covering the entire spectrum of tool architectures. Furthermore, we intend to apply the benchmarx infrastructure to other bx problems to provide additional insights into the capabilities of bx languages and tools.

Acknowledgements

We would like to thank Robin Oppermann, Patrick Robrecht, Chandni Rikin Shah (Paderborn University), all participants of the benchmarx working group of the BX Shonan meeting 2017, and our anonymous reviewers.

References

[1] Bidirectional Transformations - NII Shonan Meeting Seminar 091. http://shonan.nii.ac.jp/seminar/091/. Date retrieved: February 28, 2017.

[2] Anthony Anjorin. FamilyToPersons v0.1 in Bx Examples Repository. http://bx-community.wikidot.com/examples:home. Date retrieved: February 28, 2017.

[3] Anthony Anjorin, Alcino Cunha, Holger Giese, Frank Hermann, Arend Rensink, and Andy Schürr. Benchmarx. In K. Selçuk Candan, Sihem Amer-Yahia, Nicole Schweikardt, Vassilis Christophides, and Vincent Leroy, editors, Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference (EDBT/ICDT 2014), CEUR Workshop Proceedings, pages 82–86. CEUR-WS.org, 2014.

[4] James Cheney, James McKinna, Perdita Stevens, and Jeremy Gibbons. Towards a Repository of Bx Examples. In K.
Selçuk Candan, Sihem Amer-Yahia, Nicole Schweikardt, Vassilis Christophides, and Vincent Leroy, editors, Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference (EDBT/ICDT 2014), volume 1133 of CEUR Workshop Proceedings, pages 87–91. CEUR-WS.org, 2014.

[5] Alcino Cunha and Ekkart Kindler, editors. Proceedings of the 4th International Workshop on Bidirectional Transformations (BX 2015), volume 1396 of CEUR Workshop Proceedings. CEUR-WS.org, 2015.

[6] Zinovy Diskin, Arif Wider, Hamid Gholizadeh, and Krzysztof Czarnecki. Towards a Rational Taxonomy for Increasingly Symmetric Model Synchronization. In Davide Di Ruscio and Dániel Varró, editors, ICMT 2014, volume 8568 of LNCS, pages 57–73. Springer, 2014.

[7] Zinovy Diskin, Yingfei Xiong, and Krzysztof Czarnecki. From State- to Delta-Based Bidirectional Model Transformations: the Asymmetric Case. JOT, 10:6:1–25, 2011.

[8] Soichiro Hidaka, Massimo Tisi, Jordi Cabot, and Zhenjiang Hu. Feature-Based Classification of Bidirectional Transformation Approaches. SoSyM, pages 1–22, 2015.

[9] Kyo C. Kang, Sholom G. Cohen, James A. Hess, William E. Novak, and A. Spencer Peterson. Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical report, DTIC Document, 1990.

[10] Andy Schürr. Specification of Graph Translators with Triple Graph Grammars. In Ernst W. Mayr, Gunther Schmidt, and Gottfried Tinhofer, editors, WG 1994, volume 903 of LNCS, pages 151–163. Springer, 1994.