<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Sixth International Workshop on Bidirectional Transformations (Bx 2017), Uppsala, Sweden, April 2017</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Benchmarx Reloaded: A Practical Benchmark Framework for Bidirectional Transformations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anthony Anjorin</string-name>
          <email>anthony.anjorin@upb.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zinovy Diskin</string-name>
          <email>diskinz@mcmaster.ca</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frederic Jouault</string-name>
          <email>frederic.jouault@eseo.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hsiang-Shang Ko</string-name>
          <email>hsiang-shang@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erhan Leblebici</string-name>
          <email>erhan.leblebici@es.tu-darmstadt.de</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernhard Westfechtel</string-name>
          <email>bernhard.westfechtel@uni-bayreuth.de</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ESEO</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>McMaster University</institution>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NII</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Paderborn University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>TU Darmstadt</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Univ. of Bayreuth</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>29</volume>
      <issue>2017</issue>
      <abstract>
        <p>Bidirectional transformation (bx) approaches provide a systematic way of specifying, restoring, and maintaining the consistency of related models. The current diversity of bx approaches is certainly beneficial, but it also poses challenges, especially when it comes to comparing the different approaches and the corresponding bx tools that implement them. Although a benchmark for bx (referred to as a benchmarx) has been identified in the community as an important and currently still missing contribution, only a rather abstract description and characterisation of what a benchmarx should be has been published to date. In this paper, therefore, we focus on providing a practical and pragmatic framework, on which future concrete benchmarx can be built. To demonstrate its feasibility, we present a first non-trivial benchmarx based on a well-known example, and use it to compare and evaluate three bx tools, chosen to cover the broad spectrum of bx approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This diversity is certainly advantageous from a research community point of view as it provides a broad
perspective on bx, but it also makes discerning the exact capabilities and limitations of individual approaches
challenging. The need for a bx benchmark (referred to as a benchmarx in the following) has been discussed in
detail by Anjorin et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], together with an abstract description of such a benchmarx. A benchmarx should
enable direct comparisons of bx tools based on a maintained set of examples and executable test cases. While benchmarking is a common and well-established practice for conducting evaluations and comparisons in general, a benchmarx exhibits additional characteristic properties and challenges that are unique to bx.
      </p>
      <p>
        Although Anjorin et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] discuss what should be provided by a benchmarx in an attempt to encourage concrete benchmarx efforts, there has been no practical realisation of a benchmarx until now. Providing a series of benchmarx has been a recurring topic discussed at different bx events (e.g., [
        <xref ref-type="bibr" rid="ref1 ref5">1, 5</xref>
        ]) involving representatives from all bx sub-communities. A bx example repository [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] has been established and is being actively maintained (prevalently via examples introduced in bx-related talks and publications), but there is still no non-trivial executable test suite for any of the examples that can be run, adapted, and interpreted for the numerous bx tools that exist. We believe this is because many practical challenges have not yet been addressed, including how to structure the test suite to simplify the interpretation of test results for diverse bx tools with different capabilities.
      </p>
      <p>In this paper, we close this gap by providing a pragmatic and practical benchmarx framework that can be used to establish further benchmarx implementations. Our contribution is thereby twofold: We propose a benchmarx template based on an analysis of bx tool workflows and a classification according to supported features. This classification into features is used to derive a concrete test suite structure that can be used not only to identify a meaningful subset of tests for a given bx tool, but also to evaluate test coverage. Furthermore, we provide an open-source reference implementation of our framework, together with a concrete benchmarx using a simple but non-trivial example to evaluate and compare solutions with three very different bx tools.</p>
      <p>The rest of the paper is structured as follows: A well-known bx example is introduced in Sect. 2 and used throughout the paper to explain basic terminology, illustrate our benchmarx design, and demonstrate its usage. The architecture of our benchmarx implementation is presented in Sect. 3. Solutions for our running example (one functional programming-based, one rule-based, and one constraint-based) are presented in Sect. 4. In Sect. 5, these solutions are integrated into our benchmarx implementation, evaluated, and discussed, to demonstrate how to handle such a heterogeneous choice of bx tools. These results are a first demonstration of a tool comparison conducted with our benchmarx implementation, and should serve as encouragement for (more detailed) future tool comparisons. Finally, Sect. 6 concludes the paper and provides an outlook on future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Our Benchmarx Example: Families to Persons</title>
      <p>To give concrete examples throughout the paper, we shall make consistent use of our chosen example for the benchmarx: "Families to Persons", a well-known demonstrative transformation in the model transformation community. The example is part of the ATL transformation zoo and was created as part of the "Usine Logicielle" project. The original authors are Freddy Allilaire, Frederic Jouault, and Jean Bezivin, all members of the ATLAS research team at the time it was created (December 2006) and published (2007).</p>
      <p>Figure 1 depicts the two data structures involved, which we shall refer to in the following as the "source" and "target" of the bidirectional transformation between them. The motivational story is that two administration offices of some city wish to maintain their respective data in a consistent state. Both source and target offices keep track of individuals registered in the city, but at different abstraction levels.</p>
      <p>As depicted in Fig. 1a as a metamodel, the source data structure has the core concept of a family (Family), consisting of family members (FamilyMember). Both concepts are named entities, and family members play different roles in a family. Note that the associations are all composites (denoted with the black diamonds), implying that a family can only be in one family register, and that a family member can only be in one family at a given point in time. Furthermore, due to the multiplicities 0..1, a family can have at most one mother and at most one father. This is a simplification that rules out realistic situations: in reality, a man can be both a father and a son in different families, and a family can of course have two mothers or two fathers.</p>
      <p>In contrast, the target data structure (Fig. 1b) only has the primary concept of a person (Person), who can be either male or female, but has a birthday in addition to a name. In the target domain, a person's name is formed by concatenating surname and first name, separated by a comma and a space.</p>
      <sec id="sec-2-1">
        <p>[Figure 1 shows the two metamodels. (a) The families metamodel: a family register contains families (0..*) of type Family (name : String); a Family contains its FamilyMember (name : String) elements via the associations mother (0..1), father (0..1), sons (0..*), and daughters (0..*). (b) The persons metamodel: a person register contains persons (0..*) of type Person (name : String, birthday : Date), with subclasses Male and Female. The reference implementation is available from https://github.com/eMoflon/benchmarx; ATL: https://www.eclipse.org/atl/; the example: https://www.eclipse.org/atl/atlTransformations/#Families2Persons.]</p>
        <sec id="sec-2-1-2">
          <p>In summary, source and target models contain not only shared information on which they must agree (family members are persons), but also information present in only one of the models (birthdays, family roles).</p>
          <p>A families model is consistent with a persons model if a bijective mapping between family members and persons can be established such that (i) mothers and daughters (fathers and sons) are paired with females (males), and (ii) the name of every person p is "f.name, m.name", where m is the member (in family f) paired with p.</p>
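          <p>As a minimal illustration, this consistency relation can be sketched in Java. The encoding below is hypothetical: the record types Member and Person, the role strings, and the representation of a corr as a map are all illustrative choices, not part of the benchmarx API.</p>

```java
import java.util.*;

class ConsistencyCheck {
    record Member(String familyName, String role, String name) {} // role: mother|father|son|daughter
    record Person(boolean female, String name) {}

    // check: a corr (member -> person) is consistent iff it is a bijection
    // between all members and all persons satisfying conditions (i) and (ii).
    static boolean check(Map<Member, Person> corr, List<Member> members, List<Person> persons) {
        // Bijectivity: keys cover exactly the members, values exactly the persons.
        if (corr.size() != members.size() || corr.size() != persons.size()) return false;
        if (!corr.keySet().containsAll(members)) return false;
        Set<Person> targets = new HashSet<>(corr.values());
        if (targets.size() != persons.size() || !targets.containsAll(persons)) return false;
        for (var link : corr.entrySet()) {
            Member m = link.getKey();
            Person p = link.getValue();
            boolean femaleRole = m.role().equals("mother") || m.role().equals("daughter");
            if (femaleRole != p.female()) return false;                           // condition (i)
            if (!p.name().equals(m.familyName() + ", " + m.name())) return false; // condition (ii)
        }
        return true;
    }
}
```

          <p>The sketch first checks bijectivity via the corr map, then conditions (i) and (ii) per link.</p>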
          <p>
            This example, of which many variants exist, including FamilyToPersons [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] in the bx example repository [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], was chosen for the following reasons, which can be taken into account when searching for a new benchmarx example: It is already well-known and well-accepted by a substantial community, indicating that the example is accessible to a wide range of people. It is also simple and evokes an intuitive expectation of consistency (e.g., that there should be a bijection between all family members in the source and all persons in the target). It can nonetheless be used to demonstrate a fair number of synchronisation challenges, including information loss, non-determinism, flexible alignment, and update policies. Finally, the data structures involved are simple trees, simplifying its interpretation in other technological spaces.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Proposed Template for a Benchmarx</title>
      <p>A major challenge in benchmarking bx tools is that different tools may use different input data. Addressing this challenge requires a unifying design space, in which different tool architectures can be placed and classified. In Sect. 3.1 we shall define such a space and distinguish seven basic bx tool architectures according to the type of their input data. In Sect. 3.2, we then define a corresponding design space of test cases, and discuss how a tool's architecture influences the interpretation of test results.</p>
      <p>
        Design space of bx tools. To establish a common foundation, we first present a vocabulary of basic notions, constructs, and terms based on previous work by Diskin et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which we shall use for discussing consistency restoration via change propagation. Let us refer to the two domains to be kept synchronised as the source and the target model spaces. Source models are denoted by A, A′, etc., target models by B, B′, etc. Transformations from the source to the target are called forward (fwd), and from the target to the source backward (bwd).
      </p>
      <p>Objects: models, deltas, and corrs. The objects that bx tools operate on are models and relationships (links) between them. Relationships between models from different model spaces are called correspondence links, or just corrs. We shall denote corrs by double bidirectional arrows, e.g., R : A ⇔ B, and depict them in our diagrams horizontally. Normally, a corr is a set of links r(a, b) between elements (a in A, b in B) which are compatible with the models' structure.</p>
      <p>Relationships between models from the same model space are called deltas, denoted by directed arrows, e.g., δ : A → A′, where we typically consider A′ to be an updated version of A. To visually distinguish corrs from deltas, we shall depict deltas in our diagrams vertically. Deltas can be structural or operational.</p>
      <p>A structural delta (s-delta) is a collection of links between elements of the original and the updated model, which is compatible with structure, e.g., is type preserving. In fact, an s-delta is a (usually simple) vertical corr between models of the same space. We denote s-deltas by double unidirectional arrows, e.g., δ : A ⇒ A′.</p>
      <p>An operational delta (o-delta) is an operational specification of changes to be performed on a model to update it. Examples of such operational deltas include edit logs, in-place transformations, or transformation rules. A given o-delta can be applicable to some models but not to others. If an o-delta δ is applicable to a model A, its application results in (i) a new model A′, denoted by δ@A (δ applied to A), and (ii) an s-delta relating A and A′, which we denote by δ⇒@A : A ⇒ A′. It is important to distinguish s-deltas from o-deltas, as different o-deltas may result in the same model and even the same s-delta, i.e., δ1@A = δ2@A (and δ1⇒@A = δ2⇒@A) for δ1 ≠ δ2. For instance, when the order in which updates are performed is important, a tool must accept o-deltas as input. If an o-delta δ is applicable to A, we write δ : A -@-&gt;.</p>
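      <p>To make the distinction concrete, the following sketch encodes o-deltas as ordered edit scripts over a set of names (an illustrative encoding, not the paper's formalism): two distinct o-deltas δ1 ≠ δ2 can yield the same resulting model δ1@A = δ2@A, even though their application histories differ.</p>

```java
import java.util.*;
import java.util.function.Consumer;

class ODeltaDemo {
    // An o-delta as an ordered edit script; applying it to (a copy of) a model
    // yields delta@A. Different scripts may yield the same model, yet must be
    // kept apart when the order of edits matters.
    static Set<String> apply(List<Consumer<Set<String>>> delta, Set<String> model) {
        Set<String> result = new LinkedHashSet<>(model); // preserves insertion order
        delta.forEach(op -> op.accept(result));          // perform the edits in order
        return result;
    }

    static final List<Consumer<Set<String>>> DELTA1 =
        List.of(s -> s.add("Bart"), s -> s.add("Lisa"));
    static final List<Consumer<Set<String>>> DELTA2 =
        List.of(s -> s.add("Lisa"), s -> s.add("Bart")); // same edits, different order
}
```

      <p>Both scripts produce the same set of members, but the recorded insertion orders differ, which is exactly the information an o-delta carries beyond an s-delta.</p>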
      <p>Example (E1). Our source model space consists of family registers, and the target space of person registers. Consider a source model A consisting of a single family register with the Simpson family as a single family, consisting of family members Homer (as father) and Marge (as mother), and a target model B consisting of a single person register with a single female person Simpson, Marge and a single male person Simpson, Homer (with their birthdays). This pair of source and target models can be related by a corr R : A ⇔ B including links connecting the family register with the person register, Homer with Simpson, Homer, and Marge with Simpson, Marge. As a source delta δ : A → A′, consider adding a new family member Bart as a son in the Simpson family to produce a new source model A′. This source delta can be given as an s-delta δ : A ⇒ A′ consisting of links between elements in A and A′ including, e.g., a link connecting Homer in A with Homer in A′. The source delta can also be given as an o-delta δ : A -@-&gt;, e.g., some code creating a new family member Bart and adding it as a son in the Simpson family. This code can be applied to A to produce δ@A = A′.</p>
      <p>
        Basic operations employed by bx tools. Our next goal is to classify bx tools based on their (expected) capabilities. This is directly related to the input that a tool accepts and should exploit. Based on existing bx taxonomies [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ], as well as current bx tools under active development (see http://bx-community.wikidot.com/relatedtools), the following basic operations (visualised to the left of Fig. 2) can be used to provide a schematic overview of different bx tool architectures:
Alignment (vAln, hAln). There are two alignment operations: vertical and horizontal alignment, denoted by vAln and hAln, respectively (recall that deltas and corrs are depicted vertically and horizontally, respectively). The former takes two models A, A′ from the same model space and produces an s-delta between them: vAln(A, A′) : A ⇒ A′. The latter takes two models A, B from different model spaces and produces a corr hAln(A, B) : A ⇔ B. There is also an operation of corr composition (re-alignment) that takes a corr R : A ⇔ B and either a source s-delta δ : A ⇒ A′ to produce a new corr δ∗R : A′ ⇔ B, or, dually, a target s-delta ε : B ⇒ B′ to produce R∗ε : A ⇔ B′.
      </p>
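      <p>Convention-based alignment can be approximated by name matching, as in the following illustrative sketch (assuming unique names, an assumption that, as discussed for the running example, does not hold in general):</p>

```java
import java.util.*;

class Alignment {
    // Name-based alignment sketch: elements of two models (here, sets of names)
    // are identified as "the same" iff they carry the same name; the returned
    // set of shared names plays the role of the links of vAln or hAln.
    static Set<String> align(Set<String> left, Set<String> right) {
        Set<String> links = new TreeSet<>(left);
        links.retainAll(right); // one link per name present in both models
        return links;
    }
}
```

      <p>For (E1)/(E2): aligning A with A′ links Homer and Marge, while the newly created Bart remains unmatched.</p>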
      <p>Consistency is modelled by a predicate (a Boolean-valued function) check on the set of all corrs. For any corr R : A ⇔ B, either check(R) = 1, i.e., the corr R is consistent, or check(R) = 0, i.e., R is inconsistent.
Consistency restoration (fCR, bCR). The operation of forward consistency restoration fCR takes a corr R : A′ ⇔ B and restores consistency by changing the target model B and keeping A′ intact. It thus produces an s-delta fCRv(R) = ε : B ⇒ B′ and a new consistent corr fCRh(R) = R′ : A′ ⇔ B′. Backward consistency restoration bCR operates dually by updating the source model and keeping the target model intact, i.e., it produces bCRv(R) = δ : A ⇒ A′ and a consistent corr bCRh(R) = R′ : A′ ⇔ B′.</p>
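      <p>A minimal fCR for the running example might create one person per family member not yet covered by the corr, keeping B otherwise intact. The following sketch is illustrative only; all types and names are hypothetical and not the benchmarx API.</p>

```java
import java.util.*;

class ForwardRestore {
    record Member(String familyName, String role, String name) {}

    // fCR sketch: given the members of A' and an old corr (member name ->
    // person name), keep the existing persons and create one person per
    // unmatched member (the s-delta epsilon : B => B'), returning the
    // extended, now consistent corr.
    static Map<String, String> restore(List<Member> members, Map<String, String> corr,
                                       Set<String> persons /* mutated into B' */) {
        Map<String, String> newCorr = new TreeMap<>(corr);
        for (Member m : members) {
            if (!newCorr.containsKey(m.name())) {
                String p = m.familyName() + ", " + m.name(); // name rule of Sect. 2
                persons.add(p);
                newCorr.put(m.name(), p);
            }
        }
        return newCorr;
    }
}
```

      <p>Note that decisions such as Bart's birthday are left open here; in the paper's terms they are delegated to an update policy.</p>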
      <p>Update propagation (fUP, bUP). There are also two operations of o-delta propagation. Forward propagation fUP takes a corr R : A ⇔ B and an A-applicable o-delta δ : A -@-&gt;, and produces a B-applicable o-delta ε = fUP(R, δ) : B -@-&gt;. Backward propagation bUP operates analogously, with bUP(R, ε) = δ.</p>
      <p>Example (E2). Consider the source delta δ : A → A′ from (E1), resulting in a new source model A′ with Bart as a new family member. Given only A and A′, vertical alignment vAln can be used to produce an s-delta vAln(A, A′) : A ⇒ A′. This could be achieved by making certain (arguably) reasonable assumptions, e.g., that families and family members with the same name in A and A′ are actually the same objects and should not be deleted and re-created by the resulting s-delta. Given the source and target models A and B from (E1), horizontal alignment hAln can be applied to establish a corr hAln(A, B) : A ⇔ B connecting them. This requires further assumptions, e.g., that family names, names of family members in a family, and names of persons in the person register are unique. These assumptions for vAln and hAln do not hold in general for our example.</p>
      <p>Consider now the s-delta δ : A ⇒ A′ (adding Bart as a son) and the old corr R : A ⇔ B. The operation of re-alignment can be applied to produce the corr R∗ = δ∗R : A′ ⇔ B. In this case, as nothing is deleted, the links in R can simply be transferred to R∗, using δ to identify the corresponding elements in A′.</p>
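      <p>Re-alignment by transferring links through an s-delta can be sketched as map composition (an illustrative encoding in which elements are identified by name):</p>

```java
import java.util.*;

class ReAlign {
    // Corr composition delta*R: given an s-delta (element of A -> element of A')
    // and a corr R (element of A -> element of B), transfer each link through
    // the s-delta; links whose source element was deleted are dropped.
    static Map<String, String> compose(Map<String, String> sDelta, Map<String, String> corr) {
        Map<String, String> result = new TreeMap<>();
        corr.forEach((a, b) -> {
            String aPrime = sDelta.get(a);
            if (aPrime != null) result.put(aPrime, b);
        });
        return result;
    }
}
```

      <p>For (E2), Bart appears in A′ but not in the s-delta's domain, so R∗ simply carries over the Homer and Marge links and leaves Bart unlinked.</p>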
      <p>The predicate check for our running example can be derived from the constraints stated in Sect. 2. For the models A and B from example (E1), it must be possible to extract the pairs (mH, pH) and (mM, pM) from a consistent corr R : A ⇔ B, where mH, mM are the family members with names Homer and Marge, and pH, pM the persons with names Simpson, Homer and Simpson, Marge, respectively. The new, re-aligned corr R∗ is not consistent according to check, as a bijection cannot be derived (there is no corresponding person for Bart in the person register). To restore consistency, the operation of forward consistency restoration fCR can be used to produce a target s-delta ε : B ⇒ B′ and a new corr R′ : A′ ⇔ B′. In this case, this could be achieved by creating a new person Simpson, Bart in the person register to produce the new target model B′, and adding a new link connecting the family member Bart with the person Simpson, Bart, resulting in the new, now consistent corr R′. Analogously, the o-delta δ : A -@-&gt; adding Bart as a son to the Simpson family can be forward propagated to the appropriate o-delta fUP(R, δ) = ε : B -@-&gt;, which adds Simpson, Bart to the person register.</p>
      <p>A zoo of bx tool architectures. Based on these basic concepts and operations, seven bx tool architectures are depicted to the right of Fig. 2. We have restricted ourselves to realistic architectures for which we have found actual implemented bx tools (outlined in blue, red, and green), or for which such architectures are at least suggested in the literature (often with prototypical implementations). Various additional hybrids and variants are of course possible including, e.g., a state-based tool that also takes A and performs both vertical and horizontal alignment. For each entry in the table, the row and column represent the input that the tool accepts. For presentation purposes, we assume that both models A and A′ (B) can be extracted from δ : A → A′ (R : A ⇔ B).</p>
      <p>[Figure 2 (left) visualises the basic operations vAln, hAln, fCR, and fUP; its right side tabulates the seven architectures by the input each accepts:]</p>
      <p>BATCH. in: A′; computation: R = A′ ⇔ [] (the "empty" model is represented as []), ε = fCRv(R) : [] ⇒ B′; out: B′.</p>
      <p>STATE-BASED. in: A′, B; computation: R = hAln(A′, B) : A′ ⇔ B, ε = fCRv(R) : B ⇒ B′; out: B′.</p>
      <p>STATE/CORR-BASED. in: A′, R : A ⇔ B; computation: Δ = vAln(A, A′) : A ⇒ A′, R∗ = Δ∗R : A′ ⇔ B, R′ = fCRh(R∗) : A′ ⇔ B′; out: R′.</p>
      <p>S-DELTA/STATE-BASED. in: Δ : A ⇒ A′, B; computation: R = hAln(A, B) : A ⇔ B, R∗ = Δ∗R : A′ ⇔ B, ε = fCRv(R∗) : B ⇒ B′; out: ε.</p>
      <p>O-DELTA/STATE-BASED. in: δ : A -@-&gt;, B; computation: R = hAln(A, B) : A ⇔ B, ε = fUP(δ, R) : B -@-&gt;; out: ε.</p>
      <p>S-DELTA/CORR-BASED. in: Δ : A ⇒ A′, R : A ⇔ B; computation: R∗ = Δ∗R : A′ ⇔ B, ε = fCRv(R∗) : B ⇒ B′, R′ = fCRh(R∗) : A′ ⇔ B′; out: ε, R′.</p>
      <p>O-DELTA/CORR-BASED. in: δ : A -@-&gt;, R : A ⇔ B; computation: ε = fUP(δ, R), R′ = hAln(δ@A, ε@B); out: ε, R′.</p>
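      <p>The "in:" signatures of these seven architectures can be summarised as Java functional interfaces over opaque type parameters. All names are illustrative, and the additional corr output of the corr-based variants is elided for brevity.</p>

```java
// Illustrative interface sketch of the seven bx tool architectures of Fig. 2
// (forward direction only). Type parameters stand for models (A, B), s-deltas,
// o-deltas, and corrs; none of these names are part of the benchmarx API.
final class BxArchitectures {
    interface Batch<A, B>                            { B fwd(A aPrime); }
    interface StateBased<A, B>                       { B fwd(A aPrime, B b); }
    interface StateCorrBased<A, Corr>                { Corr fwd(A aPrime, Corr r); }
    interface SDeltaStateBased<SDeltaA, B, SDeltaB>  { SDeltaB fwd(SDeltaA delta, B b); }
    interface ODeltaStateBased<ODeltaA, B, ODeltaB>  { ODeltaB fwd(ODeltaA delta, B b); }
    interface SDeltaCorrBased<SDeltaA, SDeltaB, Corr> { SDeltaB fwd(SDeltaA delta, Corr r); }
    interface ODeltaCorrBased<ODeltaA, ODeltaB, Corr> { ODeltaB fwd(ODeltaA delta, Corr r); }
}
```

      <p>Reading the signatures side by side makes the classification of Sect. 3.1 mechanical: each architecture is characterised by whether it receives the old model, a delta (structural or operational), and/or a corr.</p>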
      <p>A batch bx tool takes as input only the source model A′ (the "empty" model is represented as []) and produces as output B′. The involved computational tasks indicate that batch tools cannot exhibit good or efficient synchronisation behaviour in general: A batch tool can only assume trivial alignment of A′ with the empty model [], and is forced to reconstruct B′ from scratch.</p>
      <p>State-based tools take both the new source model A′ and the old target model B as input, producing an updated target model B′ as output. The required computations now make more sense, as horizontal alignment can be applied, and B used as a starting point from which to produce a consistent B′. This means that a state-based tool can handle information loss in the target domain.</p>
      <p>To avoid horizontal alignment, state/corr-based tools take A′ and a corr R : A ⇔ B as input. Such tools can handle cases where it is impossible to compute hAln based on conventions.</p>
      <p>Similarly, if conventions can be used to compute horizontal alignment, but vertical alignment is problematic,
s-delta/state-based and o-delta/state-based tools can be used.</p>
      <p>Tools that are s-delta/corr-based are able to exploit available correspondence information in both horizontal
and vertical dimensions (Fig. 2). In such tools, alignment is completely replaced with re-alignment.</p>
      <p>Finally, an o-delta/corr-based tool takes an o-delta directly as input and can thus handle cases where the
order of changes is important.</p>
      <p>
        Taken from the domain of (software) product line engineering, a feature model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] consisting of features and
constraints over features can be used to succinctly capture the space of valid variants or products.
      </p>
      <p>Figure 3 depicts a feature model for bx tool architectures. Every bx tool architecture must be a valid "product" of the product line described by this feature model. The nodes of the tree are the "features" that a given bx tool architecture can possess. Features can be optional or mandatory, child features can only be chosen together with their parent feature, and children of the same parent can be either exclusive or non-exclusive alternatives. Features are abstract (grey fill) if they can be implied from the set of concrete features (white fill) in a given product. The feature model depicted in Fig. 3 yields exactly the seven bx tool architectures described in Fig. 2 in terms of the involved computation steps.</p>
      <p>[Figure 3 shows this feature model: the root feature bx tool architecture has the alternatives batch and incremental (alignment-based); incremental splits into state-based (alignment computed internally) and delta/corr-based (external alignment is required and used), where the latter offers the non-exclusive alternatives delta-based and corr-based, and delta-based in turn is either operational or structural.]</p>
      <p>To further illustrate the architecture of the bx tools most relevant for this paper, workflow diagrams for state-based, state/corr-based, and s-delta/corr-based bx tools are depicted in Fig. 4, Fig. 5, and Fig. 6. Apart from visualising the computation flow, an update policy is indicated as additional input. The task of consistency restoration typically involves decisions: think of setting Simpson, Bart's birthday in the forward direction, or deciding whether a female person corresponds to a mother or a daughter in the backward direction. An update policy is used to handle these choices, either at design time (e.g., hard-coded preferences and conventions concerning default values) or at runtime (e.g., user interaction). In the workflow diagrams, a dedicated filter symbol is used to either suppress output, or restrict it to only a part of what is available (e.g., only B′ from ε : B ⇒ B′).</p>
      <p>Example (E3). In our example, explicit corrs are required as input to handle non-unique person names. Explicit deltas are required as input to handle, e.g., renaming vs. create/delete operations. In both cases, computing alignment internally based on conventions cannot work in general. As an example of how the order of updates, i.e., taking o-deltas directly as input, can be important, consider the fixed update policy of preferring to create mothers rather than daughters in the backward transformation. As families can have only one mother, the order in which two females with the same family name, say Simpson, Marge and Simpson, Maggie, are added to the person register is important to avoid creating Maggie as the mother and Marge as a daughter of a new family in the register.</p>
      <p>The current test suite for the FamiliesToPersons benchmarx consists of 52 test cases, and will be continually revised as further implementations are established or extended. As such a test suite grows in size, adding test cases systematically is important to avoid redundancy and to have some measure of current coverage and of which types of tests are still missing. Based on our classification of bx tool architectures, this section identifies a corresponding design space for actual test cases, providing a feature-based classification of benchmarx test cases in Sect. 3.2.1, a basic test case structure in Sect. 3.2.2, and a classification of test results in Sect. 3.2.3.</p>
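      <p>The order sensitivity of (E3) can be sketched as follows, with the fixed policy encoded as "the first female added to a family becomes the mother" (an illustrative simplification of such a backward update policy):</p>

```java
import java.util.*;

class MotherPolicyDemo {
    // Backward-propagation sketch with a fixed policy: the first female added
    // for a family becomes the mother, every later one a daughter. The assigned
    // roles therefore depend on the order of the o-delta's additions.
    static Map<String, String> roles(List<String> femaleFirstNames) {
        Map<String, String> role = new LinkedHashMap<>();
        boolean motherTaken = false;
        for (String name : femaleFirstNames) {
            role.put(name, motherTaken ? "daughter" : "mother");
            motherTaken = true;
        }
        return role;
    }
}
```

      <p>Propagating "add Marge, then Maggie" and "add Maggie, then Marge" yields different family structures, which is why o-deltas (and not just the resulting state) matter here.</p>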
      <p>Feature-based classification of test cases. As an extension of the feature model for bx tool architectures from Sect. 3.1.3, Fig. 7 depicts a feature model for benchmarx test cases. Every benchmarx test case must state the required bx tool architecture (cf. Fig. 3), its direction as one of fwd (forward), bwd (backward), or round trip (a mixture), the combination of different change types applied in the test, and the update policy required to pass the test. The set of possible change types, currently del (deletion), add (addition), and attribute (attribute manipulation), can be extended in the future to accommodate more expressive frameworks. Note that a test case can require an update policy that is a mixture of fixed, i.e., design-time preferences and conventions, and runtime configuration.</p>
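      <p>Feature-based selection then reduces to a subset check: a tool is expected to pass a test iff it supports every feature the test requires. A minimal sketch (feature names are illustrative strings, not the benchmarx vocabulary):</p>

```java
import java.util.*;

class TestSelection {
    // A tool should pass a test iff its supported features cover the test's
    // required features (cf. the feature models of Figs. 3 and 7).
    static boolean shouldPass(Set<String> toolFeatures, Set<String> requiredFeatures) {
        return toolFeatures.containsAll(requiredFeatures);
    }
}
```

      <p>This is also the basis for coverage measurement: counting, per feature combination, how many tests exist and which ones a given tool is expected to pass.</p>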
      <p>Test cases as synchronisation dialogues. Although it would certainly be interesting to directly assert the resulting corr and output delta of a synchronisation run, our investigations with bx tools have revealed that these data structures are typically tool-specific and complicate the task of test case specification. To cope with this in a pragmatic manner, we propose to design each test case as a synchronisation dialogue, always starting from the same agreed-upon consistent state, from which a sequence of deltas is propagated. Only the resulting output models are to be directly asserted, by comparing them with expected versions.</p>
      <p>A benchmarx test case is depicted schematically to the left of Fig. 8, with a concrete test case for our example to the right, following the proposed structure and representing an instantiation with JavaDoc, Java, and JUnit.</p>
      <p>Every test case should be documented (cf. Label 1 in Fig. 8), stating (i) what is being tested, (ii) the expected behaviour, and (iii) a list of the concrete features of the test case taken from Fig. 7, to clarify at a glance whether a given bx tool can pass the test or not.</p>
      <p>Every test starts with an initialisation command invoked on the bx tool under test (Label 2), giving it the chance to establish the agreed-upon starting point (e.g., for the FamiliesToPersons benchmarx this comprises a single empty family register and a corresponding single empty person register), and to create all necessary internal auxiliary data structures.</p>
      <p>The next part of a test case (Label 3) is a series of propagation steps used to establish the precondition of the test. Although this creates a dependency on other tests (asserting exactly this precondition), it simplifies handling the required correspondence model, as the bx tool can build up the necessary internal state that must go along with the precondition. This means that the old consistent corr is "passed" implicitly to the bx tool via a series of preparatory propagation steps. The precondition is finally asserted (Label 4), representing a well-defined starting point. If the test requires a runtime update policy, this is configured just before propagating the actual input delta (Label 5). The last part of a test case (Label 6) is an assertion of the postcondition, checking whether the final source and target models are as expected.</p>
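      <p>This dialogue structure (Labels 2 to 6) can be sketched as a plain-Java skeleton against a hypothetical tool interface; all method names here are illustrative and not the benchmarx API.</p>

```java
import java.util.*;

class DialogueTest {
    // Hypothetical facade of a bx tool under test.
    interface Tool {
        void initiateSynchronisationDialogue(); // Label 2: agreed-upon start state
        void propagateTarget(String change);    // Labels 3/5: propagate a delta
        String sourceState();                   // inspected by the assertions
    }

    // Skeleton of one test case as a synchronisation dialogue.
    static String run(Tool tool) {
        tool.initiateSynchronisationDialogue();    // Label 2
        tool.propagateTarget("add Simpson, Bart"); // Label 3: establish precondition
        // Label 4 (assert precondition), Label 5 (configure policy, propagate the
        // actual input delta), and Label 6 (assert postcondition) are elided here.
        return tool.sourceState();
    }
}
```

      <p>A real benchmarx test would end with model comparisons against expected source and target versions rather than returning a string.</p>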
      <p>In the concrete exemplary test case, a number of persons are created in the person register and then backward propagated to establish a consistent family register, which is asserted as a precondition. As part of the actual test, a person named Simpson, Bart is renamed in the person register; this change is backward propagated with the update policy of preferring to create parents (if possible) rather than children. The interested reader is referred to the actual test cases in the accompanying repository for all further details.</p>
      <p>Interpretation of test results: Failure, limitation, or (un)expected pass
Due to the heterogeneity of bx tools, it is important to be able to easily and quickly distinguish between:
A test that fails because it requires features that the tool does not support, referred to as a limitation.</p>
      <p>A test that fails even though the tool should pass it (based on its classification), referred to as a failure.</p>
      <p>Limitations confirm the tool's set of (un)supported features, while failures indicate potential bugs in (the
implementation of the benchmarx with) the bx tool. Similarly, one should clearly distinguish between:
A test that the tool should (based on its classification) pass and passes, referred to as an expected pass.</p>
      <p>A test that the tool passes even though this is unexpected in general, referred to as an unexpected pass.</p>
      <p>Expected passes confirm that the tool is correctly classified and behaves as expected, while unexpected passes
indicate either that the test case can be improved, as it is unable to reveal the missing features of the tool, or
that the bx tool has not been classified correctly.</p>
      <p>The FamiliesToPersons (F2P) Benchmarx: Version 1.0
In this section, we defend our choice of three bx tools, present three implementations of the F2P benchmarx
with these tools, and discuss initial results and insights.</p>
      <p>
        To demonstrate how to integrate diverse bx tools, we have chosen three bx tools with the aim of covering
all the primary groups of bx approaches: BiGUL,8 eMoflon,9 and medini QVT10. Bx approaches can be very
broadly classified into:
Functional programming (FP) based approaches: The basic idea here is to program a single function
(forward or backward) that can be used to infer the other function (backward or forward) such that the
pair obeys a set of round-tripping laws. Referring to forward propagation as get and backward propagation
as put, as is usual in an asymmetric setting, this idea can be realised either by using get to infer put
(get-based approaches) or by using put to infer get (putback-based approaches). A final strategy is to provide
a library of pairs of (get, put) that already obey the round-tripping laws, and use a set of combinators to
allow composition (combinator-based approaches). In all cases, the underlying consistency relation is not
explicitly specified. We have chosen BiGUL as a representative of this group of bx approaches.
Grammar-based approaches: While FP-based approaches allow for fine-granular control of all details of
consistency restoration, grammar-based approaches provide a high-level, rule-based language with which
the language of all consistent pairs of source and target models can be generated. Such a specification is
under-specified in the sense that the consistency relation (which here equates to language membership) is fully specified,
but not all details of consistency restoration are fixed. Although this technique is not restricted to graphs
and can be transferred to strings, lists, or trees, Triple Graph Grammars (TGGs) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] are the primary
representative of this group, and have been implemented by various tools. We have chosen eMoflon as a
representative of this group.
      </p>
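      <p>The get/put round-tripping laws mentioned above can be illustrated with a minimal asymmetric pair in plain Haskell (an illustrative sketch only; the Person type and both functions are hypothetical and this is not BiGUL's combinator API):

```haskell
-- A source model with more information (structured name) and a view with less (a full name).
data Person = Person { family :: String, first :: String } deriving (Eq, Show)

-- get: project the less informative model out of the source.
get :: Person -> String
get p = family p ++ ", " ++ first p

-- put: write an updated view back into the source.
put :: Person -> String -> Person
put p v = case break (== ',') v of
  (f, ',' : ' ' : g) -> Person f g
  _                  -> p

-- Round-tripping laws the pair must obey:
--   GetPut: put s (get s) == s
--   PutGet: get (put s v) == v   (for well-formed views v)
```

      </p>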
      <sec id="sec-3-1">
        <title>7 Available from https://github.com/eMoflon/benchmarx; 8 www.prg.nii.ac.jp/project/bigul/; 9 www.emoflon.org; 10 http://projects.ikv.de/qvt</title>
        <p>Constraint-based approaches: Even more high-level than grammar-based approaches, constraint-based
approaches leave all details of consistency restoration open and only require a specification of the consistency
relation. Consistency restoration is often realised via a constraint solver, guided by a global objective
function characterising "preferred" strategies. We have chosen medini QVT11 as a constraint-based bx tool.
Besides the fundamental differences regarding the underlying bx approach, the tools also differ with respect to
their architectures: We decided to use BiGUL to implement a state-based solution (Fig. 4) with a design-time
update policy. In contrast, we used the TGG tool eMoflon to implement an s-delta/corr-based solution (Fig. 6)
with support for a runtime update policy, while medini QVT was used to implement a state/corr-based solution
(Fig. 5) with a design-time update policy.</p>
        <p>While other cases can possibly be supported by the same tools, e.g., by encoding explicit delta and
correspondence information in "states" for BiGUL, or by calculating the "best fitting" correspondence model and
implementing an additional delta discovery procedure for eMoflon, the choices above represent the "standard"
case for these tools, i.e., the sweet spot for which minimal effort is required. In the following, we give an overview
of the three solutions to the F2P benchmarx.</p>
        <p>A solution to the F2P benchmarx with BiGUL
A BiGUL program describes how consistency is established for two models having asymmetric information,
in the sense that one model contains more information than the other. When programming in BiGUL, the
programmer assumes that two (possibly inconsistent) models are given, and describes how to put all data in the
less informative model into proper places in the more informative model, thereby producing an updated version
of the latter model that is consistent with the former model.</p>
        <p>For symmetric scenarios such as the F2P benchmarx, symmetric consistency restoration can be achieved by
combining two asymmetric consistency restorers that synchronise the two models with a common intermediate
model. This intermediate model can either incorporate all the information of both models, or only contain
the information shared by both models. Our BiGUL implementation of the F2P benchmarx adopts the latter
approach: A family register is consistent with a person register exactly when the two registers are consistent with
the same intermediate model, which is a collection of shared people (names and genders) with the Haskell type
MediumR = [((String, String), Bool)]. The overall structure of the BiGUL implementation can thus be
thought of as two programs putting MediumR into FamilyRegister (Figure 1a) and PersonRegister (Figure 1b)
respectively. If one register is changed, say the family register, the collection of people it represents will be
computed and put into the person register; the updated person register will represent the same collection of
people, thus re-establishing consistency.</p>
        <p>To better organise the implementation, we introduce another intermediate model MediumL, whose structure
is closer to FamilyRegister: A model of type MediumL = [(String, [(String, Bool)])] is a collection of
families, each of which consists of a family name and a list of first names and genders of its members. A more
accurate depiction of the structure of the implementation is thus:</p>
        <p>FamilyRegister &lt;-- syncL -- MediumL &lt;-- syncM -- MediumR -- syncR --&gt; PersonRegister</p>
        <p>
where the arrows indicate the directions of the constituent put programs. Besides straightforward format
conversion (e.g., converting True and False to Male and Female), each of the three put programs is in charge
of a part of the consistency restoration policy: (i) syncR aligns a collection of people with a person register,
and describes how to update (retain, create, or delete) the birthdays in the person register; (ii) syncM aligns
a collection of people with a collection of families, and describes how families are created or deleted, and how
people are assigned to families; (iii) syncL describes how members in each family are assigned their roles.</p>
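        <p>The relationship between the two intermediate types can be made concrete with a small helper (the type synonyms are taken from the text; flatten itself is a hypothetical illustration, not part of the BiGUL solution):

```haskell
{-# LANGUAGE TupleSections #-}

type MediumL = [(String, [(String, Bool)])]  -- families: family name plus members
type MediumR = [((String, String), Bool)]    -- people: (family, first name) plus gender

-- Flattening a MediumL collection of families yields the corresponding
-- MediumR collection of shared people.
flatten :: MediumL -> MediumR
flatten fams = [ ((fam, firstName), g) | (fam, members) <- fams, (firstName, g) <- members ]
```

        </p>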
        <p>As an example of a BiGUL specification, the code for syncM is depicted in Listing 1. The three "normal"
branches deal with the cases where the collections of people and families are consistent, while the last "adaptive"
branch describes how the collection of families can be updated to become consistent with the collection of people,
applying the fixed update policy of matching each person with an entry with the same name and gender in an
existing family, or creating a new entry in the first family with a matching family name. If no matching family
can be found, a new family is created.</p>
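        <p>This fixed update policy can also be expressed in plain Haskell, independently of BiGUL's combinators (a simplified, hypothetical illustration that only handles placement by family name, ignoring role assignment):

```haskell
-- A family is a family name plus its members (first name and gender).
type Family = (String, [(String, Bool)])

-- Place a person into the first family with a matching family name,
-- or create a new family if none exists.
place :: ((String, String), Bool) -> [Family] -> [Family]
place ((fam, firstName), g) [] = [(fam, [(firstName, g)])]
place p@((fam, firstName), g) ((name, ms) : fs)
  | name == fam = (name, ms ++ [(firstName, g)]) : fs
  | otherwise   = (name, ms) : place p fs
```

        </p>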
        <p>11Even though "QVT" is in its name, we regard medini QVT as simply a constraint-based bx tool that just happens to have a
QVT-R inspired textual concrete syntax (like many others, such as Echo: http://haslab.github.io/echo/ or JTL: http://jtl.di.univaq.it).
We do not regard medini QVT as an implementation of the QVT-R standard, as its semantics diverge significantly.</p>
        <p>
syncM :: BiGUL MediumL MediumR
syncM = Case
  [ $(normalSV [p| [] |] [p| [] |] [p| [] |])
    ==&gt; $(update [p| _ |] [p| [] |] [d| |])
  , $(normalSV [p| ((familyName, []):_) |] [p| _ |] [p| (_, []):_ |])
    ==&gt; $(update [p| _:rest |] [p| rest |] [d| rest = syncM |])
  , $(normal [| \((familyName, (firstName, gender):_):_) vs -&gt;
                  ((familyName, firstName), gender) `elem` vs |]
             [p| (_, _:_):_ |])
    ==&gt; $(rearrS [| \((familyName, (firstName, gender):ns):ss) -&gt;
                      (((familyName, firstName), gender), (familyName, ns):ss) |]) $
        (Replace `Prod` syncM) `Compose` extract
  , $(adaptive [| \_ _ -&gt; otherwise |])
    ==&gt; adapt
  ]
  where
    extract :: (Ord a, Show a) =&gt; BiGUL (a, [a]) [a]
    extract = Case
      [ $(normal [| \(s, _) (v:_) -&gt; v == s |] [| \(s, ss) -&gt; null ss || head ss &gt;= s |])
        ==&gt; $(update [p| (x, xs) |] [p| x:xs |] [d| x = Replace; xs = Replace |])
      , $(normal [| \(s, _:_) (v:_) -&gt; v &lt; s |] [| \(s, s':_) -&gt; s' &lt; s |])
        ==&gt; $(rearrS [| \(s, s':ss) -&gt; (s', (s, ss)) |]) $
            $(rearrV [| \(v:vs) -&gt; (v, vs) |]) $
            Replace `Prod` extract
      , $(adaptive [| \(s, []) (v:_) -&gt; v &lt; s |])
        ==&gt; \(s, _) _ -&gt; (s, [undefined])
      ]
    adapt :: MediumL -&gt; MediumR -&gt; MediumL
    adapt [] vs = map ((fst . fst . head) &amp;&amp;&amp; map (snd *** id))
                      (groupBy ((==) `on` (fst . fst)) vs)
    adapt ((familyName, ns):ss) vs =
      let ns' :: [(String, Bool)]
          ns' = concat (map snd (filter ((== familyName) . fst) ss))
          vs' :: [(String, Bool)]
          vs' = map (snd *** id) (filter ((== familyName) . fst . fst) vs) \\ (ns' \\ ns)
      in (familyName, vs') : adapt ss (vs \\ (map ((familyName,) *** id) vs'))
        </p>
        <p>Listing 1: An excerpt from the solution in BiGUL</p>
        <p>A solution to the F2P benchmarx with eMoflon
The recommended way of "thinking in TGGs" is to describe how two consistent models are to evolve
simultaneously, independently of how the TGG is to be ultimately operationalised, i.e., used to derive a directed
(forward/backward) propagation of changes in one model to the other. Considering our running example, an
obvious step in a simultaneous construction is the creation of a family member (father, mother, daughter, or
son) in the family register, and the corresponding creation of a person (male or female) in the person register,
indicating that four different cases must be handled at the minimum. Another, orthogonal dimension is whether
a family member is created in an existing family or in a new family, i.e., a new family could be created with
its very first member. There are thus eight (4 x 2) different cases required to handle the construction of family
members and corresponding persons simultaneously.</p>
        <p>An excerpt from the TGG rules designed with this understanding of simultaneous construction in mind (and
specified with eMoflon) is depicted in Figure 9. We omit the trivial construction step of the required root containers
(creating an empty family register together with an empty person register), and focus on the different cases for
creating family members and corresponding persons. The TGG rules in Figure 9 are depicted using eMoflon's
textual syntax (eMoflon also provides a read-only visualisation in the common visual syntax of graph grammars).
In all rules, created elements are green and provided with a ++-markup, while context elements (which must
already be present in order to apply a rule) are black.</p>
        <p>To the left, an abstract rule FamilyMemberToPerson (note the keyword #abstract) is given, which creates a
family, its first family member, a person, and a correspondence link between the family member and the person.
Two decisions are left open: the exact role of the family member in the family, and the concrete type (male or
female) of the person. Two rules in the middle (MotherToFemale and FatherToMale) refine this abstract rule
(note the #extends keyword) and clarify these open decisions for mothers and fathers (analogous rules creating
sons and daughters are present but not explicitly depicted).</p>
        <p>In the right part of Figure 9, two further rules (MotherOfExistingFamilyToFemale and
FatherOfExistingFamilyToMale) refine the previous ones by requiring the family as existing context
(as opposed to creating it). In both rules, an additional Negative Application Condition (NAC), depicted in blue
and with a !-markup, forbids the existence of a mother or father (depending on what is created) in the family,
in order to prevent generating families with multiple mothers or fathers. Analogous rules creating sons and
daughters in existing families are again present but not explicitly depicted. These rules for sons and daughters
do not, however, have the NAC, as multiple sons and daughters are allowed in families.</p>
        <p>#abstract #rule FamilyMember2Person
#source {
  fr : FamilyRegister {
    ++ -families-&gt;f
  }
  ++ f : Family
  ++ fm : FamilyMember
}
#target {
  pr : PersonRegister {
    ++ -persons-&gt;p
  }
  ++ p : Person
}
#correspondence {
  fr2pr : FamiliesToPersonsCorr {
    #src-&gt;fr
    #trg-&gt;pr
  }
  ++ fm2p : FamilyMemberToPerson {
    #src-&gt;fm
    #trg-&gt;p
  }
}</p>
        <p>A solution to the F2P benchmarx with medini QVT
Medini QVT follows a constraint-based approach to bx. A transformation is defined in terms of a set of relations
which must hold between the participating models, and is executed in the direction of one of them. Consistency
restoration ensures that for each relation and each instance of a source domain (source pattern), there is a
corresponding instance of the target domain such that all constraints on the domain instances are satisfied; the
same must hold in the opposite direction.</p>
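        <p>This relational reading can be illustrated in plain Haskell as a consistency check over simplified model representations (a hypothetical sketch for intuition only; medini QVT itself evaluates such relations over the actual models):

```haskell
-- A family register as (family name, member first name) pairs, and a person
-- register as full names; consistency demands a match in both directions.
consistent :: [(String, String)] -> [String] -> Bool
consistent members persons =
     all (\(fam, firstName) -> (fam ++ ", " ++ firstName) `elem` persons) members
  && all (`elem` map (\(fam, firstName) -> fam ++ ", " ++ firstName) members) persons
```

        </p>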
        <p>An excerpt of the solution to the F2P benchmarx with medini QVT is depicted in Listing 2. It involves
five relations: one relation between the root elements, and one relation for each role of a family member. In
the forward direction, each family member is mapped to a person of the same gender with the same full name.
Without further provisions, each person would be transformed twice in the backward direction: once as a parent
and once as a child. As this violates the consistency constraint that demands a bijective mapping between
family members and persons, the application of one of these relations is suppressed in the backward direction by
querying the binding state of a variable in the target domain. As the when clause is executed before the target
domain is instantiated, the variables in the target domain will still be unbound when the transformation is
executed in the forward direction, and the condition female.oclIsUndefined() will thus be satisfied only then.</p>
        <p>
transformation families2persons(famDB : Families, perDB : Persons) {
  top relation FamilyRegister2PersonRegister {
    enforce domain famDB familyRegister : Families::FamilyRegister {};
    enforce domain perDB personRegister : Persons::PersonRegister {};
    where {
      Father2Male(familyRegister, personRegister);
      Son2Male(familyRegister, personRegister);
      Mother2Female(familyRegister, personRegister);
      Daughter2Female(familyRegister, personRegister);
    }
  }
  ...
}

Listing 2: An excerpt from the solution with medini QVT
5 A first tool comparison based on the F2P benchmarx
In the following, we compare BiGUL, eMoflon, and medini QVT based on the F2P benchmarx. Our goal is to
impart a first impression of what can be achieved with our proposed benchmarx infrastructure and its reference
implementation, and to inspire future tool comparisons.</p>
        <p>We chose to investigate the following research questions (referred to as RQ from now on) with our tool
comparison, covering the different capabilities of bx tools (RQ-1), performance (RQ-2.a,b,c), and the required
implementation effort (RQ-3):</p>
        <p>RQ-1: Which subset of bx test cases (cf. Section 3) can be handled by each of the three tools?
RQ-2.a: How do the tools scale in an initial batch transformation from scratch?
RQ-2.b: Does propagation time scale with the size of the update or with the size of all models?
RQ-2.c: Do the tools exhibit symmetric runtime behaviour in both forward and backward directions?
RQ-3: How do the tools differ with respect to the size of the required transformation specification?
To investigate RQ-1, we executed 52 tests differing in features such as direction, change type, and required
update policy.12 Figure 10 depicts a representative excerpt showing the results for eight test cases. As discussed
in Sect. 3.2, we distinguish between expected passes, failures, unexpected passes, and limitations for each result
with a given bx tool.</p>
        <p>Our test results show that all tools are able to propagate simple updates such as the creation of new families (#1
in Figure 10) or birthday changes of persons (#2), which should not trigger any changes in families. Renaming
family members (#3) is propagated by all tools as a renaming of persons (as the test requires). BiGUL, supporting
neither deltas nor corrs, cannot necessarily be expected to pass this test, i.e., a completely new person could be
created with the new name. We therefore identify this result as an unexpected pass. In a similar test where
renaming a person should start a new family (#4), eMoflon passes the test, while medini QVT and BiGUL fail
as expected (limitation) due to a missing mechanism to configure the required update policy. When deleting
a family member whose name is not unique (#5), all tools pass the test (by deleting the respective person).
BiGUL could potentially fail (unexpected pass) due to missing corrs, which are necessary to determine the exact
mapping of persons to the non-unique family members. Creating a male person who should become a son (#6)
is only propagated by eMoflon as required, while medini QVT and BiGUL fail, again due to a non-configurable
update policy. Deleting a unique person (#7) is propagated correctly by eMoflon and BiGUL (by deleting the
respective family member), whereas medini QVT fails in this test by performing a backward transformation from
scratch and thus creating new families without sons or daughters. Finally, eMoflon also fails in a similar test if
the deleted person was the first member of their family (#8), due to a suboptimal strategy for handling deletion
updates. This result is thus a failure for eMoflon.</p>
        <p>12The entire set of test cases and the results are available as part of our reference implementation (cf. Section 1).</p>
        <p>[Figure 10: Results for eight representative test cases: #1 create new families, #2 change birthdays (nothing
should happen), #3 rename family members, #4 rename person who should start a new family (although a family
with the same name exists), #5 delete family members whose names are not unique in persons, #6 create a male
for whom a father exists, #7 delete person, #8 delete person who was the first member in the family. Legend:
PASS = expected pass, FAIL = failure, pass = unexpected pass, fail = limitation]</p>
        <p>From all 52 test cases, eMoflon has 51 expected passes and one failure; medini QVT has 11 expected passes, 15
unexpected passes, four failures, and 22 limitations; and, finally, BiGUL has 12 expected passes, 19 unexpected
passes, and 21 limitations. We thus conclude the following for RQ-1: Assuming a bx scenario where consistency
specification is feasible with all three tools, they still differ in the set of their supported bx test cases. eMoflon
strives to cover all cases with its configurable update policy as well as its s-delta- and corr-based architecture.
It can, however, still fail in some cases, by deleting more elements than absolutely necessary for consistency
restoration. The BiGUL implementation supports a rather small subset of the tests, for which configuration of
the update policy, explicit deltas, and correspondences are not necessary. Nevertheless, BiGUL does not fail in
any test case where it is expected to pass, indicating that it allows adequate control of the consistency restoration
process. Medini QVT addresses a similar subset of the tests as BiGUL, additionally supporting correspondences,
but often fails for round-tripping tests as it always (re-)transforms all persons from scratch to families without
sons or daughters. As a final remark, the numerous unexpected passes indicate that more non-trivial tests are
necessary to better reveal the limitations of the tools.</p>
        <p>To investigate RQ-2.a,b,c, we generated source models of increasing size (consisting of uniquely named
families with three children), performed a forward transformation from scratch, and subsequently propagated
an update adding a new daughter to one of the families. The same experiment was conducted analogously in
the backward direction. The plots in Figure 11 depict our measurement results. The y-axis shows the median
runtime (in seconds) of five repetitions executed on a standard computer (Mac OS X, 1.7 GHz Intel Core i7, 8
GB RAM) with Eclipse Neon, Java 8, and 4 GB available memory for the eMoflon and medini QVT solutions,
and GHC 8.0.1 for BiGUL. The x-axis shows the total number of all source and target model elements, ranging
from about a thousand to a million elements (objects and links, but not attributes). No measurement results are
available for cases where a tool either runs out of memory or requires more than 5 minutes.</p>
        <p>[Figure 11: Forward and backward propagation runtimes for the series BiGUL, BiGUL-Incr, eMoflon,
eMoflon-Incr, MediniQVT, and MediniQVT-Incr]</p>
        <p>In the following, we answer our performance-related research questions based on the measurement results:
RQ-2.a: The initial transformation with BiGUL scales linearly with model size and remains under 20 seconds for
all cases up to a million elements. Forward transformation with eMoflon remains under 30 seconds with linear
scalability for up to about 900K model elements, but faces scalability issues for the largest case with about a
million elements, requiring 96s before running out of memory. In the backward direction, eMoflon reaches its
limits already at about 60K elements. Medini QVT suffers from scalability issues in both directions and reaches
its limits at 221K elements, requiring more than 150 seconds for the forward as well as the backward direction.
RQ-2.b: Representing a state-based solution, the runtime for an update propagation with BiGUL scales with model
size (and not with update size), requiring almost the same runtime as the initial transformation in all cases.
Update propagation with eMoflon takes under one second for all cases up to 900K elements, showing that the
update size is the decisive factor. Nevertheless, a relatively slow but steady linear growth in propagation time
can be observed with increasing model size, especially when eMoflon starts running out of memory for more than
900K elements. Update propagation with medini QVT takes longer than the initial transformation, probably
because correspondence information is handled as additional constraints. This indicates a critical dependency
on model size for solutions depending on a costly vertical alignment.</p>
        <p>RQ-2.c: BiGUL and medini QVT exhibit symmetric behaviour with respect to their scalability. In contrast,
the backward transformation with eMoflon is much slower than its forward transformation. This can be partly
attributed to the price of enabling a flexible runtime update policy, which requires maintaining all possible
decisions (male to father or son, female to mother or daughter, in existing or new families).
To investigate RQ-3, we measured the size of the F2P solutions developed with each individual tool. The most
compact solution (in terms of pure typing effort) is with eMoflon, consisting of 192 lines of code (256 words, 3195
characters) distributed over 12 files. The decomposition into multiple files (one TGG rule per file) is suggested
but optional; if a single file is used, the solution is even more compact (as no imports are required): 168 lines
(208 words, 1995 characters). The medini QVT solution consists of 144 lines (374 words, 4163 characters) in a
single file, while the BiGUL solution consists of 181 lines (710 words, 6629 characters), again in a single file.</p>
        <p>The first tool comparison based on our proposed benchmarx framework indicates the heterogeneity of the bx tool
landscape: Efficient batch transformation is provided by BiGUL and, to some extent, eMoflon, but they swap
places when propagating updates. While BiGUL is reliable for a limited set of test cases, eMoflon, and to some
extent medini QVT, strive to support a broader spectrum but can fail in some cases with limited support for
fine-grained control over consistency restoration. Specifications tend to be more compact in a declarative bx
approach (eMoflon or medini QVT) as compared to a programming-based approach (BiGUL).</p>
        <p>The need for a benchmark for bidirectional transformations has long been recognised in the bx community.
The benchmarx framework presented in this paper provides an infrastructure that takes the heterogeneity of bx
tools into account. The functional tool interface is based on a synchronisation dialogue; the representation of data
is tool-specific rather than prescribed by the framework. The framework supports a zoo of tool architectures,
including batch, incremental, state-based, delta-based, and correspondence-based tools. As a pilot application,
we selected the well-known Families2Persons case and implemented it in three tools with significantly differing
architectures (BiGUL, eMoflon, and medini QVT). The F2P benchmarx provides a rich set of test cases which
is organised systematically with the help of a feature model. The solutions developed with the tools mentioned
above were evaluated with respect to both functionality and performance. This evaluation was performed
primarily to obtain feedback concerning the benchmarx infrastructure, the design of the F2P benchmark, and
the organisation of test cases according to the proposed feature model; we did not strive to optimise the solutions.</p>
        <p>To disseminate, evaluate, and improve the benchmarx infrastructure, we are planning to elaborate the
Families2Persons benchmark into a tool transformation case. Since this transformation problem is well known and
may be implemented with acceptable effort, we hope that this case will be solved with a wide range of bx
tools, ideally covering the entire spectrum of tool architectures. Furthermore, we intend to apply the benchmarx
infrastructure to other bx problems to provide additional insights into the capabilities of bx languages and tools.
Acknowledgements
We would like to thank Robin Oppermann, Patrick Robrecht, Chandni Rikin Shah (Paderborn University), all
participants of the benchmarx working group of the BX Shonan meeting 2017, and our anonymous reviewers.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] Bidirectional Transformations - NII Shonan Meeting, Seminar 091. http://shonan.nii.ac.jp/seminar/091/. Date retrieved: February 28, 2017.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] Anthony Anjorin. FamilyToPersons v0.1 in Bx Examples Repository. http://bx-community.wikidot.com/examples:home. Date retrieved: February 28, 2017.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] Anthony Anjorin, Alcino Cunha, Holger Giese, Frank Hermann, Arend Rensink, and Andy Schürr. Benchmarx. In K. Selçuk Candan, Sihem Amer-Yahia, Nicole Schweikardt, Vassilis Christophides, and Vincent Leroy, editors, Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference (EDBT/ICDT 2014), CEUR Workshop Proceedings, pages 82-86. CEUR-WS.org, 2014.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] James Cheney, James McKinna, Perdita Stevens, and Jeremy Gibbons. Towards a Repository of Bx Examples. In K. Selçuk Candan, Sihem Amer-Yahia, Nicole Schweikardt, Vassilis Christophides, and Vincent Leroy, editors, Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference (EDBT/ICDT 2014), volume 1133 of CEUR Workshop Proceedings, pages 87-91. CEUR-WS.org, 2014.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] Alcino Cunha and Ekkart Kindler, editors. Proceedings of the 4th International Workshop on Bidirectional Transformations (BX 2015), volume 1396 of CEUR Workshop Proceedings. CEUR-WS.org, 2015.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Zinovy</given-names>
            <surname>Diskin</surname>
          </string-name>
          , Arif Wider, Hamid Gholizadeh, and
          <string-name>
            <given-names>Krzysztof</given-names>
            <surname>Czarnecki</surname>
          </string-name>
          .
          <article-title>Towards a Rational Taxonomy for Increasingly Symmetric Model Synchronization</article-title>
          . In Davide Di Ruscio and Dániel Varró, editors,
          <source>ICMT</source>
          <year>2014</year>
          , volume
          <volume>8568</volume>
          of LNCS, pages
          <fpage>57</fpage>
          –
          <lpage>73</lpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Zinovy</given-names>
            <surname>Diskin</surname>
          </string-name>
          , Yingfei Xiong, and
          <string-name>
            <given-names>Krzysztof</given-names>
            <surname>Czarnecki</surname>
          </string-name>
          .
          <article-title>From State- to Delta-Based Bidirectional Model Transformations: the Asymmetric Case</article-title>
          .
          <source>JOT</source>
          ,
          <volume>10</volume>
          :6:1–25,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Soichiro</given-names>
            <surname>Hidaka</surname>
          </string-name>
          , Massimo Tisi, Jordi Cabot, and
          <string-name>
            <given-names>Zhenjiang</given-names>
            <surname>Hu</surname>
          </string-name>
          .
          <article-title>Feature-Based Classification of Bidirectional Transformation Approaches</article-title>
          .
          <source>SoSyM</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>22</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Kyo C.</given-names>
            <surname>Kang</surname>
          </string-name>
          , Sholom G. Cohen,
          <string-name>
            <given-names>James A.</given-names>
            <surname>Hess</surname>
          </string-name>
          , William E. Novak, and
          <string-name>
            <given-names>A. Spencer</given-names>
            <surname>Peterson</surname>
          </string-name>
          .
          <article-title>Feature-Oriented Domain analysis (FODA) Feasibility Study</article-title>
          .
          <source>Technical report, DTIC Document</source>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Andy</given-names>
            <surname>Schürr</surname>
          </string-name>
          .
          <article-title>Specification of Graph Translators with Triple Graph Grammars</article-title>
          . In Ernst W. Mayr, Gunther Schmidt, and Gottfried Tinhofer, editors,
          <source>WG</source>
          <year>1994</year>
          , volume
          <volume>903</volume>
          of LNCS, pages
          <fpage>151</fpage>
          –
          <lpage>163</lpage>
          . Springer,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>