Introduction

Veri ed Bidirectional Transformations by Construction

K. Lano

kevin.lano@kcl.ac.uk 0

S. Yassipour-Tehrani

tehrani@kcl.ac.uk 0 0 Dept. of Informatics King's College London London , UK sobhan.yassipour

Bidirectional transformations (bx) are of increasing significance in model-based engineering. Currently bx are defined using a number of specialised transformation languages. In this paper we show how standard UML elements such as use cases and OCL constraints can be used to define bx, thus taking advantage of the wide industrial and educational adoption and support for UML. We define patterns and techniques for specifying bx so that they are correct by construction.

Introduction

Additionally, updates to a model by R! or R should be the minimal necessary to restore R: this is termed the principle of least change [13].

Section 2 describes current bx research, Section 3 describes some bx patterns, and Section 4 describes our approach to bx specification in UML. Section 5 provides a proof that the bx properties for R! are satisfied by transformations defined using our approach. Section 6 gives an evaluation of the approach. 2

Bx Speci cation Languages and Techniques

Bx are the subject of extensive theoretical research [16, 17, 4, 12] and an annual conference, BX. The most well-known transformation language which supports definition of bx is QVT-R [14]. However, QVT-R has had limited uptake because of its complex semantics and specification style, compared to simpler unidirectional MT languages such as ATL [6] and ETL [7]. Triple Graph Grammars (TGG) are a graph transformation-based formalism which supports bx definition, although with a restricted expression language compared to OCL. Other approaches, such as JTL [5], and extensions of TGG [ 1 ], use constraint-based programming techniques to interpret relations P (s; t ) between source and target elements as specifications in both forward and reverse directions. Model finding using Alloy is used to bidirectionalise ATL in [12].

In this paper we describe an approach to bx definition using standard UML 2 elements (class diagrams, use cases, OCL) which avoids the use of specialised languages, and therefore is preferable from the viewpoint of industrial uptake, and for the long-term maintenance and reuse of bx. In our approach, unidirectional transformations are specified using use cases and OCL, then a staticallycomputed bx relation Post & Inv for is derived, together with a forward transformation ! extending , and an inverse of !. The bx relation and transformations are derived from using a higher-order transformation. This is generally more efficient than the use of constraint programming and model finding. 3

Patterns for Bidirectional Transformations

We have identified a number of patterns for bx, which can be used to establish the bx correctness conditions [11]: { Auxiliary Correspondence Model: maintain a detailed trace between source model and target model elements to facilitate change-propagation in source to target or target to source directions. { Cleanup before Construct: for R!, remove superfluous target model elements which are not in the transformation relation R with any source elements, before constructing target elements related to source elements, and similarly for R . { Unique Instantiation: Do not recreate elements t in one model which already correspond to an element s in the other model, instead modify the data of t to enforce the transformation relation. Use key attributes to identify when elements should be created or updated.

All three of these patterns are used in our approach to define bx using UML. In addition, we have used three further patterns to structure bx: { Factor a Text-to-text bx by a Model-to-model bx: this architectural pattern defines a text-to-text bx as a composition of parsing, model-to-model and model-to-text transformations in both directions. { Inverse Recursive Descent: if a forward transformation ! is defined using structural recursion on the source language SL, the reverse transformation may be derived from ! as a dual structural recursion on TL. { Bx Data Correspondence: a special case of Auxiliary Correspondence Model which defines an auxiliary association in SL ∪ TL to express a bx relation concerning specific elements or basic values (eg., a correspondence between expression operators in one language and the other). 4

Bidirectional Transformation Speci cation in

UML-RSDS UML-RSDS is a particular version of executable UML (xUML), based on UML 2 and OCL 2.4, with a formal semantics [8] and an established toolset [9]. Model transformations are specified in UML-RSDS as UML use cases, defined declaratively by two main predicates, expressed in a subset of OCL: 1. Postconditions Post which define the intended effect of the transformation at its termination. These are an ordered conjunction C1 & ::: & Cn of OCL constraints (also termed rules in the following) and also serve to define a design (activity) and implementation of the transformation. 2. Invariants Inv which define expected invariant properties which should hold during the transformation execution. These can be derived from Post , or can be specified explicitly by the developer. They are the constraints of the use case, considered as a UML classifier.

A UML activity stat (Cn) is derived by the UML-RSDS tools for each postcondition constraint Cn. This activity is constructed so that it establishes Cn: [stat (Cn)]Cn. The activity for Post is formed as a composition of the stat (Cn). In this paper, stat (Post ) is the sequential composition of the stat (Cn).

Data-dependency relations between the Post constraints are important in ensuring the correctness of both unidirectional and bidirectional UML-RSDS transformations. Let rd (Cn) denote the read-frame of a constraint Cn, the set of entity type names and data features that it reads, and wr (Cn) denote its write frame. These are defined in [8].

A dependency ordering Cn < Cm is defined between constraints by wr (Cn)∩ rd (Cm) ̸= {} “Cm depends on Cn”. A use case with postconditions C1; : : : ; Cn should satisfy the syntactic non-interference conditions: 1. If Ci < Cj , with i ̸= j , then i < j . 2. If i ̸= j then wr (Ci ) ∩ wr (Cj ) = {}.

Together, these conditions ensure that the activities stat (Cj ) of subsequent constraints Cj cannot invalidate earlier constraints Ci , for i < j .

A constraint is termed a type 1 constraint if its write and read frames are disjoint. Such constraints usually have an implementation as a bounded loop, as opposed to a fixed-point/unbounded iteration.

An invariant Inv and explicit inverse predicate Post can be derived mechanically from Post [9, 11]. This derivation is automated in the UML-RSDS tools. Tables 1 and 2 show some examples of inverses P of predicates P . The transformation developer can also specify inverses for particular Cn by defining a suitable Cn constraint in Inv , for example, to express that a predicate t :z = s:x + s:y should be inverted as s:x = t :z − s:y .

In terms of the framework of [16], the source-target relation R associated with a UML-RSDS transformation is Post & Inv . R is not necessarily bijective. The forward direction of is normally computed as stat (Post ): the UML activity derived from Post when interpreted procedurally [8]. However, in order to achieve the correctness and hippocraticness properties, Inv must also be considered: before stat (Post ) is applied to the source model m and target model n, n must be cleared of elements which fail to satisfy Inv . This is a case of the Cleanup before Construct pattern.

Given a typical postcondition Ci of the form P(s,t) t:g = s:f t:g = s:f →pow (x )

P (s; t) s:f = t :g s:f = t :g→pow (1:0=x )

Tj →exists(t | TCondj (t ) & Pi (self ; t ))

the corresponding invariant constraint is Invi : Si ::

TCondj (self ) ⇒ Si →exists(s | SCondi (s) & Pi (s; self ))

P(s,t) P (s; t ) Condition t:rr = s:r →reverse() s:r = t:rr →reverse() r ; rr ordered association ends t:rr = s:r → rst () s:r → rst () = t :rr r ordered association end t:rr = s:r →last () s:r →last () = t:rr r ordered association end t:rr = s:r →including(s:p) s:r = t:rr →front () & rr , r ordered association ends t:rr = s:r →append (s:p) s:p = t:rr →last () p 1-multiplicity end t:rr = Sequence{s:p1; s:p2} s:p1 = t:rr →at (1) & rr ordered association end s:p2 = t:rr →at (2) p1, p2 1-multiplicity ends t:rr = s:r →select (P ) s:r = (s:r − (s:r →select (P ) − t :rr ))→ r , rr merge(t :rr →select (P )) set-valued s:r = t:rr →asSet () r set-valued, rr ordered

TCondj (self ) &

not (Si →exists (s | SCondi (s) & Pi (s; self ))) ⇒ self →isDeleted () However, this constraint may delete more Tj instances than is necessary: not only instances which correspond (i) to deleted Si instances, or (ii) to Si instances that no longer satisfy SCondi , but also (iii) to Si instances satisfying SCondi , but not Pi (s; self ). In this last case, self should not be deleted, instead stat (Pi ) should be performed to re-establish Pi between the modified s and self . Therefore, some form of tracing is required, to distinguish these cases.

The Auxiliary Correspondence Model pattern can be used to introduce tracing relations between source and target entities. In general such relations can be many-many, and are implemented by an auxiliary association Si $rtrace — $trace Tj linking corresponding elements in the two languages. In the transformation specification, s:$trace can be referred to as s→equivalents(), and t :$rtrace as t →requivalents (). The computational overhead of maintaining such relations can be substantial, and instead we usually apply the Auxiliary Correspondence Model pattern to introduce String-valued identity attributes (primary keys) for source and target entities to record and identify which source and target instances correspond: source instance s corresponds to target instance t when s:sId = t :tId . This also means that Si [t :tId ] = s and Tj [s:sId ] = t . Finally, the Unique Instantiation pattern is used to modify existing objects in one model which correspond to a modified object in the other model, instead of creating a new object.

If identity attributes have been introduced using Auxiliary Correspondence Model, the cleanup constraints can be simplified to:

TCondj (self ) &

not (tId : Si →collect (sId )) ⇒ self →isDeleted () E :: Si :: and

TCondj (self ) & tId : Si →collect (sId ) & not (SCondi (Si [tId ])) ⇒ self →isDeleted ()

which correspond to cases (i) and (ii) above. This version is also potentially more efficient than the original cleanup constraint. The second case is omitted if SCondi is the default true. The updates of case (iii) are carried out in the Post constraints. ! is defined to have postconditions consisting of the cleanup constraints, followed by Post , and hence it is an extension of : its postconditions imply those of .

The constraints of Post need to be interpreted using Unique Instantiation in !: the E →exists(e | P ) quantifier in rule succedents should be interpreted as “create a new e : E and establish P for e, unless there already exists an e : E satisfying P ”. This is also known as ‘check before enforce’ semantics. If key attributes are defined, a constraint

F →exists(f | f :fId = eId & P ) means that if eId : F →collect (fId ) (a corresponding F instance F [eId ] already exists), then it is selected and modified to satisfy P – this performs case (iii) of the cleanup actions to re-establish Inv after a source model change. Only if eId ̸∈ F →collect (fId ) is a new F instance created. 5

Correctness and Hippocraticness Properties for ! In this section we give a formal justification that bx defined using the UML specification approach described above do satisfy the bx properties.

We consider first the case of UML-RSDS transformations that satisfy the Structure Preservation pattern [10], in which source language entities Si correspond to target language entities Ti , and whose constraints are of type 1, satisfying syntactic non-interference. Identity attributes are used to implement Auxiliary Correspondence Model. This means that is a separate-models transformation with source language S and target language T , and postcondition Post is an ordered conjunction of constraints Ci each of the form:

SCondi (self ) ⇒ Ti →exists(t | t :tId = sId & TCondi (t ) & Pi (self ; t ))

and Inv is a conjunction of constraints Invi of the form

TCondi (self ) ⇒ Si →exists(s | tId = s:sId & SCondi (s) & Pi (s; self ))

and ! are therefore:

TCondi (self ) & tId : Si →collect (sId ) & not (SCondi (Si [tId ])) ⇒ self →isDeleted ()

The second constraint is omitted if SCondi is the default true predicate. We denote the collection of the cleanup constraints by Cleanup.

Bx correctness holds, since the cleanup constraints together with Post establish Inv : the Cleanup constraints remove any target model elements that fail to correspond by identity to source model elements in the domain of , and Post modifies target model elements that do correspond to valid source elements, in order to re-establish Inv . By construction, stat (Post ) preserves Inv and establishes Post . Therefore, the sequential composition of stat (Cleanup) and stat (Post ) establishes the bx relation R as Post & Inv .

Hippocraticness holds, since stat (Cleanup) has no effect if the models already satisfy R, and neither does stat (Post ), due to the Unique Instantiation interpretation of the exists operator. The principle of least change is satisfied, since changes to the source model either lead to deletion of exactly those target elements (and their incident links) which cannot be modified to correspond to source elements (cases (i) and (ii) above), or to creation/modification of target elements, using stat (Ci ). But stat (Ci ) is designed as a minimally-intrusive update to the target model which will establish Ci . Interpreting a quantifier Tj →exists (t | P ) using Unique Instantiation means that un-necessary creation of target elements is avoided.

A more complex case is Entity Merging, where two or more source entity types Si , Sj may map to the same target entity type Ti . Provided that the respective TCondi conditions are pairwise disjoint, the same construction and argument for the bx correctness apply as in the Structure Preservation case. Semantic non-interference of Post holds, because the set of Ti instances produced from Si is disjoint from the set produced from Sj , although syntactic noninterference fails1. Entity Splitting involves one source entity type mapping to two or more different target entity types. There are two versions of this pattern: (i) two or more separate postconditions on Si , with disjoint SCondi conditions, mapping to distinct target entity types; (ii) a single constraint which maps one Si instance to multiple linked instances of different Tj . The first case can be treated in a similar way to the Structure Preservation case, provided that syntactic or semantic non-interference of Post holds. The second case is common in 1 A use case satisfies semantic non-interference if for i < j : Ci ⇒ [stat (Cj )]Ci , where [act ]P is the weakest-precondition of P with respect to act . Syntactic noninterference implies semantic non-interference, but not conversely. refinement transformations, and in some migrations. Constraints, eg., C1, will have the form

SCond1(self ) ⇒ T1→exists (t 1 | t 1:t 1Id = sId &

T2→exists (t 2 | t 2:t 2Id = sId & TCond1(t 1) &

TCond2(t 1; t 2) & P1(self ; t 1; t 2))) Both t 1 and t 2 are given id values self :sId . This should be the only constraint which creates either T1 or T2 instances (this restriction is implied by syntactic non-interference). Inv1 is of the form

TCond1(self ) & t 2 : T2 & t 2:t 2Id = t 1Id & TCond2(self ; t 2) ⇒

S1→exists (s | t 1Id = s:sId & SCond1(s) & P1(s; self ; t 2)) The t 2 : T2 expression acts like an additional T2→forAll (t 2 quantifier over the remainder of the constraint.

The cleanup constraints are: S1 :: T1 :: T1 ::

TCond1(self ) & t 2 : T2 & t 2:t 2Id = t 1Id & TCond2(self ; t 2) & not (S1→exists (s | t 1Id = s:sId & SCond1(s))) ⇒

self →isDeleted () & t 2→isDeleted () The effect of such a bx is that a ‘cluster’ of target instances are related to each source instance, with all elements of the cluster sharing the source instance identity value.

Again, stat (Post ) establishes Post and preserves Inv , whilst stat (Cleanup) together with stat (Post ) establishes Inv . Thus correctness follows. Hippocraticness holds because target model pairs (or clusters) t 1, t 2, ... are only deleted if they do not correspond to a source instance. Unique Instantiation can apply to multiple exists quantifiers in the same manner as to single quantifiers. The principle of least change follows from the minimality of stat (Ci ) as an activity to establish Ci .

Finally, the Map Objects before Links pattern can be used. This separates the construction of target instances, and the linking of these instances. The linking constraints have the form:

S1 ::

Tj [idS1]:rr = TRef [r :idSRef ] These define target model association ends rr from source model association ends r , looking-up target model elements Tj [idS1] and TRef [r :idSRef ] which have already been created by a preceding constraint. However, the linking constraints are not invariants of the transformation, and are omitted from Inv . The cleanup constraints for such are defined as for the preceding cases, and by a similar argument, bx correctness, hippocraticness and least change can be shown. The re-establishment of Post and Inv for corresponding source and target elements is now performed by possibly a number of separate Post constraints.

A similar analysis applies to the reverse transformation . This has postcondition Post , the explicit form of Inv . Post may involve view updates [ 2 ] of the source language SL, based on changes to TL elements. For example, predicates s:f →last () = t :g or s:f →select (P 1) = t :g .

stat (Post ) establishes Inv , whilst the cleanup constraints together with Post establish Post . Thus satisfies correctness. Hippocraticness follows since neither the cleanup or Post constraints modify the source model if R already holds. 6

Evaluation

The approach described in this paper enables transformation developers to focus on writing correct postconditions Post for a conventional (unidirectional) forward transformation . From these, the necessary invariants and cleanup constraints for bx execution mode can be automatically generated by a higher-order transformation, producing a bx relation Post & Inv and forward and reverse transformations with respect to this relation. The forward transformation extends . This is the same approach as taken for the bidirectionalisation of ATL in [12]: from a unidirectional forward ATL transformation , a bx relation T and transformations T ! and T are derived, where T ! extends . In contrast to [12] we use the minimality of the activity stat (P ) to achieve the principle of least change, rather than reliance upon an external formalism (Alloy). Metamodel constraints are incorporated into the bx relation by [12]. Our approach is instead to prove separately that Post & Inv ⇒ ThTL, where ThTL are the target language constraints.

By using standard UML notations and concepts, our approach should enable bx specifications to be retained and maintained independently of particular transformation technologies. The use of OCL for transformation specification facilitates the automated derivation of transformation invariants and reverse rules. Our approach scales up to transformations of practical size, and we have applied it to a complex UML to C code generator with over 150 mapping rules. The approach defined here could also potentially be used with other similar specification formalisms, such as Alf/fUML [15] and EVL (http://www.eclipse.org/epsilon/doc/evl). 3. M. Beine, N. Hames, J. Weber, A. Cleve, Bidirectional transformations in database evolution: a case study `at scale', EDBT/ICDT 2014, CEUR-WS.org, 2014. 4. J. Cheney, J. McKinna, P. Stevens, J. Gibbons, Towards a repository of bx examples, EDBT/ICDT 2014, 2014, pp. 87–91. 5. A. Cicchetti, D. Di Ruscio, R. Eramo, A. Pierantonio, JTL: a bidirectional and change propagating transformation language, SLE 2010, LNCS vol. 6563, 2011, pp. 183–202. 6. F. Jouault, F. Allilaire, J. Bezivin, I. Kurtev, ATL: a model transformation tool,

Science of Computer Programming, vol. 72, no. 1, pp. 31–39, 2008. 7. D. S. Kolovos and R. F. Paige and F. Polack, The Epsilon Transformation Language, ICMT, 2008, pp. 46–60. 8. K. Lano, S. Kolahdouz-Rahimi, Constraint-based speci cation of model transformations, Journal of Systems and Software, vol. 88, no. 2, February 2013, pp. 412–436. 9. K. Lano, The UML-RSDS Manual, www.dcs.kcl.ac.uk/staff/kcl/uml2web/umlrsds.pdf, 2016. 10. K. Lano, S. Kolahdouz-Rahimi, Model-transformation Design Patterns, IEEE

Transactions in Software Engineering, vol 40, 2014. 11. K. Lano, S. Kolahdouz-Rahimi, S. Yassipour-Tehrani, Patterns for Specifying Bidirectional Transformations in UML-RSDS, ICSEA 2015. 12. N. Macedo, A. Cunha, Least-change bidirectional model transformation with QVT

R and ATL, Softw. Syst. Model, 2014, doi:10.1007/s10270-014-0437-x. 13. L. Meertens, Designing constraint maintainers for user interaction, Third Workshop on Programmable Structured Documents, Tokyo University, 2005. 14. OMG, MOF 2.0 Query/View/Transformation Speci cation v1.1, 2011. 15. OMG, Action Language for Foundational UML (ALF), v1.0.1, 2015. 16. P. Stevens, Bidirectional model transformations in QVT: semantic issues and open questions, SoSyM, vol. 9, no. 1, January 2010, pp. 7–20. 17. P. Stevens, A simple game-theoretic approach to checkonly QVT-Relations, Softw.

Syst. Model vol. 12, no. 1, 2013, pp. 175–199.

Anjorin ,

Varro ,

Schurr , Complex attribute manipulation in TGGs with constraint-based programming techniques , BX 2012 , Electronic Communications of the EASST vol . 49 , 2012 .

Bancilhon ,

Spyratos , Update semantics of relational views , ACM Transactions on Database Systems , vol. 6 , no. 4 , pp. 557 - 575 , 1981 .