<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Existing and New Ideas on Least Change Triple Graph Grammars</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Milica Stojkovic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sven Laux</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anthony Anjorin</string-name>
          <email>aanjoring@mail.uni-paderborn.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Paderborn University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>903</volume>
      <abstract>
        <p>At least two actively developed model synchronization frameworks employ a conceptually similar algorithm based on Triple Graph Grammars as an underlying formalism. Although this algorithm exhibits acceptable behavior for many use cases, there are still scenarios in which it is sub-optimal, especially regarding the "least change" criterion, i.e., the extent to which models are changed to restore consistency. In this paper, we demonstrate this with a minimal example, discuss various suggestions from literature, and propose a novel approach that can be used to address this shortcoming in the future.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Folder
name: String
structure and package hierarchy. The same attribute condition f.name == p.name is used in R1 and R2 to
require identical names of corresponding Folders and Packages. The model triple depicted in Fig. 2 is said to
be consistent with respect to the TGG consisting of rules R1 and R2, because it can be created by applying R1
to the empty triple graph, then R2, and R2 again as required.</p>
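The consistency notion described above can be illustrated with a minimal sketch in Python. All identifiers (the `Node` class, the concrete folder/package names) are assumptions for illustration, not part of the paper's formalism; only the attribute condition f.name == p.name is taken from the rules.

```python
# Minimal sketch of the running example's model triple: folders (source),
# packages (target), and corrs linking them. The R1/R2 attribute condition
# requires that corresponding elements carry identical names.

class Node:
    def __init__(self, name):
        self.name = name

# Source model: folder hierarchy f1 -> f2 -> f3
f1, f2, f3 = Node("root"), Node("util"), Node("io")
folder_edges = [(f1, f2), (f2, f3)]

# Target model: package hierarchy p1 -> p2 -> p3
p1, p2, p3 = Node("root"), Node("util"), Node("io")
package_edges = [(p1, p2), (p2, p3)]

# Correspondence model: one corr per folder/package pair
corrs = [(f1, p1), (f2, p2), (f3, p3)]

def attribute_condition_holds(corrs):
    """Check the R1/R2 condition f.name == p.name for every corr."""
    return all(f.name == p.name for f, p in corrs)

print(attribute_condition_holds(corrs))  # True
```

The triple is consistent in the sketch's weak sense (names match along every corr); the paper's full notion additionally requires derivability via R1 and R2.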
      <p>
        Assume the folder structure is now updated by deleting f2 and reconnecting f1 directly with f3. To restore
consistency, this change must be forward propagated to the package hierarchy. According to Hermann et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
forward synchronization is performed by applying three auxiliary functions: Forward alignment (fAln), then
deletion (Del) and finally forward addition (fAdd) (backward synchronization is analogous).
      </p>
      <p>The task of fAln is simply to delete all "dangling" correspondence elements (referred to as "corrs"), i.e., all
corrs that were previously connected to deleted source elements. As we deleted the folder f2, the corr connecting
it to p2 will be dangling and must be deleted as well. This situation is depicted in Fig. 4, showing the model
triple after the source update but with the corr to be deleted highlighted in red (grey in a b/w printout).</p>
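The behavior of fAln can be sketched directly, assuming corrs are represented as (source, target) pairs (a representational assumption, not the paper's notation):

```python
# Sketch of forward alignment (fAln): drop every corr whose source
# element has been deleted by the update.

def f_aln(corrs, deleted_sources):
    """Remove 'dangling' corrs, i.e., corrs referring to deleted source elements."""
    return [(src, tgt) for (src, tgt) in corrs if src not in deleted_sources]

corrs = [("f1", "p1"), ("f2", "p2"), ("f3", "p3")]
aligned = f_aln(corrs, deleted_sources={"f2"})
print(aligned)  # [('f1', 'p1'), ('f3', 'p3')]
```

After fAln, only the corr for f2 is gone; the target element p2 it pointed to still exists, which is exactly the situation Fig. 4 depicts.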
      <p>The task of Del is to determine a maximal consistent sub-triple of the current state after applying fAln. To
compute this maximal consistent sub-triple, Del attempts to mark the entire triple graph by applying suitable
TGG rules. This marking process is visualized in Fig. 5 using checkboxes to represent the (un)marked state
of all nodes and the edges that connect the individual triples accordingly (edges can be uniquely inferred for
the example). The root elements f1, p1, and their connecting corr can be successfully marked with rule R1
(successfully applied rules are denoted in square brackets). The remaining elements cannot be marked with any
rule, however, and are left unmarked. When the marking process terminates, Del deletes all unmarked elements
(highlighted in Fig. 5 in red/grey) in the correspondence and target models.</p>
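Del's marking phase can be sketched as a fixpoint computation. The encoding below is a simplification under stated assumptions: triples are a folder-to-package map, R1 marks the root triple, and R2 (the extension rule) can only mark a child triple when the parent triple is marked and matching parent-to-child edges exist in both the source and the target model.

```python
# Sketch of Del's marking phase for the running example.

def del_marking(corrs, source_edges, target_edges, root):
    """Return the set of folders whose triples can be marked.
    R1 marks the root triple; R2 marks a child triple only if the parent
    triple is already marked AND matching parent->child edges exist in
    BOTH models (the extension rule requires both)."""
    marked = {root}
    changed = True
    while changed:
        changed = False
        for f_par, f_child in source_edges:
            p_par, p_child = corrs.get(f_par), corrs.get(f_child)
            if (f_par in marked and f_child in corrs and f_child not in marked
                    and p_par is not None and p_child is not None
                    and (p_par, p_child) in target_edges):
                marked.add(f_child)  # R2 applies
                changed = True
    return marked

corrs = {"f1": "p1", "f3": "p3"}              # after fAln removed (f2, p2)
source_edges = {("f1", "f3")}                  # update: f1 reconnected to f3
target_edges = {("p1", "p2"), ("p2", "p3")}    # target not yet updated
print(del_marking(corrs, source_edges, target_edges, "f1"))  # {'f1'}
```

Only the root triple is markable: the source now has the edge f1→f3, but the target still lacks p1→p3, so R2 never fires and everything below the root is deleted, matching Fig. 5.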
      <p>In a final step, the third function fAdd applies TGG rules to transform all unmarked elements in the source
model, extending the correspondence and target models with new elements in the process. The final consistent
state after applying fAdd is depicted in Fig. 6, with the newly created elements highlighted in green (grey).</p>
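The effect of fAdd can be sketched as follows; the naming scheme for freshly created packages is purely illustrative:

```python
# Sketch of forward addition (fAdd): translate every unmarked folder by
# creating a fresh package plus a corr, then mirror the source edges.

def f_add(unmarked_folders, corrs, source_edges):
    """Extend the correspondence and target models for unmarked folders."""
    for f in unmarked_folders:
        corrs[f] = f.replace("f", "p") + "'"   # fresh package (illustrative name)
    target_edges = {(corrs[a], corrs[b]) for (a, b) in source_edges}
    return corrs, target_edges

corrs = {"f1": "p1"}                           # only the root survived Del
corrs, tedges = f_add(["f3"], corrs, {("f1", "f3")})
print(corrs)   # {'f1': 'p1', 'f3': "p3'"}
print(tedges)  # {('p1', "p3'")}
```

Note that p3 is re-created from scratch as a new element (p3' here), which is precisely where attribute values and contents not derivable from the source are lost.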
      <p>
        While the algorithm is correct, complete, stable, and invertible (cf. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for all details), our simple example
shows clearly that it makes a rather rough approximation of the elements that must be deleted (and recreated) to
ensure consistency and thus performs unnecessary changes. For a model triple with a very deep folder structure
and corresponding package hierarchy, removing the second folder as we did would result in the deletion and
identical re-creation of all sub-packages. For a realistic example, this will incur substantial information loss, as
the sub-packages will probably have attribute values and other contents such as files with manually written code,
which cannot be re-generated from the source model. Even without rigorously formalizing least change, this is
clearly sub-optimal and most probably unwanted synchronization behavior.
      </p>
      <p>
        <bold>Existing Ideas on How to Improve Least Change Behavior for TGGs</bold>
An approach to improve the TGG algorithm is suggested in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The idea is not to delete elements right away
with Del, but rather to mark them for deletion and try to reuse them later with fAdd instead of re-creating
them. This breaks the clear separation of fAln, Del and fAdd, allowing the functions to work together during
the synchronization process. Applying this idea to our example, the improved algorithm would mark the target
element p2 and its connected corr for deletion without actually deleting any of those elements (resulting in a
"lazy" fAln+Del). During translation, before actually (re-)creating any new elements, fAdd can now first check
in the "deletion pool" if a suitable element can be recycled. In this manner, the package p3 and its corr to the
folder f3 would only be marked for deletion in Fig. 5 and then re-used (instead of being re-created as before).
The process is finalized by deleting all elements remaining in the deletion pool, in this case the package p2 and
its dangling corr. While this can potentially avoid information loss by recycling elements, choosing the "right"
elements to re-use makes the whole process highly non-deterministic and could either overwhelm end-users if this
freedom of choice is left open, or surprise them if the decision is internalized and automated.
      </p>
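The "lazy" reuse idea from [2] can be sketched as below. The pool representation and the suitability predicate are assumptions for illustration; how to choose the "right" element is exactly the open, non-deterministic part.

```python
# Sketch of element reuse via a deletion pool: before creating a fresh
# target element, fAdd first tries to recycle a pooled one.

def translate_with_reuse(folder, deletion_pool, is_suitable):
    """Return a package for `folder`, recycling from the pool when possible."""
    for pkg in deletion_pool:
        if is_suitable(folder, pkg):
            deletion_pool.remove(pkg)
            return pkg, True          # recycled: attributes/contents preserved
    return "fresh_pkg_for_" + folder, False

pool = ["p2", "p3"]
# illustrative suitability check: matching trailing index
pkg, reused = translate_with_reuse("f3", pool, lambda f, p: f[-1] == p[-1])
print(pkg, reused)   # p3 True
print(pool)          # ['p2']  -> whatever remains is deleted at the end
```

In the running example p3 is recycled with all its contents intact, while p2 stays in the pool and is finally deleted.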
      <p>
        A different approach proposed by Hildebrandt [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is to specify or automatically derive additional "fix rules"
for the sole purpose of restoring consistency locally. This idea can be applied to the algorithm by Hermann et
al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] by replacing fAln with a function fix that attempts to restore consistency locally. Such fix strategies
can potentially enlarge the maximal consistent sub-triple determined subsequently by Del. In our example,
such a local fix would be to mirror the update in the target model by deleting p2 (with its dangling corr) and
reconnecting p1 with p3. In this case, Del would be able to mark the entire triple, avoiding unnecessary deletion
and re-creation. It is, however, unclear how to derive these fix rules automatically from arbitrarily complex
TGGs, or if they are manually specified, how to guarantee that they do not contradict the original TGG.
      </p>
      <p>
        Finally, practical guidelines for developing a TGG are presented by Anjorin et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], including a "best
practice" guideline concerning least change. Inspecting rule R2 in Fig. 3, we can see that sub-packages depend
on their parent package as context. This means that the existence of the parent package is a precondition for the
existence of the sub-package. For our example, however, this context dependency is not absolutely necessary;
Anjorin et al. explain that dependencies like this should be avoided by splitting such rules into an "island rule"
that only creates elements (a folder and corresponding package) and another "bridge rule" that adds the required
edges (connecting the folder and package to their parents). Applying this guideline to our example replaces the
"extension rule" R2 in Fig. 3 with the bridge R2' depicted in Fig. 7. According to the least change guideline [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
bridges are better than extensions as they reflect the actual context dependencies more accurately. With this
"improved" TGG, synchronization is performed as follows: fAln behaves the same as in Fig. 4; when computing
the maximal consistent sub-triple, however, Del can now mark the elements below the deleted folder f2 using
rule R1 (Fig. 8). This means that only the package p2 and its incident edges are deleted. After deleting these
elements, fAdd can now translate the unmarked edge connecting f1 and f3 with the bridge rule R2' (Fig. 9).
      </p>
      <p>Although this works for our example, there are two issues: Firstly, replacing extension with bridge (and island)
rules can change the language generated by the TGG, resulting in synchronizers that now accept and propagate
previously invalid updates to produce results that are inconsistent with respect to the original TGG. As an
example, note that it becomes possible to create arbitrary graphs of folders and packages using the bridge rule
R2', while the original extension rule R2 could only generate trees. In this case, suitable application conditions
for R2' could be used to solve the problem, but this requires challenging, manual effort with no guarantees that
both TGGs are equivalent. Secondly, although the refactored TGG might be more suitable for synchronization,
TGGs are used for other consistency management tasks where this change might have a negative impact, e.g.,
consistency checks or model generation. The TGG developer might also prefer a certain style of specification,
e.g., for readability reasons, and should not be forced to adhere to a certain structure for rather technical reasons.</p>
      <p><bold>Combining Optimization Techniques and TGGs to Improve Least Change</bold>
Our idea stems mainly from the observation that Del establishes a maximal consistent sub-triple, which must
be in the language generated by the TGG. For large TGG rules (e.g., extension instead of island and bridge
rules), this leads to coarse, rigid steps in the synchronization process. The algorithm can thus be improved by
automatically decomposing a given TGG into a new, more flexible TGG consisting of rules that are as small
as possible. While this enables taking smaller, more precise steps in the synchronization process, appropriate
constraints ranging over valid rule application sequences must be automatically generated to ensure that the
final result is again in the language of the original TGG. To demonstrate why such constraints are required, we
now discuss the forward propagation of an inconsistent update: instead of deleting the folder f2, let us assume
that our file system allows us to add a symbolic link between folder f1 and folder f3 as depicted in Fig. 10, but
that a corresponding modification is not allowed in the language of packages.</p>
      <p>A tool based on our original TGG would refuse to forward propagate this update, as there is no rule that can
create edges without creating the corresponding sub-folders and sub-packages. Using the decomposed bridge rule
R2', however, the invalid change would be propagated to the target model, resulting either in an inconsistent
triple or, perhaps worse, in a runtime exception as the target model cannot be manipulated in this manner.
In order to correctly detect and reject this inconsistent update, we will use the following four steps:
1. Decompose the original TGG rules to derive a new TGG and corresponding constraint derivation logic.
2. Start the synchronization process using the decomposed rules and collect all possible rule applications.
3. Generate specific constraints based on the constraint derivation logic from Step 1.
4. Solve the constraint problem to make a guaranteed consistent choice of rule applications from Step 2.</p>
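Step 1 can be sketched with a toy rule representation (the dictionary encoding and the element names beyond sf and sp are assumptions for illustration):

```python
# Illustrative sketch of decomposing an extension rule into an island rule
# (creates nodes only) and a bridge rule (creates edges only), while
# recording the pairing obligation used later as constraint derivation logic.

extension_R2 = {
    "creates_nodes": ["sf", "sp", "corr_sf_sp"],
    "creates_edges": [("f", "sf"), ("p", "sp")],
    "context": ["f", "p", "corr_f_p"],
}

def decompose(rule):
    """Split `rule` into island + bridge and record the fusion obligation."""
    island = {"creates_nodes": rule["creates_nodes"],
              "creates_edges": [], "context": []}
    bridge = {"creates_nodes": [],
              "creates_edges": rule["creates_edges"],
              # the bridge needs the island's creations as context:
              "context": rule["context"] + rule["creates_nodes"]}
    # constraint derivation logic: every bridge application must be paired
    # with exactly one island application to be fusable back into `rule`
    obligation = ("island", "bridge")
    return island, bridge, obligation

island, bridge, obligation = decompose(extension_R2)
print(island["creates_edges"], bridge["creates_nodes"])  # [] []
```

The recorded obligation is what later yields the context and exclusivity constraints of Step 3.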
      <p>The first step is to decompose complex rules into smaller ones. Extension rules can, for example, be split into
island and bridge rules, which can themselves be decomposed further if they create multiple elements. For our
example, extension rule R2 can be decomposed to produce the bridge rule R2' depicted in Fig. 7. To ensure that
the language of the TGG is not changed by this decomposition, constraints demanding that every application
of R2' be exclusively paired with a suitable application of R1 (creating sf and sp) must be derived during
the synchronization process to ensure that a decomposed sub-derivation ⇒[R1] ⇒[R2'] can be fused to an equivalent
derivation ⇒[R2] with the original TGG rule. This constraint derivation logic is produced during the decomposition
and stored for later use during synchronization.</p>
      <p>The next step concerns the synchronization process: Once the change has been applied, the decomposed
language is used to collect all existing and potential rule applications. As depicted in Fig. 10, we can annotate
each existing rule application by introducing corresponding variables a1 to a5. Furthermore, we also annotate
the potential rule application a6. Note that instead of performing any propagation directly, we only collect all
existing and potential rule applications before considering additional constraints to guide our final choice.</p>
      <p>Constraints are now generated based on the constraint derivation logic obtained from the decomposition
in Step 1. For our example, we have to make sure that every time R2' is used to mark elements, there are
complementary applications of R1 marking the required context elements. To accomplish this there are two
constraints that have to be taken into account: (1) context rule application constraints ensure that context
requirements are fulfilled (e.g., a4 depends on a1 and a2), while (2) exclusive rule application constraints ensure
that the elements are marked only once. This ensures here that only trees and not graphs can be created. In
our example, both a5 and the potential rule application a6 depend on a3, but our constraint derivation logic
demands that a3 can only be combined with one of them, hence the derived constraint a5 XOR a6.</p>
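The constraints for the example can be written down as propositional formulas over booleans a1 to a6 (variable names from Fig. 10; the exact constraint set is our reading of the example, not a general derivation):

```python
# Sketch: the example's rule-application constraints as boolean formulas,
# checked by exhaustive enumeration of all assignments.

from itertools import product

def context_ok(a):
    # (1) context constraints: an application may only be chosen if the
    # applications providing its context elements are chosen too
    return (not a["a4"] or (a["a1"] and a["a2"])) and \
           (not a["a5"] or a["a3"]) and \
           (not a["a6"] or a["a3"])

def exclusive_ok(a):
    # (2) exclusivity: a3 can pair with only one of a5/a6 -> a5 XOR a6
    return a["a5"] != a["a6"]

vars_ = ["a1", "a2", "a3", "a4", "a5", "a6"]
solutions = [dict(zip(vars_, bits))
             for bits in product([False, True], repeat=len(vars_))]
valid = [a for a in solutions if context_ok(a) and exclusive_ok(a)]

# no valid assignment chooses every rule application:
print(any(all(a.values()) for a in valid))  # False
```

As expected, no assignment satisfies all constraints with every application chosen: a5 and a6 can never both hold.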
      <p>In a final step, consistency is ensured by solving an optimization problem over the set of constraints generated
in Step 3. To influence this step, different objective functions can be specified depending on the desired outcome.
For our example it would be reasonable to enforce least change by maximizing the number of marked elements,
but in other scenarios one could prefer most recent changes, or ask the user to decide. For the situation depicted
in Fig. 10, there exists no solution for which all elements are marked. Possible optimal solutions include rejecting
the update (analogously to synchronization with the original TGG), and suggesting that a5 be revoked instead.</p>
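Step 4 with the least-change objective can be sketched by brute force over the same constraints; a real implementation would hand this to a SAT or ILP solver instead (the constraint formulas are our reading of the example, as above):

```python
# Sketch of Step 4: among all valid assignments, pick one maximizing the
# number of chosen rule applications (a least-change objective).

from itertools import product

def is_valid(a):
    ctx = (not a["a4"] or (a["a1"] and a["a2"])) and \
          (not a["a5"] or a["a3"]) and \
          (not a["a6"] or a["a3"])
    return ctx and (a["a5"] != a["a6"])   # context + a5 XOR a6

vars_ = ["a1", "a2", "a3", "a4", "a5", "a6"]
best = max((dict(zip(vars_, bits))
            for bits in product([False, True], repeat=len(vars_))
            if is_valid(dict(zip(vars_, bits)))),
           key=lambda a: sum(a.values()))
print(sum(best.values()))  # 5
```

At most five of the six applications can be chosen: either a6 is rejected (the update is refused, as with the original TGG) or a5 is revoked in its favor.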
      <p>
        The underlying idea of combining standard optimization techniques (e.g., SAT or ILP solvers) with TGGs
has already been proposed by Leblebici et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and promising first results for the task of consistency checking
have been obtained. Our suggestion for future work is to apply this technique to the task of synchronization as
a means of improving the least change related quality of current TGG-based synchronizers.
      </p>
      <p>Our suggestion generalizes and automates the island/bridge guideline, solving the two current problems as
follows: Appropriate constraints are automatically derived to guarantee that the final result is again consistent
with respect to the original TGG, thus allowing intermediate, potentially inconsistent states and the flexibility
for least change optimization. The TGG specified by the user remains unchanged, as the optimal form for
synchronization is derived automatically as part of the compilation process.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Anthony</given-names>
            <surname>Anjorin</surname>
          </string-name>
          , Erhan Leblebici, Roland Kluge, Andy Schürr, and Perdita Stevens.
          <article-title>A Systematic Approach and Guidelines to Developing a Triple Graph Grammar</article-title>
          . In Alcino Cunha and Ekkart Kindler, editors,
          <source>BX</source>
          <year>2015</year>
          , volume
          <volume>1396</volume>
          , pages
          <fpage>81</fpage>
          –
          <lpage>95</lpage>
          . CEUR-WS.org,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Joel</given-names>
            <surname>Greenyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Pook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Rieke</surname>
          </string-name>
          .
          <article-title>Preventing Information Loss in Incremental Model Synchronization by Reusing Elements</article-title>
          . In Robert B. France,
          <string-name>
            <surname>Jochen M. Kuester</surname>
          </string-name>
          , Behzad Bordbar, and Richard F. Paige, editors,
          <source>ECMFA</source>
          <year>2011</year>
          , volume
          <volume>6698</volume>
          , pages
          <fpage>144</fpage>
          –
          <lpage>159</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Frank</given-names>
            <surname>Hermann</surname>
          </string-name>
          , Hartmut Ehrig, Fernando Orejas, Krzysztof Czarnecki, Zinovy Diskin, Yingfei Xiong, Susann Gottmann, and Thomas Engel.
          <article-title>Model Synchronization Based on Triple Graph Grammars: Correctness, Completeness and Invertibility</article-title>
          .
          <source>Software and Systems Modeling</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ):
          <fpage>241</fpage>
          –
          <lpage>269</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Stephan</given-names>
            <surname>Hildebrandt</surname>
          </string-name>
          .
          <article-title>On the Performance and Conformance of Triple Graph Grammar Implementations</article-title>
          .
          <source>PhD thesis</source>
          , University of Potsdam,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>James</given-names>
            <surname>Cheney</surname>
          </string-name>
          , Jeremy Gibbons,
          <string-name>
            <given-names>James</given-names>
            <surname>McKinna</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Perdita</given-names>
            <surname>Stevens</surname>
          </string-name>
          .
          <article-title>Towards a Principle of Least Surprise for Bidirectional Transformations</article-title>
          . In Alcino Cunha and Ekkart Kindler, editors,
          <source>BX</source>
          <year>2015</year>
          , volume
          <volume>1396</volume>
          , pages
          <fpage>66</fpage>
          –
          <lpage>80</lpage>
          . CEUR-WS.org,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Erhan</given-names>
            <surname>Leblebici</surname>
          </string-name>
          .
          <article-title>Towards a Graph Grammar-Based Approach to Inter-Model Consistency Checks with Traceability Support</article-title>
          . In Anthony Anjorin and Jeremy Gibbons, editors,
          <source>BX</source>
          <year>2016</year>
          , volume
          <volume>1571</volume>
          , pages
          <fpage>35</fpage>
          –
          <lpage>39</lpage>
          . CEUR-WS.org,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>