Automated Metamodel/Model Co-Evolution using a Multi-Objective Optimization Approach Wael Kessentini GEODES-DIRO University of Montreal, Canada kessentw@iro.umontreal.ca Abstract—Metamodels undergo many changes during the evo- A. Background: Metamodels and Models lution of several software modeling languages and projects. As a consequence, models have to be updated for preserving Metamodels are the means in MDE to specify the abstract their conformance with the new metamodel versions. A common syntax of modeling languages [3]. For defining metamodels, practice is to manually define rules for each metamodel evolution there are meta-modeling standards (such as MOF, Ecore) to co-evolve the corresponding models. In this paper, we propose available which are mostly based on a core subset of the UML a generic automated approach for the metamodel/model co- class diagrams, i.e., there are classes, attributes, and references evolution. In our approach, we view the co-evolution as a multi- objective optimization problem, and we solve it using the NSGA- used to specify the modeling languages, i.e., the intentional II algorithm. Our algorithm search for solutions that minimize description of all possible models of a given language. These (1) the non-conformities with the new metamodel version, (2) the metamodels are instantiated to produce models which are in changes to the existing models, and (3) the loss of information. essence object graphs, i.e., consisting of objects (instances of We successfully evaluated our approach on the evolution of the classes) representing the modeling elements, object slots for well-known UML state machine metamodel. Index Terms—Metamodel/model co-evolution, Model migra- storing values (instances of attributes), and links between the tion, Coupled evolution, NSGA-II objects (instances of references), which have to conform to the UML class diagram describing the metamodel. Therefore, I. I NTRODUCTION the abstract syntax of models is often represented in terms of UML object diagrams. In order for a model to conform to Model-Driven Engineering (MDE) considers models as its metamodel, there have to be several constraints fulfilled. first-class artifacts during the software lifecycle [1]. Available These constraints are normally referred to as conformsTo techniques, approaches, and tools for MDE are growing and relationship [4], [5]. they support a huge variety of activities, such as model To make the conformsTo relationship more concrete, we give creation, model transformation, and code generation. As in an excerpt of the constraints concerning objects in models MDE the modeling languages are explicitly modeled and and their relationship to classes in metamodels. Objects are therefore, as any other model changeable, the evolution of instantiated from classes. Thus for each referred type of an models often depends on the evolution of their metamodels. object in a given model, a corresponding class must exist in Metamodels are subject to many changes during the evolution the metamodel (name equivalence) and the corresponding class of software modeling languages and language maintenance must not be abstract. Such constraints may be formulated in projects, especially when it comes to domain-specific mod- in the following way: eling languages [2]. Thus, models have to be updated for context M!Object: preserving their conformance with the new metamodel version. inv typeExists: The remainder of this paper is structured as follows. Sec- MM!Class.allInstances() -> exists(c|c.name = self.type and not c.abstract) tion II provides the background of model co-evolution and demonstrates the challenges addressed in this paper based on a An example metamodel and corresponding model is shown motivating example. In Section III, we give an overview of our in Figure 2a and Figure 1a, respectively. This simple language proposal and explain how we adapted the NSGA-II algorithm allows to define simple state machines consisting of states to find optimal new models. Section IV discusses the design having a name and predecessors as well as successor states. and results of the empirical evaluation of our approach. After surveying related work in Section V, we conclude with some B. Metamodel/Model Co-Evolution: A Motivating Example pointers to future work in Section VI. While some metamodels, such as UML, are standardized and changed rarely, metamodels for Domain-Specific Model- II. M OTIVATING E XAMPLE ing Languages (DSMLs) [6], representing concepts within a This section introduces the background, namely the basic certain domain, are frequently subject to change [2]. notions of metamodels and models, the conformsTo relation- As most of the current metamodeling frameworks are cur- ship, as well as an example to demonstrate the challenges one rently strict in the sense that only fully conformed models can is facing when dealing with metamodel/model co-evolution. be used, metamodel evolution requires to co-evolve the already Fig. 1. Example Model Evolution Fig. 2. A Simplified Metamodel Evolution Example existing model instances, which may no longer conform to the III. M ODEL C O - EVOLUTION : A M ULTI -O BJECTIVE new metamodel version. In such cases, model migration scripts P ROBLEM have to be provided in current tools [7], to re-establish the A. Overview conformance between models and their metamodels. However, finding the best migration scripts to co-evolve the models is The goal of our approach is to derive a tentative new left to the user of such tools or default migration scripts are version, of an existing model, that conforms to a new version provided. However, exploring the actual co-evolution space is of its original metamodel. We view this derivation as a search still an open challenge. in the space of all possible sequences of modifications of the original models. The search is guided by three objectives, Figure 2 shows an example of a simplified metamodel which aims at minimizing (1) the number of non conformities evolution, based on the simple State Machine language. The with the new version of the metamodel, (2) the number of the metamodel evolution comprises three steps: extract sub-classes changes to the initial model and (3) the loss of information for State class resulting in InitialState, SimpleState, and Fi- after modifying the initial model. In other words, the generated nalState, make class State abstract, and push-down, create new revised model has to be similar as much as possible to and refine the cardinalities of the predecessor/successor ref- the initial model while conforming to the new metamodel erences for the subclasses. This results in the fact that besides version. Therefore, we implemented our idea in the form other constraints violations, the constraint shown previously of a multi-objective optimization algorithm that computes is violated when considering the initial model shown in an optimal sequence of edit operations representing the best Figure 1a and its conformance to the new metamodel version trade-off between the three objectives. More concretely, our in Figure 2b. algorithm takes as inputs the initial and revised versions of the To re-establish conformance for the given example, assume metamodel, a list of models to update and a list of possible for now that only two operations on models are used in edit operations to apply to the models (retype and delete). It this context. Non-conforming objects may either be retyped generates as output a sequence of edit operations that should (reclassified as instances of the concrete classes) or deleted. be applied to the initial version of the model to generate a Thus, the potential solution space for retyping or deleting new version compatible with the new metamodel version. non-conforming elements contains (c + 1)O solutions (with The space of all possible sequences of editing operations c=number of candidate classes + 1 for deletion, o=number of can be very large, especially when dealing with large models. nonconforming objects). An exhaustive search method could be inefficient in this cases. Alternatively, we use a heuristic search with a multi-objective This means, in our given example, we would end up with 64 evolutionary algorithm, NSGA-II [8]. In the next paragraphs, possible co-evolutions while one (probably the preferred one) we describe the adaptation of the generic NSGA-II algorithm of these is shown in Figure 1b. This one seems the preferred to the co-evolution problem. one due to several reasons such as the number of changes introduced (e.g., loss of information by deleting some model B. Adapting NSGA-II for Model Co-Evolution elements) and the number violated conformance constraints. NSGA-II [8] is one of the most-used multi-objective evo- To this end, we propose in this paper to consider the model lutionary algorithms (EAs) in tackling real-world problems, co-evolution problem as a multi-objective one to find a balance including software engineering ones [9], [10] to find trade- between the consistency with the previous version of the model offs between different objectives. It begins by generating an and the conformance to the new metamodel. offspring population from a parent one by means of variation operators (crossover and mutation) such that both populations Thus, we only have creation operations, but not retyping ones. have the same size. After that, it ranks the merged population Although we are not supposed to delete elements directly (parents and children) into several non-dominance layers, in the initial model, we use deletion operations to “repair” called fronts. candidate solutions during the evolution, i.e., deleting an Non-dominated solutions are assigned a rank of 1 and con- element that was previously created. A candidate solution stitute the first layer (Pareto front). After removing solutions (individual) is then represented with a vector whose elements of the first layer, the non-dominated solutions form the second are the creation/deletion operations to create the new version layer and so on and so forth until no non-dominated solutions of a model from the initial one. Each edit operation specifies remain. After assigning solutions to fronts, each solution is the model element to which it is applied. Consequently, vectors assigned a diversity score, called crowding distance, inside encoding different solution candidates may have different sizes each front. This distance defines a partial ranking inside the depending on the number of operations in the sequences. front which aims, later, at favoring solutions that are far from The proposed algorithm first generates a population of the others in terms of objective values. A solution is then random operation sequences (solution candidates), which are characterized by its layer and its crowding distance inside the used in the subsequent iterations to produce new solutions. layer. Solution derivation. In a search algorithm, the variation To finish an iteration of the evolution, we perform the operators play a key role of moving within the search space environmental selection to form the parent population for the with the aim of driving the search towards better solutions. next generation by picking half of the solutions. The solutions In each iteration, we select population size/2 individuals are included iteratively from the Pareto front to the lowest from the population popi to form population popi+1 . These layers. If half of the population is reached inside a front (population size/2) selected individuals will produce other than the crowding distance is used to complete the parent (population size/2) new individuals using a crossover and population. mutation operators. To select parents for reproduction, we used 1) Problem formulation: The model co-evolution problem the principle of the roulette wheel [11]. According to this involves searching for the best sequence of edit operations to principle, the probability to select an individual for crossover apply among the set of possible ones. A good solution s is a and mutation is directly proportional to its relative fitness in sequence of edit operations to apply to an initial model with the population. the objectives of minimizing the number of non-conformities We use a one-point crossover operator. For our problem, nvc with the new metamodel version, the number of changes this operator split each parent operation sequence S1 (resp. nbOp to the initial model, and the loss of information disIm S2) into two subsequences {S11 , S12 } (resp. {S21 , S22 }) ac- between  the initial and the evolved models. Formally: cording to a cut position k. Then, it combines the subsequences M inimize f1 (s) = nvc(s)  to create to sibling solutions {S11 , S22 } and {S21 , S12 }. Our M inimize f2 (s) = nbOp(s) crossover operator could create a child sequence that contains  conflict operations. In this case, it will be penalized by the M inimize f1 (s) = disIm(s)  component nvc of the fitness function. The first fitness function nvc(s) counts the number of vio- The mutation operator consists in randomly selecting one lation constraints w.r.t. the evolved metamodel after applying or two operations in a solution vector and modifying them. a sequence s of edit operations. Two modifications are used: (1) swapping the two selected We used in our experiments the implementation constraints operations in the sequence or (2) replacing an operation by a proposed by Schoenboeck et al. [4]. randomly created one. For the two remaining objectives, which aims at minimizing Solution evaluation. As mentioned in the problem formu- the changes to the initial models, we used simple approxima- lation, a solution is evaluated according to three objectives. tions. nbOp(s) is simply the number of editing operations in a Thus, for each solution s, we calculate nvc(s), nbOp(s), sequence s. disIm(s) measure the difference in size between and disIm(s). These values are used later to establish the the initial Mi and the revised Mr models. If the difference dominance relation between solutions. is negative or null then disIm(s) = 0 (no information loss). Otherwise, disIm(s) = size(Mi ) − size(Mr ). IV. VALIDATION 2) NSGA-II Application : To adapt NSGA-II to our prob- A. Research Questions lem, it is necessary to define (1) how to represent a co- evolution solution, (2) how to derive new solutions from The validation study was conducted to quantitatively and existing ones, and (3) how to evaluate a solution. qualitatively assess the completeness and correctness of our Solution representation. We mentioned earlier that a solu- co-evolution approach when applied to realistic settings and tion is a sequence of edit operations to be applied to the model to compare its performance with an existing deterministic to evolve. This can be seen as an inplace transformation. How- approach [12]. More specifically, we aimed at answering the ever, as we are dealing with two versions of a metamodel, we following research questions: decided to represent a solution as an outplace transformation. • RQ1: To what extend the obtained results are attributable Rather than editing the initial model, we create a new model. to our approach and not to the fact of exploring a large number of solutions? If a random search, exploring a same number of solutions as our approach, gives same or better results, this means that there is no need to use a metaheuristic search. • RQ2: To what extent can the proposed multi-objective approach co-evolve models to make them comply with a new metamodel version (in terms of correctness and completeness of proposed edit operations)? B. Experimental Setting 1) Studied Meta-models and Models: To answer the four research questions, we considered the evolution of UML State Machine Metamodel from version 1.4 to 2.0 [13]. Therefore, Fig. 3. Average correctness results of NSGA-II, GA, Wimmer et al., Random the two versions were manually analyzed to determine the Search on the 10 models. The results were statistically significant on 30 actually applied changes. Additionally, we collected from independent runs using the Wilcoxon rank sum test with a 95% confidence previous work [4], [12] all the edit operation types that can be level (α < 5%). applied to models for this metamodel evolution. We also se- lected 10 models from version 1.4 and evolved them manually to version 2.0, according to the collected edit operation types. The manually defined sequences for the selected models are used as baseline sequences for the calculation of precision and recall scores. 2) Evaluation Metrics: To compare our approach with the other alternatives, we use precision and recall measures. For an operation sequence corresponding to a given solution, precision indicates the fraction of correctly edit operations (w.r.t. the baseline sequence) among the set of all operations in the sequence. Recall is the fraction of correctly identified edit operations among the set of all expected operations. Roughly speaking, the precision represents the probability that Fig. 4. Correctness results of NSGA-II on the 10 State Machine models. a detected operation is correct whereas the recall is seen as the probability that an expected operation is detected. Both values range from 0 to 1, with higher values indicating good and mono-objective algorithms. For the random search, we solutions. generated 1000 x 100 solutions and took the one with the best The baseline sequences do not represent unique evolution fitness. This ensures that the tree alternatives (multi-objective, solutions for the used models. Indeed, more than one alter- mono-objective, and random) explore the same number of native can be possible to evolve a given model. Thus, in solutions to have a fair comparison. addition to automatic precision (AC-PR) and recall (AC-RE), we calculated a manual precision (MC). For manual precision, C. Results rather than comparing automatically the produced sequence Results for RQ1. We do not dwell long in answering with the expected one, we checked operation by operation if the first research question (RQ1) that involves comparing our they are correct w.r.t. the evolved model. approach based on NSGA-II with random search. Figure 3 3) Statistical Tests: Since the used metaheuristic algorithms confirm that using NSGA-II (as well as the GA and the (NSGA-II and GA) are stochastic by nature, different execu- deterministic algorithm) produce results by far better (and tions may produce different results for the same model with the statistically significant) than just randomly exploring a com- same execution parameters. For this reason, our experimental parable number of solutions. NSGA-II has precisions (AC-PR study is performed based on 30 independent simulation runs and MC) and recall (AC-RE) more than twice higher than and the obtained results by the alternative approaches are the ones of random search as shown in Figure 3 (∼ 85% vs compared using the Wilcoxon rank sum test [14] with a 95% ∼ 40%). confidence level (α = 5%). To formally answer RQ1, we state that there is an empirical 4) Parameter Settings: Parameter setting has a significant evidence that the quality of the co-evolution results obtained influence on the performance of a search algorithm. We are due to our multi-objective approach and not to the number tried different values to set the execution parameters. For of explored solutions. the reported results, we used crossover probability = 0.8, Results for RQ2. We evaluated the averages AC-PR, AC- mutation probability = 0.5, population size = 100, number of RE and MC scores for non-dominated co-evolution solutions iteration = 1000. The same parameters were used for the multi- proposed by NSGA-II. For the precision and recall, Figure 4 shows that the • Metamodel-matching techniques used to infer a migration produced solutions using NSGA-II are similar to the baseline strategy from the difference between the original meta- ones with more than 80% of precision and recall (AC-PR and model and the evolved metamodel. [19] [2] [20] [21]. AC-RE) in general. For four models (SM1, SM2, SM5, and • Operator-based approaches that records the metamodel SM8), we obtained more than 90% for the recall. From another changes as a sequence of co-evolutionary operations used perspective, we did not observe a correlation between the size later to infer a complete migration strategy [22] [7] [23] or the number of operations of the models and the precision [24]. and recall. For example, we obtained higher precision and None of the existing approaches allows the exploration of recall for SM7 (40 elements and 82 operations) than for different possible co-evolution strategies. On the contrary, only SM2 (17 elements and 38 operations). This means that the one specific strategy is either automatically derived or manu- correctness of the results is not degraded as the size of the ally developed from the calculated set of metamodel changes. models or the size of necessary modifications increase. For the The only work we are aware of discussing metamodel/model manual precision, the results are even better. Except for SM2 co-evolution using some search-based techniques is [25]. In and SM9 all the MCs are higher than 90% with a perfect score this paper, they authors discuss the idea of using search- for SM8. Here again, the scalability in terms of correctness is based algorithms to reason about possible model changes, but valid for MC. Indeed MC increases from 86% (SM2) to 94% in contrast to our approach, they rely again on metamodel (SM10) while the size of the model (resp. operation sequence) differences which have to be computed (probably using a goes from 17 to 44 (resp. 38 to 94). search-based approach) before the co-evolution of models can To formally answer RQ2, we state that the multi-objective be performed. To the best of our knowledge, this approach co-evolution approach allows to migrate models with higher is unique compared to previous approaches and outperforms precision and recall and with a limited number of edit oper- logic-based approaches for repairing models [12]. Further- ations. This achieved without any explicit knowledge on the more, we are not dependent on the quality of metamodel specific changes that occurred on the metamodel. change detection algorithms. VI. C ONCLUSION D. Threats to Validity This paper proposes a multi-objective approach for the co- There are several validity threats to the design of this study. evolution of models by finding the best operation sequence The first threat is related to the parameter tuning of our of changes applied to the initial model that generate the multi-objective algorithm. Further experiments are required to target model conforming as much as possible to the evolved evaluate the impact of the parameters setting on the quality metamodel. Therefore, a generated revised model is necessary of the solutions. A second threat is related to our choice that minimizes the number of inconsistencies (with the new of taking the average of the three objective function in the metamodel), the number of operations and the dissimilarity mono-objective algorithms. Other forms of combination, e.g., with the initial model. As the search space in terms of all weighted average, may give different results. Further exper- possible sequences of operations is potentially huge and we iments are required to compare our approach with different have three objectives to optimize, we considered in this paper settings of the mono-objective algorithm. In this study, we the co-evolution process as a multi-objective optimization performed our experiments on a single evolution scenario problem. (state machine metamodel from v1.4 to v2.0). Future repli- We evaluated our proposal with a metamodel evolution cations of this study are necessary to confirm our findings, in scenario concerning a state machine modeling language. The particular, with industrial settings. In addition, the comparison experiment results indicate clearly that the best generated of the performance of NSGA-II to other existing approaches models have a precision and recall of more than 80% and is limited to the approach of Wimmer et al. [12]. We plan a manual precision of more than 90%. Furthermore, the to conduct other comparisons by requesting the tools from the results provide strong evidence to support the claim that our authors or by re-implementing the state-of-the-art approaches. proposal outperforms both a mono-objective and deterministic approaches for model co-evolution. V. RELATED WORK We are working now on larger metamodels and models with larger lists of operations to apply. This is necessary to Several approaches emerged which aim to tackle metamod- investigate more deeply the applicability of the approach in el/model co-evolution from different angles using different practice, but also to study the performance of our approach techniques (cf. e.g., [2], [15], [16] for an overview). Meta- when dealing with very large models. model/model coevolution approaches can be classified in three categories [16]: R EFERENCES • Manual-specification approaches in which the migration [1] J. Bézivin, “On the unification power of models,” Software and System strategy is encoded manually by the modeler using Modeling, vol. 4, no. 2, pp. 171–188, 2005. [2] B. Meyers and H. Vangheluwe, “A framework for evolution of modelling general purpose programming language (e.g. Java), or languages,” Sci. Comput. Program., vol. 76, no. 12, pp. 1223–1246, transformation languages (e.g ATL, QVT) [17] [18]. 2011. [3] T. Kühne, “Matters of (Meta-) Modeling,” Software and Systems Mod- [25] J. R. Williams, R. F. Paige, and F. A. C. Polack, “Searching for model eling, vol. 5, no. 4, pp. 369–385, 2006. migration strategies,” in Proceedings of the 6th International Workshop [4] J. Schoenboeck, A. Kusel, J. Etzlstorfer, E. Kapsammer, W. Schwinger, on Models and Evolution, 2012, pp. 39–44. M. Wimmer, and M. Wischenbart, “CARE: A Constraint-based Ap- proach for Re-establishing Conformance-relationships,” in Proceedings of the Tenth Asia-Pacific Conference on Conceptual Modelling, ser. APCCM ’14, 2014, pp. 19–28. [5] L. Iovino, A. Pierantonio, and I. Malavolta, “On the Impact Significance of Metamodel Evolution in MDE,” Journal of Object Technology, vol. 11, no. 3, pp. 3:1–33, 2012. [6] J. Gray, J.-P. Tolvanen, S. Kelly, A. Gokhale, S. Neema, and J. Sprinkle, Domain-Specific Modeling, 2007, ch. 7, pp. 1–20. [7] M. Herrmannsdörfer, “COPE A Workbench for the Coupled Evolution of Metamodels and Models,” in Software Language Engineering, ser. LNCS, 2011, vol. 6563, pp. 286–295. [8] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A Fast Elitist Non- dominated Sorting Genetic Algorithm for Multi-objective Optimization: NSGA-II,” ser. Lecture Notes in Computer Science, 2000, vol. 1917, pp. 849–858. [9] M. Harman, S. A. Mansouri, and Y. Zhang, “Search-based software engineering: Trends, techniques and applications,” ACM Comput. Surv., vol. 45, no. 1, pp. 11:1–11:61, 2012. [10] A. Ouni, M. Kessentini, H. Sahraoui, and M. Boukadoum, “Maintain- ability defects detection and correction: a multi-objective approach,” Journal of Automated Software Engineering, vol. 20, no. 1, pp. 47–79, 2013. [11] K. Deb and S. Gupta, “Understanding knee points in bicriteria problems and their implications as preferred solution principles,” Engineering Optimization, vol. 43, no. 11, pp. 1175–1204, 2011. [12] M. Wimmer, A. Kusel, J. Schoenboeck, W. Retschitzegger, and W. Schwinger, “On using inplace transformations for model co- evolution,” in MtATL Workshop, 2010. [13] http://www.omg.org, Object Management Group, Inc., OMG Unified Modeling Language Specification v1.4 and v2.0. [14] A. Arcuri and L. Briand, “A practical guide for using statistical tests to assess randomized algorithms in software engineering,” in 33rd International Conference on Software Engineering, 2011, pp. 1–10. [15] L. Rose, M. Herrmannsdoerfer, S. Mazanek, P. Van Gorp, S. Buchwald, T. Horn, E. Kalnina, A. Koch, K. Lano, B. Schtz, and M. Wimmer, “Graph and model transformation tools for model migration,” Software and Systems Modeling, vol. 13, no. 1, pp. 323–359, 2014. [16] L. M. Rose, R. F. Paige, D. S. Kolovos, and F. A. C. Polack, “An Analysis of Approaches to Model Migration,” in Proc. Models and Evolution (MoDSE-MCCM) Workshop, 12th ACM/IEEE International Conference on Model Driven Engineering, Languages and Systems, 2009. [17] A. Narayanan, T. Levendovszky, D. Balasubramanian, and G. Karsai, “Automatic domain model migration to manage metamodel evolution,” in Model Driven Engineering Languages and Systems, ser. LNCS, 2009, vol. 5795, pp. 706–711. [18] L. M. Rose, D. S. Kolovos, R. F. Paige, and F. A. C. Polack, “Model migration with Epsilon Flock,” in International Conference on Model Transformation, volume 6142 of LNCS, 2010, pp. 184–198. [19] B. Meyers, M. Wimmer, A. Cicchetti, and J. Sprinkle, “A generic in- place transformation-based approach to structured model co-evolution,” in 4th Int. Workshop on Multi-Paradigm Modeling @ MoDELS’10, 2010. [20] K. Garcés, F. Jouault, P. Cointe, J. Bézivin, and A. Emninria, “Man- aging model adaptation by precise detection of metamodel changes,” in ECMDA-FA09: European Conference on Model Driven Architecture- Foundations and Applications, 2009, pp. 34–49. [21] A. Cicchetti, D. D. Ruscio, R. Eramo, and A. Pierantonio, “Automat- ing co-evolution in model-driven engineering,” in 12th International Enterprise Distributed Object Computing Conference (EDOC), IEEE Computer Society, 2008. [22] M. Herrmannsdoerfer, S. Benz, and E. Juergens, “Cope - automating coupled evolution of metamodels and models,” in ECOOP 2009 Object- Oriented Programming, 2009, vol. 5653, pp. 52–76. [23] G. Wachsmuth, “Metamodel adaptation and model co-adaptation,” in ECOOP07. Volume 4609 of LNCS., Springer, 2007. [24] M. Herrmannsdörfer and G. Wachsmuth, “Coupled evolution of software metamodels and models,” in Evolving Software Systems, 2014, pp. 33– 63.