Automatic model repair using reinforcement learning

Angela Barriga, Adrian Rutle, Rogardt Heldal
Western Norway University of Applied Sciences, Bergen, Norway
abar@hvl.no, aru@hvl.no, rohe@hvl.no

ABSTRACT
When performing modeling activities, the chances of breaking a model increase together with the size of development teams and the number of changes in software specifications. One option to prevent and repair broken models is to automate this process with a software tool, using techniques like Machine Learning (ML). Despite its potential to help fix models, it is still challenging to apply ML techniques in the modeling domain due to the lack of available datasets. However, some ML branches could offer promising results since they do not require initial training data, such as Unsupervised and Reinforcement Learning (RL). In this paper we present a prototype tool for automatic model repair based on RL algorithms. These algorithms could potentially reach human-quality model repair without requiring supervision. To conduct an initial evaluation we have tested our prototype with a set of broken models and studied its execution and results.

CCS CONCEPTS
• Computing methodologies → Reinforcement learning; Model verification and validation; • Software and its engineering → Risk management; • Theory of computation → Theory and algorithms for application domains;

KEYWORDS
Model Repair, Machine Learning, Reinforcement Learning

1 INTRODUCTION
Models are used to develop key parts of systems in engineering domains [23]. Therefore, the correctness and accuracy of such models are of the utmost importance to maintain good quality during the development of these systems. The complexity of keeping models free of errors grows together with the size of working teams and the number of changes during the development process. Tracking all versions of the same model and checking its correctness can be a challenging task, especially if the size of the model also increases.

Automation can be a good solution to ease the complexity of this process by periodically checking whether a model is free of errors and repairing it when errors occur. Tools that automate or support error detection and repair of models can greatly improve how organizations deal with Model-Driven Engineering processes by reducing the burden of manually dealing with correctness issues and improving delivery time and final quality.

To this end, there already exist many tools and research efforts [16] for automatically resolving software bugs that have shown promising results. Examples of developed prototypes for model repair can be found in [6], where the authors achieve repair by aligning models with their behavior logs, and in [17], where EMF models are repaired using a rule-based interactive approach. Other contributions take into account the editing history of a model, such as [18] with a tool that generates repair suggestions and [20] with a prototype based on graph transformation theory.

However, despite the potential already shown by Machine Learning (ML) in achieving (and sometimes even surpassing) human performance in repetitive tasks, we could not find in the literature any research applying ML to improve how developers deal with cyclical tasks in modeling, such as repairing. The authors in [3] stress how the combination of these two fields, ML and modeling, could be highly beneficial for the future of modeling and state some application examples. ML has already shown its effectiveness in providing support and in independently solving different tasks in software engineering like testing [5], quality assurance [4, 15] or verification [8].

The biggest challenge for ML adoption within modeling is the lack of publicly available historical data, since most ML algorithms need large amounts of data in order to achieve high-quality results. Some available model repositories are [1, 7, 11], yet the data they offer is limited in terms of diversity and labeling. Still, until the available model datasets grow and improve, some ML algorithms can work around this data issue. Unlike other ML techniques, Reinforcement Learning (RL) does not require training datasets, since its purpose is to find structure in the data it works with without supplying any target or outcome beforehand. This makes it useful in situations and domains which lack historical data.

Another issue is that the size of the search space when performing automatic repair can grow exponentially once the target model has enough content.
In addition, potentially interesting fixes are usually not unique; the same model could be repaired in many different ways, which adds complexity and uncertainty to ML algorithms.

RL offers algorithms able to learn by themselves how to interact with an environment, given a set of possible actions and rewards for each of them. Using these rewards they can learn which are the best actions to perform in that environment. Rewards can be set up so that the algorithms learn how to fix undesired situations or problems in the environment, such as conflicts between non-unique repairing actions. The search space can be reduced by classifying the available actions according to the error to fix: actions that could never interact with an error can be omitted.

This technique dates from the 80s; however, it is during the last years, thanks to the decreasing price of hardware and the increase in computational power, that it has produced breakthrough results in tasks as different as reaching above-human-level performance in video and board games [19], improving existing algorithms [14] or teaching simulated entities and robots to walk by themselves [9].

Given the potential these algorithms offer, together with the need for automated model repair tools, we consider that studying how to apply RL to autonomous model repair could provide interesting results, with the possibility of achieving human performance in fixing models.

Hence, in this paper we present a prototype of how an RL algorithm can repair errors in a set of broken models without human intervention, showing its execution and results. In this first approach, we consider errors related to violations of well-formedness constraints, with the idea of generalizing the types of errors supported after further development.

2 AUTOMATIC MODEL REPAIR TOOL
The Automatic Model Repair Tool (AMRT) is able, by using RL techniques, to choose and apply the best actions to fix a broken model. During this process the tool is independent from the user and does not require any supervision.

2.1 Algorithm
RL consists of algorithms that learn what to do, given a situation and a set of possible actions to choose from, in order to maximize a reward. These algorithms include an agent that must learn by itself what to do through interacting with the environment. Learning is accomplished by choosing the available actions with the best rewards. In the model repair scenario, the agent is the model to be repaired and the environment is the framework where it was defined (EMF in our approach). EMF provides error diagnostics about the model, which can be altered by the actions available in the framework. If the algorithm selects actions that reduce the number of errors in the model it gets a positive reward; otherwise it is penalized. This workflow is detailed in Fig. 1.

When there are no errors left to solve, the environment returns the maximum reward, the ultimate goal to achieve. This guarantees that all errors in the model are solved, but since there might be many ways to solve them, the algorithm might not find the optimal solution. However, the optimum could be reached by providing the algorithm with more models to solve so that it learns further, or by manually favoring some rewards over others.

Figure 1: Reinforcement learning applied to model repair workflow

AMRT is powered by a Q-Learning algorithm [22]. We apply this algorithm to a sample EMF model in Section 2.3; here we only explain its main characteristics. In Q-Learning, the agent chooses the most optimal action by consulting a table structure called the Q-table. This table is updated while the agent interacts with the environment, through repeated calculations of the Bellman Equation [2]. This equation returns a Q-value stating that the maximum future reward is the reward (r) the agent received for entering the current state (s) with some action (a), plus the maximum future reward for the next state (s') and action (a'), reduced by a discount factor (γ) to avoid falling into a local maximum (see Fig. 1). This allows inferring the value of the current state (s) from the calculation of the next one (s'), which can be used to derive an optimal policy for selecting actions.
Q(s, a) = r + γ · max_{a'} Q(s', a')

From here the algorithm can determine the utility of a state and the optimal action to take next until it reaches its final goal (maximum reward). Each iteration (attempt made by the agent) is called an episode. For each episode, the agent will try to reach the goal state, and for every transition (pair of state and action) it keeps updating the values of the Q-table:

Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α · (r_{t+1} + γ · max_a Q_t(s_{t+1}, a) − Q_t(s_t, a_t))

2.2 Phases
The tool undergoes two different phases: learning and running. During the learning phase, the Q-learning algorithm is trained using one or several models, updating its Q-table and improving the overall Q-values; in other words, learning which are the optimal actions to fix a broken model.

The algorithm receives as initial input:

(1) An agent: the model in XML format.
(2) Actions: given Ecore and the Ecore editor, the exhaustive set of edit operations, which are the possible actions available to perform on the model.
(3) State: given a model, the error diagnostics obtained from the EMF tool.

As the algorithm is executed, for each action performed on the model it needs to know the model's current state; that is, it needs to be connected to EMF to iteratively update the model and analyse its errors. The Q-values are optimized until the model has no errors left. This structure can be seen in detail in Fig. 2 and as pseudocode in Alg. 1. The more models provided to the algorithm, the better the performance it can reach.

Figure 2: Tool training structure

Algorithm 1 AMRT Algorithm
  Initialize: Q-matrix, Gamma
  INPUT: from EMF (Model, Actions, Errors)
  while Episodes not empty do
    for each Error s do
      for Steps in Episode do
        Select Action a            // random or highest Q-value
        Evaluate changes in Model produced by a
        Update Q-table
        if s is solved then
          Delete s
          Exit for
        else
          s' ← s                   // s' stores s modified by a
          Save s' in Errors
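To make Alg. 1 and the update rule above more concrete, the following is a minimal Python sketch of the learning loop. It is not the actual AMRT implementation: the ModelEnvironment interface merely stands in for the EMF connection, errors are reduced to their error kind, and the reward values and the hyper-parameters alpha and gamma are illustrative assumptions (epsilon = 0.1 corresponds to the 10% random-action probability used in the example of Section 2.3, and the episode and step limits match that example).

import random
from collections import defaultdict

class ModelEnvironment:
    """Hypothetical stand-in for the EMF connection (names are assumptions)."""
    def get_errors(self, model):
        # Return the set of error kinds reported by the model's diagnostics.
        raise NotImplementedError
    def apply_action(self, model, error, action):
        # Return the model that results from applying the edit action to the error.
        raise NotImplementedError

def q_learning_repair(model, env, actions, q_table=None,
                      episodes=20, steps=3, alpha=0.5, gamma=0.9, epsilon=0.1):
    # Tabular Q-learning over (error, action) pairs, loosely following Alg. 1.
    Q = q_table if q_table is not None else defaultdict(float)
    for _ in range(episodes):
        errors = env.get_errors(model)
        if not errors:
            break                                   # goal state: nothing left to fix
        for error in list(errors):
            for _ in range(steps):
                # Epsilon-greedy selection: mostly the best known action, sometimes random.
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: Q[(error, a)])
                candidate = env.apply_action(model, error, action)
                solved = error not in env.get_errors(candidate)
                reward = 1.0 if solved else -1.0    # assumed reward scheme
                # Update rule: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
                # with the state collapsed to the error currently being fixed.
                best_next = max(Q[(error, a)] for a in actions)
                Q[(error, action)] += alpha * (reward + gamma * best_next - Q[(error, action)])
                if solved:
                    model = candidate               # keep the fix, move on to the next error
                    break
    return model, Q

The state is deliberately collapsed to the error currently being fixed, mirroring how AMRT subdivides the search space per error and its available actions; a full implementation would query EMF for the diagnostics instead of this mock interface.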
Once the algorithm has improved enough, the tool can enter the running phase as a trained package pluggable into EMF. As a plugin it will be able to analyze and repair models directly in EMF without any supervision required. Still, its behavior could be opened up for human interaction, allowing users to give feedback to the tool about the decisions made, so that it can learn from human choices and personalize its performance.

2.3 Example
In this section, we introduce a running example in AMRT and demonstrate how the algorithm repairs all errors in a sample model from scratch. For this example we have assumed the available actions are:

(1) Action 0 - Rename class
(2) Action 1 - Delete super type
(3) Action 2 - Rename attribute
(4) Action 3 - Add type to attribute

We have preferred to omit delete class/attribute actions, keeping the example as simple as possible, since they could also technically solve errors in the models but would not provide correct solutions (e.g., by deleting the class Web its attributes would no longer have duplicated names). However, this behavior can be controlled by reducing the rewards of such actions, and they would not be a problem in a real environment.

The algorithm runs with a maximum of 20 episodes and 3 steps per episode. These numbers are low, as is the number of (action, error) pairs available, which allows us to gain further insight into how the algorithm interacts with the model (otherwise it would find the solution faster, but its decision process would not be as visible).

Additionally, the algorithm can sometimes pick a random action instead of the one with the biggest reward, in order to find new and perhaps more optimal solutions. Our random probability for this example is 10%, since it is a very controlled environment and our objective is to show how the algorithm learns by itself.

Throughout this section errors are identified using numbers:

(1) Error 0 - Duplicated attribute name.
(2) Error 1 - Attribute with missing type.
(3) Error 2 - Class super type of itself.
(4) Error 3 - Duplicated class name.

The selected model represents part of a Web structure and consists of three classes related with a reference, a composition and a super type link, as can be seen in Fig. 3.

Figure 3: First sample model

Each class has an error:

(1) Web: duplicated attribute name (Error 0).
(2) Webpage: attribute without type (Error 1).
(3) Section: the class is super type of itself (Error 2).

Table 1 details the execution of the algorithm on the sample model step by step. The correct way to read the table is, for each episode, to read the steps from left to right consecutively (if there are two rows of steps in the same episode, all steps of the first row are read before the second). Cells in red indicate that the selected action does not solve the current error, while green implies solving the error. (R) after an action means it was selected randomly.

Table 1: Algorithm execution on sample model

Episode | Step 1                   | Step 2                   | Step 3
1       | Error 0 <- Action 0      | Error 0 <- Action 2 (R)  |
        | Error 2 <- Action 0      | Error 2 <- Action 1      |
2       | Error 1 <- Action 0      | Error 1 <- Action 1      | Error 1 <- Action 2
3       | Error 1 <- Action 1 (R)  | Error 1 <- Action 3      |

The algorithm starts without any knowledge about the environment; therefore, for each error it starts by exploring the different actions available. The first attempt is to solve Error 0 using Action 0, without success. The algorithm then updates the Q-table and learns that Action 0 does not solve Error 0. The next action is picked randomly and it solves Error 0; here the randomness component has increased the speed of finding a solution. The next error to solve is Error 2; since it is the first time the algorithm processes this error, it tries the available actions one by one and finds the solution by applying Action 1. Next the algorithm faces Error 1, again an unknown error. This time we can see how it explores all actions (even trying a random repetition of Action 1) until it finds the solution with the last one, Action 3.

After a few iterations, the algorithm has been able to fix all errors in the model without any previous training.
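Continuing the Python sketch from Section 2.2, the fragment below shows how this first sample model could be encoded as a mock environment and repaired. The error-to-action mapping follows Table 1 (Action 2 fixes Error 0, Action 3 fixes Error 1, Action 1 fixes Error 2); the class and variable names are hypothetical and only serve to make the sketch executable without EMF.

# Mock diagnostics for the first sample model (Web, Webpage, Section); a sketch only.
class MockWebModelEnvironment(ModelEnvironment):
    FIXES = {0: 2, 1: 3, 2: 1}                      # error kind -> action that repairs it

    def get_errors(self, model):
        return model                                # the "model" is just its set of open error kinds

    def apply_action(self, model, error, action):
        if self.FIXES.get(error) == action:
            return frozenset(k for k in model if k != error)   # error repaired
        return model                                # the action had no effect on this error

env = MockWebModelEnvironment()
broken = frozenset({0, 1, 2})                       # Errors 0, 1 and 2, one per class
repaired, q_table = q_learning_repair(broken, env, actions=[0, 1, 2, 3])
assert not env.get_errors(repaired)                 # all three errors solved, as in Table 1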
2.4 Evaluation
Once the algorithm has learned about the environment, it is interesting to evaluate its flexibility in repairing other models and how its performance improves. We have applied the AMRT to a set of models created with errors reparable by the available actions stated in Sec. 2.3. The set contains four models. For each model we include an enumeration of the present errors, its diagram and a table showing how it is fixed by the algorithm. We present them in the order they are processed by the algorithm. For each model the Q-table is updated and saved for the next one.
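How exactly the Q-table is stored between models is not prescribed here; the following is a minimal sketch of one way to persist and reload it, reusing the dictionary-based table from the earlier Python sketch. The file name and the use of pickle are implementation assumptions, not part of AMRT.

import pickle
from collections import defaultdict

Q_TABLE_FILE = "amrt_q_table.pkl"                   # hypothetical location for the persisted table

def load_q_table(path=Q_TABLE_FILE):
    # Reuse knowledge gathered while repairing previous models, if any exists.
    try:
        with open(path, "rb") as f:
            return defaultdict(float, pickle.load(f))
    except FileNotFoundError:
        return defaultdict(float)                   # first model: start from an empty table

def save_q_table(q_table, path=Q_TABLE_FILE):
    with open(path, "wb") as f:
        pickle.dump(dict(q_table), f)

# For each model in the evaluation set (sketch):
#   q = load_q_table()
#   repaired, q = q_learning_repair(model, env, actions, q_table=q)
#   save_q_table(q)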
2.4.1 Second model.

Figure 4: Second sample model

Errors:
(1) Car: duplicated class name (Error 3).
(2) Car: attribute without type (Error 1).
(3) Car': duplicated attribute name (Error 0).

Table 2: Algorithm execution on the second model

Episode | Step 1                   | Step 2
1       | Error 3 <- Action 1 (R)  | Error 3 <- Action 0
        | Error 0 <- Action 1      | Error 0 <- Action 2
2       | Error 1 <- Action 3      |

With this model, a new error is introduced into the system, Error 3. One can see in Table 2 that the algorithm starts by selecting a wrong random action for it but is able to fix it with the second action chosen. Next, although Error 0 is already known, the Q-table still needs to be tuned and a solution is found in step 2. Finally, the algorithm finds a solution for Error 1 at the first attempt. Although this is only the second model processed by the algorithm, the fixing process is already performed faster.

2.4.2 Third model.

Figure 5: Third sample model

Errors:
(1) Boat: duplicated attribute name, twice (Error 0).
(2) Boat: the class is super type of itself (Error 2).

Table 3: Algorithm execution on the third model

Episode | Step 1
1       | Error 2 <- Action 1
        | Error 1 <- Action 3
2       | Error 0 <- Action 2

The algorithm is able to fix this model directly, as can be seen in Table 3, already knowing which action solves each error.

2.4.3 Fourth model.

Figure 6: Fourth sample model

Errors:
(1) House: duplicated attribute name (Error 0).
(2) House: attribute without type (Error 1).

Table 4: Algorithm execution on the fourth model

Episode | Step 1
1       | Error 0 <- Action 2
2       | Error 1 <- Action 0 (R)
        | Error 1 <- Action 2

In the execution shown in Table 4, the algorithm selects one wrong action, but only because it is selected randomly.

2.4.4 Fifth model.

Figure 7: Fifth sample model

Errors:
(1) Machine: duplicated class name (Error 3).
(2) Machine: duplicated attribute name (Error 0).
(3) Machine: attribute without type (Error 1).
(4) Machine: the class is super type of itself (Error 2).
(5) Machine': duplicated attribute name (Error 0').
(6) Machine': the class is super type of itself (Error 2').

Table 5: Algorithm execution on the fifth model

Episode | Step 1               | Step 2                | Step 3
1       | Error 3 <- Action 0  | Error 1 <- Action 3   | Error 0 <- Action 2
2       | Error 2 <- Action 1  | Error 2' <- Action 1  |
3       | Error 0' <- Action 1 |                       |

The final model is the biggest of the set, with six errors. Table 5 shows how it is fixed by the algorithm straightforwardly.

2.4.5 Evaluation results.

The evaluation shows that AMRT is able to automatically repair a model without any previous training. Once the algorithm gains enough knowledge, it is able to repair new models straightforwardly, as long as they are reparable by the set of known actions.

Performance improves really quickly, allowing the AMRT to fix models almost directly after only a couple of models have been processed. Figure 8 shows how the accuracy percentage evolves across the models processed. The improvement is clear; the percentage increases with each iteration except for the fourth model. However, as shown in Table 4, the only wrong action picked for this model was caused by the random dimension of the algorithm (10% of the actions are selected randomly instead of picking the best possible one in the Q-table); without it, 100% accuracy would have been reached.

Figure 8: Accuracy evolution of the algorithm

3 RELATED WORK
Model repair is a research field whose results can greatly improve how engineers interact with model-driven projects. Thus, it has drawn the interest of many researchers, who have formulated approaches and built different tools to fix broken models.

Some research tackles model repair by relating a given model to its past event logs. Fahland et al. [6] study model repair by aligning a given model with its behavior logs. Their model repair technique covers the whole state space even if there is total conformance between models and logs. Our proposed technique avoids processing the totality of the state space at once, allowing a subdivision for each error and its available actions.

Other efforts focus on rule-based approaches. Nassar et al. [17] present a prototype where EMF models are automatically trimmed and completed, with user intervention in the process. Their tool works as a support guide for modelers. For us it is also important to provide user interaction, but our aim is to give the prototype the freedom to make its own decisions (although these can be supervised by the user if so desired).

Alternative contributions take into account the editing history of a model, such as Ohrndorf et al. [18] with a tool that generates repair suggestions. Their method presents the interesting idea of identifying which change in the editing history originated an error. However, the tool again works just as a recommendation system and does not have decision power. Additionally, their approach only works if the model comes together with its edit history, not if the model is introduced with errors from scratch.

Taentzer et al. [20] present a prototype based on graph transformation theory for change-preserving model repair. In their approach, the authors check the operations performed on a model to identify which caused inconsistencies and apply the corresponding consistency-preserving operations, maintaining the changes already performed on the model. Their preservation approach is interesting; however, it only works under the assumption that the latest change of the model is the most significant. This might not hold in all environments, therefore lacking adaptability and reusability, for instance in situations where inconsistencies happen due to simultaneous commits from team members.

Kretschmer et al. introduce in [13] an approach for discovering and validating values for repairing inconsistencies automatically. Values are found by using a validation tree to reduce the state space size. Their approach is similar to ours, but we believe RL can reduce the state space even more than a tree-based search, especially when tuning rewards and tackling errors independently.
Also, trees tend to lead to the same solutions again and again due to their exploitative nature (probing a limited region of the search space). In contrast, RL algorithms include both exploitation and exploration (randomly exploring a much larger portion of the search space in the hope of finding promising solutions that would not normally be selected), allowing them to find new and sometimes more optimal solutions for a given problem.

Lastly, Kretschmer et al. [12] propose an automated tool to repair inconsistencies. They propose generator functions to make their tool more generic, and users have to add their own generator functions. We go a step further by using RL techniques, which extend the scope of our approach and provide the flexibility to reuse the framework in a wide variety of model repair cases. Also, we obtain the available actions directly from EMF, without needing further intervention from the user.

4 FUTURE WORK
In this section, we discuss the next research steps and how to improve our current results.

At the moment, the tool simulates being connected to EMF in order to receive models, errors and actions. The next immediate step is to actually implement and test the tool connected to the framework. Additionally, it will be necessary to develop a plugin and test how it works with models directly in the EMF environment.

In order to reach better results and further validation, we will test the tool on a bigger set of models. These will be obtained by creating a number of EMF models (either manually or with tools like [21]), from a public repository such as [1, 7, 11], or by extracting them from general repositories like GitHub by using data mining techniques. These models will be analyzed and repaired by the tool using the complete set of actions available in EMF. We will study how the tool performs in this more complex scenario and modify it until it reaches an optimal performance. Additionally, we will investigate different modifications of the algorithm in order to find the best way to approach model repair, as well as test other RL algorithms [10].

Finally, we would like to study further to which degree the interaction between users and the algorithm can benefit the final result, what the best terms are to develop said interaction, and to identify the most crucial operations for model developers that could be improved by using ML. We would like to focus especially on how users can boost the algorithm's decision-making when facing situations where different actions are able to apply the same fix to the model.
5 CONCLUSIONS
In this paper, we have proposed the use of RL algorithms to fix broken models. Contrary to other ML techniques, RL algorithms do not need as much data, as they do not require an initial dataset, and they are known for working well at finding solutions in unknown environments.

We present AMRT, a tool that, given a broken model and a set of available editing actions, is able to find mistakes and determine which actions can repair them. This combination of RL and automatic fixing could potentially allow model repair at human level without requiring supervision. While the results obtained are still at an early stage, we consider them promising and an indicator of the potential of this research line.

REFERENCES
[1] Francesco Basciani, Juri Di Rocco, Davide Di Ruscio, Amleto Di Salle, Ludovico Iovino, and Alfonso Pierantonio. 2014. MDEForge: an Extensible Web-Based Modeling Platform. In CloudMDE@MoDELS. 66–75.
[2] Richard Bellman. 2013. Dynamic programming. Courier Corporation.
[3] Jordi Cabot, Robert Clarisó, Marco Brambilla, and Sébastien Gérard. 2017. Cognifying Model-Driven Software Engineering. In Federation of International Conferences on Software Technologies: Applications and Foundations. Springer, 154–160.
[4] Kanika Chandra, Gagan Kapoor, Rashi Kohli, and Archana Gupta. 2016. Improving software quality using machine learning. In Innovation and Challenges in Cyber Security (ICICCS-INBUSH), 2016 International Conference on. IEEE, 115–118.
[5] Richard Chang, Sriram Sankaranarayanan, Guofei Jiang, and Franjo Ivancic. 2014. Software testing using machine learning. US Patent 8,924,938.
[6] Dirk Fahland and Wil MP van der Aalst. 2015. Model repair: aligning process models to reality. Information Systems 47 (2015), 220–243.
[7] Robert B France, James M Bieman, Sai Pradeep Mandalaparty, Betty HC Cheng, and Adam C Jensen. 2012. Repository for model driven development (ReMoDD). In Proceedings of the 34th International Conference on Software Engineering. IEEE Press, 1471–1472.
[8] Oliver Friedrichs, Alfred Huger, and Adam J O'Donnell. 2015. Method and apparatus for detecting malicious software through contextual convictions, generic signatures and machine learning techniques. US Patent 9,088,601.
[9] Nicolas Heess, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, Ali Eslami, Martin Riedmiller, et al. 2017. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286 (2017).
[10] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4 (1996), 237–285.
[11] Bilal Karasneh and Michel RV Chaudron. 2013. Online Img2UML Repository: An Online Repository for UML Models. In EESSMOD@MoDELS. 61–66.
[12] Roland Kretschmer, Djamel Eddine Khelladi, Andreas Demuth, Roberto E Lopez-Herrejon, and Alexander Egyed. 2017. From Abstract to Concrete Repairs of Model Inconsistencies: an Automated Approach. In 2017 24th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 456–465.
[13] Roland Kretschmer, Djamel Eddine Khelladi, and Alexander Egyed. 2018. An automated and instant discovery of concrete repairs for model inconsistencies. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings. ACM, 298–299.
[14] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, OpenAI Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems. 6382–6393.
[15] Ruchika Malhotra. 2015. A systematic review of machine learning techniques for software fault prediction. Applied Soft Computing 27 (2015), 504–518.
[16] Martin Monperrus. 2018. Automatic software repair: a bibliography. ACM Computing Surveys (CSUR) 51, 1 (2018), 17.
[17] Nebras Nassar, Hendrik Radke, and Thorsten Arendt. 2017. Rule-Based Repair of EMF Models: An Automated Interactive Approach. In International Conference on Theory and Practice of Model Transformations. Springer, 171–181.
[18] Manuel Ohrndorf, Christopher Pietsch, Udo Kelter, and Timo Kehrer. 2018. ReVision: a tool for history-based model repair recommendations. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings. ACM, 105–108.
[19] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. 2017. Mastering the game of Go without human knowledge. Nature 550, 7676 (2017), 354.
[20] Gabriele Taentzer, Manuel Ohrndorf, Yngve Lamo, and Adrian Rutle. 2017. Change-preserving model repair. In International Conference on Fundamental Approaches to Software Engineering. Springer, 283–299.
[21] Antonio Vallecillo, Frank Hilken, Loli Burgueño, Martin Gogolla, et al. 2016. Generating Effective Test Suites for Model Transformations Using Classifying Terms. (2016).
[22] Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8, 3-4 (1992), 279–292.
[23] Jon Whittle, John Hutchinson, and Mark Rouncefield. 2014. The state of practice in model-driven engineering. IEEE Software 31, 3 (2014), 79–85.