MaRTS: A Model-Based Regression Test Selection Approach

Mohammed Al-Refai
Computer Science Department
Colorado State University
Fort Collins, CO, USA
Email: al-refai@cs.colostate.edu

Abstract—Models can be used to plan the evolution and runtime adaptation of a software system. Regression testing of the evolved and adapted models is important to ensure that previously tested functionality is not broken. Regression testing is performed under limited time and resource constraints; thus, regression test selection (RTS) techniques are needed to reduce its cost. Existing model-based RTS approaches cannot detect all types of fine-grained changes made at a low level of abstraction, and they do not consider the impact of inheritance hierarchy changes on the selection of test cases. We propose a model-based RTS approach called MaRTS that classifies test cases based on changes performed to UML class and activity diagrams. It supports both fine-grained and inheritance hierarchy changes. We compared MaRTS with two code-based RTS approaches using four applications. MaRTS achieved results comparable to a dynamic code-based RTS approach (DejaVu) and outperformed a static code-based RTS approach (ChEOPSJ). The fault detection ability of the selected test cases was equal to that of the baseline test cases.

Index Terms—inheritance hierarchy, model-based adaptation, model-based regression test selection, UML activity diagram, UML class diagram

I. INTRODUCTION

Regression testing is one of the most expensive activities performed during the lifecycle of a software system [1], [2]. Regression test selection (RTS) [3] improves regression testing efficiency and reduces regression testing time by selecting a subset of the original test set for regression testing [3], [4].

RTS approaches can be based on the analysis of code-level or model-level changes of a software system. Model-based RTS has some advantages over code-based RTS. First, it enables early estimation of the effort required for regression testing [4]. Second, it can scale better than code-based RTS approaches for large-scale software systems [5]. Third, model-based RTS techniques can be more convenient for approaches that already apply evolution or adaptation at the model level, because the evolution/adaptation and the test selection can then be performed at the same level of abstraction [6].

Existing model-based RTS approaches suffer from the following limitations. First, they cannot detect all types of fine-grained changes from the UML class, sequence, and state machine diagrams used in these approaches [4], [7], [8]. Fine-grained changes are changes made at a low level of abstraction, such as changes to a statement inside a method implementation; an example is a modification to an operation implementation that does not affect the operation's signature and contract [4]. Second, they do not support the identification of changes to inherited and overridden operations along the inheritance hierarchy [4], [7], [8], which leads to situations where relevant test cases that traverse such inherited and overridden methods are not selected for regression testing.

We propose a model-based RTS approach called MaRTS to be used for regression testing of unanticipated fine-grained adaptations performed at the model level. MaRTS uses UML design class and activity diagrams to represent the behaviors of a software system and its test cases. MaRTS is based on (1) static analysis of the UML class diagram to identify changes in the inheritance hierarchy, (2) fine-grained model comparison to identify changes performed to the UML class and activity diagrams, and (3) dynamic analysis of test case execution at the model level to determine the coverage of each test case.

We evaluated MaRTS on four applications and compared it with two code-based RTS approaches. We also evaluated the fault detection ability of the reduced test sets produced by MaRTS.

II. APPROACH

MaRTS classifies test cases as obsolete, retestable, or reusable. Obsolete test cases are invalid and cannot be executed on the modified version of the software system. Retestable test cases exercise the modified parts of the software system and need to be selected for regression testing. Reusable test cases exercise only unmodified parts of the system, and they do not need to be re-executed for safe regression testing [9]. A safe RTS technique must select all modification-traversing test cases for regression testing [10]. A test case is considered to be modification-traversing for a program P if it executes changed code in P, or if it formerly executed code that has been deleted in P [5].
In prior work [6], we applied MaRTS within the context of the Fine-Grained Adaptation (FiGA) framework [11], [12], which uses UML diagrams to support unanticipated and fine-grained adaptations of running Java software systems. FiGA uses ReverseR [13] to extract UML class and activity diagrams from Java source code, and JavAdaptor [14], [15] to update a running Java program without stopping it. In FiGA, each individual method is represented as an activity diagram. The supported UML activity diagram elements are initial and final nodes, action nodes, call behavior nodes, and decision and merge nodes. An activity diagram generated using ReverseR is executable: each action node has an associated code snippet containing Java statements, and when the model execution flow reaches an action node, the associated code snippet is executed. Additionally, ReverseR maps a code-level method invocation statement to a call to the corresponding activity diagram; when the model execution flow reaches such a call, the called activity diagram is executed [6], [16].

In MaRTS, each method of the software system is represented as a UML activity diagram, and the same applies to each test case. These activity diagrams are executable. We use the Rational Software Architect (RSA) simulation toolkit 9.0 (http://www-03.ibm.com/software/products/en/ratisoftarchsimutool) to execute test cases at the model level.

MaRTS consists of the following five steps:
1) Extract the operations-table from the original class diagram.
2) Calculate the traceability matrix.
3) Identify model changes.
4) Extract the operations-table from the adapted class diagram.
5) Classify test cases.

MaRTS can scale to large programs because all of its steps are automated. MaRTS requires the UML models used with it to be detailed and executable in order to obtain the coverage of test cases at the model level. Therefore, MaRTS is not applicable to model-driven development approaches that use models at a high level of abstraction and lack traceability links between the code-level test cases and the models representing the software system.

A. Extraction of the Operations-Table from the Original Class Diagram

This step is performed before developers adapt the models. An operations-table is extracted from the class diagram. For each class, this table stores the operations that are declared and inherited by the class. For each operation, the operations-table stores the operation's declaring class, name, formal parameter types, and return type. For each class in the table, the name of its superclass is also stored.
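To make the structure of the operations-table concrete, the following Java sketch shows one possible representation of its entries. The class and field names are our own illustration of the description above; they are not taken from the MaRTS implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (hypothetical names) of an operations-table.
// For each operation: declaring class, name, parameter types, return type.
class OperationEntry {
    String declaringClass;        // class that declares the operation
    String name;                  // operation name
    List<String> parameterTypes;  // formal parameter types
    String returnType;            // return type

    OperationEntry(String declaringClass, String name,
                   List<String> parameterTypes, String returnType) {
        this.declaringClass = declaringClass;
        this.name = name;
        this.parameterTypes = parameterTypes;
        this.returnType = returnType;
    }
}

class OperationsTable {
    // class name -> name of its superclass (if any)
    Map<String, String> superclassOf = new HashMap<>();
    // class name -> operations declared or inherited by that class
    Map<String, List<OperationEntry>> operationsOf = new HashMap<>();
}
```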
B. Traceability Matrix Calculation

This step is performed before developers adapt the models. The activity diagrams representing the test cases are executed together with the activity diagrams representing the program methods in order to obtain the coverage of test cases at the model level. During model execution, four types of coverage information are collected for each test case: (1) which activity diagrams are executed by the test case, (2) which activity diagrams are directly called by the test case, (3) the receiver object type for each executed activity diagram, and (4) which flows in each activity diagram are executed. This information is used to obtain the activity-level and flow-level traceability matrices that relate each test case to the activity diagrams and flows it traversed.
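As an illustration of how the collected coverage could be organized, the sketch below records, for each test case, the activity diagrams and flows it traverses. The type and method names are hypothetical and only reflect the description above, not the actual MaRTS tooling.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch (hypothetical names): activity-level and flow-level
// traceability matrices built from coverage observed during model execution.
class TraceabilityMatrices {
    // test case -> activity diagrams (methods) it traverses
    Map<String, Set<String>> activityLevel = new HashMap<>();
    // test case -> transition flows it traverses, identified per diagram
    Map<String, Set<String>> flowLevel = new HashMap<>();

    void recordActivity(String testCase, String activityDiagram) {
        activityLevel.computeIfAbsent(testCase, t -> new HashSet<>())
                     .add(activityDiagram);
    }

    void recordFlow(String testCase, String activityDiagram, String flowId) {
        flowLevel.computeIfAbsent(testCase, t -> new HashSet<>())
                 .add(activityDiagram + "#" + flowId);
    }

    // Test cases that traverse a given (changed or deleted) activity diagram.
    Set<String> testsCovering(String activityDiagram) {
        Set<String> affected = new HashSet<>();
        activityLevel.forEach((test, diagrams) -> {
            if (diagrams.contains(activityDiagram)) {
                affected.add(test);
            }
        });
        return affected;
    }
}
```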
C. Model Change Identification

MaRTS uses RSA model comparison to identify the model changes after developers adapt the class and activity diagrams. The class diagram changes that can be identified are the addition/deletion/modification of interfaces, classes, class attributes, operations, and generalization and realization relations. The activity diagram changes that can be identified are the addition/deletion/modification of nodes, transition flows, the code stored in a code snippet associated with an action node, and the boolean expression associated with a transition flow.

D. Extraction of the Operations-Table from the Adapted Class Diagram

When developers adapt the class diagram, the declared and inherited operations in each class might change. Therefore, an operations-table is extracted from the adapted class diagram. The information stored in the operations-tables extracted from the original and adapted class diagrams is used to determine changes to inherited or overridden operations in each class.

E. Test Case Classification

We proposed a classification algorithm that takes the following inputs: (1) the operations-tables extracted from the original and adapted class diagrams, (2) the identified model differences, (3) the flow-level and activity-level traceability matrices, (4) the set of UML activity diagrams representing the methods of the software system, and (5) the set of activity diagrams representing the baseline test cases. The algorithm classifies the test cases as obsolete, retestable, or reusable.

Initially, all test cases are assumed to be reusable. The algorithm compares the operations-tables to identify which operations were changed along the inheritance hierarchy. The activity-level traceability matrix is used to determine the test cases affected by those changes. The following rules are applied:

1) If an operation op was initially declared or inherited by a class C, and is now neither declared nor inherited by C, then find each test case that traverses op on a receiver of type C. If a found test case directly calls op on a receiver of type C, flag the test case as obsolete; otherwise, flag it as retestable.
2) If an operation op was
   a) initially inherited by a class C from an ancestor class B, and is now overridden by C or inherited by C from one of its ancestors other than B, or
   b) initially declared by a class C, and is now inherited by C from one of its ancestors,
   then flag any reusable test case that traverses op on a receiver of type C as retestable.

Once the algorithm completes iterating over all entries of the operations-tables, the test cases that are still flagged as reusable are classified based on the identified model differences. If such a test case traverses deleted or modified transition flows and/or nodes, then the test case is flagged as retestable.
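The following Java sketch is a simplified rendering of the inheritance-hierarchy rules above. The types and helper names are hypothetical; this is not the MaRTS implementation, and Rule 2 is condensed here into a single check on whether the operation's declaring class changes for receivers of type C.

```java
import java.util.List;

// Simplified sketch (hypothetical types and helpers) of the rules in Section II-E.
enum Verdict { REUSABLE, RETESTABLE, OBSOLETE }

// One traversed operation recorded in the traceability data for a test case.
interface TraversedOperation {
    String receiverType();   // concrete type of the receiver object
    String operation();      // operation signature
    boolean isDirectCall();  // true if the test case calls it directly
}

// Read-only view over an operations-table (original or adapted).
interface OperationsTableView {
    boolean declaresOrInherits(String className, String operation);
    String resolvedDeclaringClass(String className, String operation);
}

class InheritanceRuleClassifier {

    Verdict classify(List<TraversedOperation> traversed,
                     OperationsTableView original,
                     OperationsTableView adapted) {
        Verdict verdict = Verdict.REUSABLE;
        for (TraversedOperation t : traversed) {
            String c = t.receiverType();
            String op = t.operation();

            boolean availableBefore = original.declaresOrInherits(c, op);
            boolean availableNow = adapted.declaresOrInherits(c, op);

            // Rule 1: op is no longer declared or inherited by C.
            if (availableBefore && !availableNow) {
                if (t.isDirectCall()) {
                    return Verdict.OBSOLETE; // direct call to a removed operation
                }
                verdict = Verdict.RETESTABLE;
            }

            // Rule 2 (condensed): op is still available for C, but now resolves
            // to a different declaring class (e.g., C overrides a formerly
            // inherited operation, or a declared operation is now inherited).
            if (availableBefore && availableNow
                    && !original.resolvedDeclaringClass(c, op)
                                .equals(adapted.resolvedDeclaringClass(c, op))) {
                verdict = Verdict.RETESTABLE;
            }
        }
        return verdict;
    }
}
```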
III. CASE STUDY

The goals of the evaluation were to (1) compare the inclusiveness and precision of MaRTS with those of two code-based RTS approaches that support changes to the inheritance hierarchy, and (2) compare the fault detection ability of the retestable test set with that of the original test set. Inclusiveness measures the extent to which an RTS technique selects modification-traversing test cases for regression testing, and precision measures the extent to which an RTS technique excludes test cases that are non-modification-traversing [10].

We compared MaRTS with DejaVu [2] and ChEOPSJ [17]. DejaVu detects fine-grained changes at the statement level, and ChEOPSJ detects fine-grained changes to method invocations. Both tools support the identification of changes to the inheritance hierarchy and support RTS for Java software systems. We did not compare MaRTS with the existing model-based RTS approaches because they lack tool support (or their tools are unavailable).
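For reference, the standard definitions from Rothermel and Harrold [10] can be written as follows, where T is the original test set, M ⊆ T the modification-traversing test cases, and S ⊆ T the test cases selected by an RTS technique. This formulation is our restatement of [10], not an equation given in this paper.

```latex
\[
\text{inclusiveness} = \frac{|S \cap M|}{|M|} \times 100\%,
\qquad
\text{precision} = \frac{|(T \setminus M) \setminus S|}{|T \setminus M|} \times 100\%
\]
```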
A. Subject Programs and their Adaptations

We used four subject programs: (1) the graph package of the Java Universal Network/Graph Framework (JUNG, http://jung.sourceforge.net/download.html), (2) Siena (http://sir.unl.edu/portal/bios/siena.php), (3) XML-security (http://sir.unl.edu/portal/bios/xml-security.php), and (4) a chess program, which is a classroom project that only supports the functionality to create a chessboard and move chess pieces. These programs were implemented using Java 6 and 7. They do not use generic types or multithreaded programming. Table I summarizes the data for the original version of each subject.

TABLE I
ORIGINAL PROGRAMS

Subject      | Version | Num. classes | Num. interfaces | Num. methods | LOC
JUNG         | 1.3.0   | 13           | 12              | 146          | 3655
Chess        | 0       | 7            | 1               | 65           | 1074
Siena        | 1.8     | 9            | 0               | 95           | 1605
XML-security | 2       | 173          | 6               | 1172         | 16800

We used EvoSuite [18] to generate JUnit test cases for each of these versions. For JUNG, 188 test cases that achieve 81% statement coverage were generated. For Siena, 107 test cases that achieve 89% statement coverage were generated. For Chess, 130 test cases that achieve 96% statement coverage were generated. The XML-security package has a JUnit test suite that comes with it and achieves 31% statement coverage. The generated test cases for XML-security did not improve the coverage of the existing test suite; therefore, we excluded the generated test cases for XML-security from this study and only considered the existing test cases that come with the application.

We extracted class and activity diagrams from the original version of each subject program and its test cases. Then, we adapted the class and activity diagrams from one version to the following version in a systematic way. First, we identified the code-level differences between the two versions. Second, we manually applied these differences at the model level. The changes at the model level involved additions and deletions of classes, interfaces, operations, and generalization and realization relations, and modifications to method implementations by modifying the activity diagrams representing these methods. Table II summarizes the changes performed on the models. After the model-level adaptation process was completed, we applied MaRTS to classify test cases at the model level, and applied DejaVu and ChEOPSJ at the code level.

TABLE II
ADAPTATIONS PERFORMED ON MODELS

Subject      | Evolution     | Classes & interfaces | Generalizations | Realizations | Operations
JUNG         | 1.3.0 → 1.4.0 | 5                    | 7               | 2            | 79
Siena        | 1.8 → 1.12    | 0                    | 0               | 0            | 9
Siena        | 1.8 → 1.14    | 0                    | 0               | 0            | 11
Chess        | 0 → 1         | 1                    | 6               | 6            | 56
XML-security | 2 → 3         | 52                   | 37              | 2            | 311

B. Inclusiveness and Precision Results

Table III shows the results of running the three RTS approaches. For example, MaRTS and DejaVu classified all 188 test cases of JUNG as retestable, and ChEOPSJ classified 178 out of the 188 test cases as retestable. For the XML-security subject, MaRTS classified 10 out of 94 test cases as obsolete and classified the remaining 84 test cases as retestable. We found that the 10 obsolete test cases contain calls to deleted operations. DejaVu and ChEOPSJ do not address the identification of obsolete test cases. DejaVu classified all 94 test cases as retestable. Therefore, we excluded the 10 obsolete test cases from the calculations of the inclusiveness, precision, false positives, and false negatives for the three RTS tools.

We did not get RTS results for ChEOPSJ when we ran it on the XML-security subject because of a bug in ChEOPSJ: it did not detect code changes that it is supposed to detect, and it did not produce results. Table III and Table IV therefore do not show results for ChEOPSJ on the XML-security subject.

TABLE III
TEST CASE CLASSIFICATION RESULTS

Subject      | Evolution     | Number of Test Cases | Retestable (DejaVu) | Retestable (ChEOPSJ) | Retestable (MaRTS)
JUNG         | 1.3.0 → 1.4.0 | 188 | 188 | 178 | 188
Siena        | 1.8 → 1.12    | 107 | 26  | 54  | 26
Siena        | 1.8 → 1.14    | 107 | 36  | 59  | 36
Chess        | 0 → 1         | 130 | 130 | 126 | 130
XML-security | 2 → 3         | 94  | 94  | N/A | 84

Table IV shows the number of false positives and false negatives for each of the studied RTS approaches. DejaVu is a safe tool and classifies all modification-traversing test cases as retestable; therefore, its inclusiveness was 100% for all the subject programs. The same set of test cases that was classified as retestable by DejaVu was also classified as retestable by MaRTS for all the subject programs (excluding the 10 obsolete test cases for XML-security). Therefore, the inclusiveness of MaRTS was also 100%. ChEOPSJ missed some modification-traversing test cases, and its inclusiveness was 94% for JUNG, 96% for Chess, 92% for Siena version 1.12, and 88% for Siena version 1.14. The reason is that ChEOPSJ only records changes to method invocations, but not to other types of statements in method bodies.

TABLE IV
NUMBER OF FALSE POSITIVES (FP) AND FALSE NEGATIVES (FN)

Subject      | Evolution     | DejaVu FP | DejaVu FN | ChEOPSJ FP | ChEOPSJ FN | MaRTS FP | MaRTS FN
JUNG         | 1.3.0 → 1.4.0 | 0 | 0 | 0   | 10  | 0 | 0
Siena        | 1.8 → 1.12    | 0 | 0 | 30  | 2   | 0 | 0
Siena        | 1.8 → 1.14    | 0 | 0 | 28  | 4   | 0 | 0
Chess        | 0 → 1         | 0 | 0 | 0   | 4   | 0 | 0
XML-security | 2 → 3         | 0 | 0 | N/A | N/A | 0 | 0

The precision was 100% for MaRTS and DejaVu because neither classified any non-modification-traversing test case as retestable for any subject program. The precision of ChEOPSJ was 100% for JUNG and Chess, 62% for Siena version 1.12, and 60% for Siena version 1.14. The reason is that ChEOPSJ is based on static analysis of dependencies between modified code and test cases, which leads to classifying non-modification-traversing test cases as retestable.
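As a sanity check of how these figures follow from Tables III and IV, consider ChEOPSJ on the Siena 1.8 → 1.12 adaptation (our own illustrative reading of the tables, not a computation given in the paper): DejaVu is safe with no false positives, so the 26 test cases it selects are exactly the modification-traversing ones, and ChEOPSJ misses 2 of them (FN = 2), giving roughly

```latex
\[
\text{inclusiveness}_{\text{ChEOPSJ,\,Siena 1.12}} \approx \frac{26 - 2}{26} \times 100\% \approx 92\%
\]
```

which matches the reported value.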
C. Fault Detection Ability Results

The results for MaRTS showed a reduction in the number of selected test cases only for the Siena subject, for the adaptations from version 1.8 to 1.12 and from 1.8 to 1.14. We used mutation testing to evaluate the fault detection ability of these reduced test sets. We excluded the XML-security subject from the fault detection ability evaluation because all of its test cases were selected by MaRTS (excluding the 10 test cases that were classified as obsolete by MaRTS).

There are no tools (to the best of our knowledge) that support systematic generation of mutations at the model level. Therefore, we used a code-level mutation testing tool. In particular, we used PIT (http://pitest.org) to apply first-order method-level mutation operators to the code-level versions 1.12 and 1.14. The applied mutation operators (see http://pitest.org/quickstart/mutators/) were (1) Conditionals Boundary Mutator, (2) Increments Mutator, (3) Invert Negatives Mutator, (4) Math Mutator, (5) Negate Conditionals Mutator, and (6) Void Method Calls Mutator. We configured PIT to only mutate the adapted methods. We ran PIT with both the original and retestable test sets on both versions.

TABLE V
MUTATION TESTING RESULTS

Subject    | Mutants | Full Test Set size | Full Test Set score | Retestable Test Set size | Retestable Test Set score
Siena 1.12 | 134     | 107 | 29.8% | 26 | 29.8%
Siena 1.14 | 136     | 107 | 30.9% | 36 | 30.9%

Table V shows the mutation testing results. Both the original and retestable test sets killed exactly the same set of mutants in both versions. The fault detection ability of the retestable test set was therefore equal to that of the original test set.
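The scores in Table V follow the usual definition of mutation score, stated here for completeness (our restatement, not a formula from the paper):

```latex
\[
\text{mutation score} = \frac{\#\,\text{killed mutants}}{\#\,\text{generated mutants}} \times 100\%
\]
```

Under this reading, a score of 29.8% over 134 mutants for Siena 1.12 corresponds to roughly 40 killed mutants, and the equal scores of the full and retestable test sets reflect that both sets killed the same mutants.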
D. Threats to Validity

We identify several threats to the validity of the results of our case study.

External validity. It is difficult to generalize from a study of only four subject programs. However, we selected program versions that incorporate various types of modifications, such as changes to classes, methods, the inheritance hierarchy, and class attributes.

Internal validity. The unknown factors that might affect the outcome of the analyses are possible errors in our algorithm implementation, and the fact that the test cases were generated using only one test case generation tool. To control the first factor, we tested the implementation of MaRTS on different change scenarios. We also compared the results achieved by MaRTS for the case studies with those of DejaVu.

We used EvoSuite to generate JUnit test cases for the subject programs. The results could change if other test generation tools were used or if test sets with different coverage were used. Additionally, the test cases generated for the Siena subject achieved low mutation scores. The fault detection ability results could change if test sets that achieve different mutation scores were used. We plan to evaluate the proposed approach on additional test suites generated by other test case generation tools.

Another threat is that the same person selected the subject programs, generated the test cases, reverse engineered the models, performed the model-level adaptations, and executed the RTS tools. There is a potential for getting different results if different people worked on these steps. The test generation process and the RTS approaches were automated, and thus having other people perform those steps would not make a difference if they used the same tool configurations. The adaptations are manual, which can lead to different modifications. However, since we started from a particular version of code and finished at a well-defined version of code, the differences are not likely to be significant.

Construct validity. We used inclusiveness and precision to evaluate MaRTS. However, there are other metrics that can be used to evaluate an RTS approach, such as its efficiency in terms of reducing regression testing time. We plan to evaluate the efficiency of MaRTS in the future.

IV. RELATED WORK

The RTS problem has been studied for over three decades [5], [19]. Most of the existing approaches are code-based [1], [2], [17], [20], [21], [22], and little work exists in the literature on model-based RTS. We summarize the existing model-based RTS approaches and compare them with MaRTS.

Chen et al. [23] use UML activity diagrams to perform specification-based black-box RTS. In their approach, an activity diagram represents the requirements of a system. In contrast, MaRTS uses activity diagrams to represent fine-grained behaviors of a software system. Korel et al. [24] use control and data dependencies in an extended finite state machine to identify the impact of model changes and perform RTS. This approach does not support changes to the inheritance hierarchy because it does not use UML class diagrams.

Farooq et al. [7] use UML class and state machine models for RTS. This approach does not support the identification of (1) the addition and deletion of generalization relations, and (2) the overridden and inherited operations along the inheritance hierarchy.

Briand et al. [4] present an RTS approach based on UML use case, class, and sequence models. Zech et al. [8] present a generic model-based RTS platform that is based on the model versioning tool MoVE. The approach consists of three phases that are controlled by OCL queries, namely change identification, impact analysis, and test case selection. The approaches of Briand et al. and Zech et al. can identify the addition and deletion of generalization relations between classes. However, they do not identify the impact of such changes on the inherited and overridden operations along the inheritance hierarchy, which can result in missing some retestable test cases.

In contrast to the above-mentioned model-based RTS approaches, MaRTS can identify changes along the inheritance hierarchy and classify test cases accordingly.

V. CONCLUSIONS AND FUTURE WORK

In this work, we presented a model-based RTS approach that supports fine-grained changes to method implementations and changes to the inheritance hierarchy, and that takes into account the impact of such changes on the selection of test cases. MaRTS was evaluated on four subjects and compared with two code-based RTS approaches, DejaVu and ChEOPSJ, which consider changes to the inheritance hierarchy and support Java software. MaRTS outperformed ChEOPSJ and achieved results comparable to DejaVu in terms of inclusiveness and precision. MaRTS was able to identify a certain type of obsolete test case; DejaVu and ChEOPSJ do not address the identification of obsolete test cases. The retestable test sets obtained by MaRTS achieved the same fault detection ability as the full test sets. We will evaluate the inclusiveness and precision of MaRTS on additional subject programs, and evaluate its efficiency in terms of reducing regression testing time.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. CNS 1305381.
REFERENCES

[1] G. Rothermel and M. J. Harrold, "A Safe, Efficient Regression Test Selection Technique," ACM Transactions on Software Engineering and Methodology, vol. 6, no. 2, pp. 173–210, Apr. 1997.
[2] M. J. Harrold, J. A. Jones, T. Li, D. Liang, A. Orso, M. Pennings, S. Sinha, S. A. Spoon, and A. Gujarathi, "Regression Test Selection for Java Software," in Proceedings of the 16th Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'01), J. Vlissides, Ed. Tampa, FL, USA: ACM, Oct. 2001, pp. 312–326.
[3] M. J. Harrold, "Testing Evolving Software," Journal of Systems and Software, vol. 47, no. 2-3, pp. 173–181, Jul. 1999.
[4] L. C. Briand, Y. Labiche, and S. He, "Automating Regression Test Selection Based on UML Designs," Journal on Information and Software Technology, vol. 51, no. 1, pp. 16–30, Jan. 2009.
[5] S. Yoo and M. Harman, "Regression Testing Minimization, Selection and Prioritization: A Survey," Journal of Software Testing, Verification and Reliability, vol. 22, no. 2, pp. 67–120, Mar. 2012.
[6] M. Al-Refai, S. Ghosh, and W. Cazzola, "Model-based Regression Test Selection for Validating Runtime Adaptation of Software Systems," in Proceedings of the 9th IEEE International Conference on Software Testing, Verification and Validation (ICST'16), L. Briand and S. Khurshid, Eds. Chicago, IL, USA: IEEE, Apr. 2016, pp. 288–298.
[7] Q.-u.-a. Farooq, M. Z. Z. Iqbal, Z. I. Malik, and M. Riebisch, "A Model-Based Regression Testing Approach for Evolving Software Systems with Flexible Tool Support," in Proceedings of the 17th IEEE International Conference and Workshops on Engineering of Computer-Based Systems (ECBS'10). Oxford, UK: IEEE, Mar. 2010, pp. 41–49.
[8] P. Zech, M. Felderer, P. Kalb, and R. Breu, "A Generic Platform for Model-Based Regression Testing," in Proceedings of the 5th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA'12), ser. Lecture Notes in Computer Science 7609, T. Margaria and B. Steffen, Eds. Heraklion, Crete: Springer, Oct. 2012, pp. 112–126.
[9] H. K. N. Leung and L. J. White, "Insights into Regression Testing," in Proceedings of the Conference on Software Maintenance. Miami, FL, USA: IEEE, Oct. 1989, pp. 60–69.
[10] G. Rothermel and M. J. Harrold, "Analyzing Regression Test Selection Techniques," IEEE Transactions on Software Engineering, vol. 22, no. 8, pp. 529–551, Aug. 1996.
[11] W. Cazzola, N. A. Rossini, M. Al-Refai, and R. B. France, "Fine-Grained Software Evolution using UML Activity and Class Models," in Proceedings of the 16th International Conference on Model Driven Engineering Languages and Systems (MoDELS'13), ser. Lecture Notes in Computer Science 8107, A. Moreira and B. Schätz, Eds. Miami, FL, USA: Springer, Sep. 2013, pp. 271–286.
[12] W. Cazzola, N. A. Rossini, P. Bennett, S. Pradeep Mandalaparty, and R. B. France, "Fine-Grained Semi-Automated Runtime Evolution," in MoDELS@Run-Time, ser. Lecture Notes in Computer Science 8378, N. Bencomo, B. Chang, R. B. France, and U. Aßmann, Eds. Springer, Aug. 2014, pp. 237–258.
[13] W. Cazzola, S. Pini, A. Ghoneim, and G. Saake, "Co-Evolving Application Code and Design Models by Exploiting Meta-Data," in Proceedings of the 22nd Annual ACM Symposium on Applied Computing (SAC'07). Seoul, South Korea: ACM Press, Mar. 2007, pp. 1275–1279.
[14] M. Pukall, A. Grebhahn, R. Schröter, C. Kästner, W. Cazzola, and S. Götz, "JavAdaptor: Unrestricted Dynamic Software Updates for Java," in Proceedings of the 33rd International Conference on Software Engineering (ICSE'11). Honolulu, HI, USA: IEEE, May 2011, pp. 989–991.
[15] M. Pukall, C. Kästner, W. Cazzola, S. Götz, A. Grebhahn, R. Schröter, and G. Saake, "JavAdaptor — Flexible Runtime Updates of Java Applications," Software—Practice and Experience, vol. 43, no. 2, pp. 153–185, Feb. 2013.
[16] M. Al-Refai, W. Cazzola, S. Ghosh, and R. France, "Using Models to Validate Unanticipated, Fine-Grained Adaptations at Runtime," in Proceedings of the 17th IEEE International Symposium on High Assurance Systems Engineering (HASE'16), H. Waeselynck and R. Babiceanu, Eds. Orlando, FL, USA: IEEE, Jan. 2016, pp. 23–30.
[17] Q. D. Soetens, S. Demeyer, A. Zaidman, and J. Pérez, "Change-Based Test Selection: An Empirical Evaluation," Empirical Software Engineering, pp. 1–43, Nov. 2015.
[18] A. Arcuri, J. Campos, and G. Fraser, "Unit Test Generation During Software Development: EvoSuite Plugins for Maven, IntelliJ and Jenkins," in Proceedings of the 9th IEEE International Conference on Software Testing, Verification and Validation (ICST'16), L. Briand and S. Khurshid, Eds. Chicago, IL, USA: IEEE, Apr. 2016, pp. 401–408.
[19] E. Engström, P. Runeson, and M. Skoglund, "A Systematic Review on Regression Test Selection Techniques," Information and Software Technology, vol. 52, no. 1, pp. 14–30, Jan. 2010.
[20] L. J. White and K. Abdullah, "A Firewall Approach for Regression Testing of Object-Oriented Software," in Proceedings of the 10th International Software Quality Week (QW'97), San Francisco, CA, USA, May 1997.
[21] D. C. Kung, J. Gao, P. Hsia, Y. Toyoshima, and C. Chen, "On Regression Testing of Object-Oriented Programs," Journal of Systems and Software, vol. 32, no. 1, pp. 21–40, Jan. 1996.
[22] M. Skoglund and P. Runeson, "Improving Class Firewall Regression Test Selection by Removing the Class Firewall," International Journal of Software Engineering and Knowledge Engineering, vol. 17, no. 3, pp. 359–378, Jun. 2007.
[23] Y. Chen, R. L. Probert, and D. P. Sims, "Specification-Based Regression Test Selection with Risk Analysis," in Proceedings of the Conference of the Centre for Advanced Studies on Collaborative Research (CASCON'02), D. A. Stewart and J. H. Johnson, Eds. IBM Press, Sep. 2002, pp. 1–14.
[24] B. Korel, L. H. Tahat, and B. Vaysburg, "Model Based Regression Test Reduction Using Dependence Analysis," in Proceedings of the International Conference on Software Maintenance (ICSM'02), G. Antoniol and I. D. Baxter, Eds. Montréal, Quebec, Canada: IEEE, Oct. 2002, pp. 214–223.