Steps towards zero-touch mutation testing in Pharo Mehrdad Abdi1,2 , Serge Demeyer1,2 1 Universiteit Antwerpen, Middelheimlaan 1, 2020 Antwerpen, België 2 Flanders Make, België Abstract Mutation testing is injecting artificial faults into the code to assess the written test methods. Not surprisingly, this process is time-consuming and may take hours and days to complete. On the other hand, developers, who are busy with different tasks, may find it cumbersome to run mutation testing in their workstations. In this paper, we propose some steps to develop a zero-touch mutation testing framework and facilitate employing mutation testing by developers. We extend MuTalk, the mutation testing framework in the live programming environment of Pharo, by (1) adding hierarchical mutation operators, (2) integrating it to GitHub-Actions, (3) visualizing the result in a web-based mutants explorer. Keywords Mutation testing Ops, DevOps, Zero-touch testing, Test amplification 1. Introduction Software is everywhere, and its failures cost. Unit testing is writing small test code snippets that exercise the unit under test and asserts the intended values. In mutation testing [1], some artificial bugs (mutations) are injected into the program under test to evaluate the test suite’s strength. We say the test suite kills a mutant when at least one of the tests fails in the mutated program. Alive mutants show that the test suite needs improvements because it is indifferent to the injected faults. Pharo [2, 3] is a dynamically typed language with a live programming environment focusing on simplicity and immediate feedback. The observations from the experiments in our past work in Pharo motivated us for this work. We developed a test amplification tool, Small-Amp [4], that analyzes the program under test and its test suite and suggests new test methods to kill some of the mutants. During the experiment, we noticed that MuTalk, the mutation testing in Pharo, generates too few mutants compared to the mutation testing framework in Java from another work [5]. Mutation testing in Pharo generated 1102 mutants for 52 classes (≈21 mutants per class), while there were 7980 mutants in 40 classes in Java (≈200 mutants per class). To the extent that in one of the cases (TLLegendTest), it failed to generate any mutant despite the class under test having 96 lines of code. This observation led us to expand the mutation operator in MuTalk of which the details come in Section 2. BENEVOL 2022, The 21st Belgium-Netherlands Software Evolution Workshop Mons, 12-13 September 2022 $ mehrdad.abdi@uantwerpen.be (M. Abdi); serge.demeyer@uantwerpen.be (S. Demeyer) € https://github.com/mabdi/ (M. Abdi); https://win.uantwerpen.be/~sdemey/ (S. Demeyer)  0000-0001-6984-3098 (M. Abdi); 0000-0002-4463-2945 (S. Demeyer) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) After adding new mutation operators, we witnessed that the number of times Pharo has recovered from freezing has increased, and the main reason was entering an infinite loop. Section 3 explains this problem with an example and how we overcome this problem. In the next step, we created MuTalkCI as a zero-touch mutation testing for the live program- ming environment of Pharo. By default, developers are expected to use MuTalk manually by loading it in their Pharo image, running it over their project, and waiting a considerable time to finish. We created a workflow in GitHub-Actions that loads the project under test and runs a hierarchical mutation testing on it. We call it zero-touch because the burdensome parts of the process are automated, and the developers’ attention is needed when the result is ready to be audited. We also call it MutationTestingOps because of its similarities to DevOps in running continuous mutation analysis. MuTalkCI is explained in Section 4. The framework also includes a web-based mutant explorer to stash the mutation coverage status over the development time (similar to coveralls.io but for mutation testing). Using this mutant explorer, developers can assess the alive mutants and decide which to kill. We also equip the mutant explorer with a coverage indicator based on the RIPR model [6, 7, 8, 9, 10] which helps developers in their assessment. This web interface is bidirectional and allows the developer to mark the mutants as to be killed, which creates an issue in the repository on GitHub. The interactive mutant explorer comes in Section 4.1. 2. Expanding Mutation Operators in MuTalk 2.1. Pharo and MuTalk Pharo is a pure object-oriented, dynamically typed language based on Smalltalk. It offers a simple language model: every action in the language is accomplished by sending messages to objects. In the context of Pharo, the term message sending is used instead of method invocation. As an example, there is no predefined if statement in the language: it is implemented as sending the message ifTrue: with a block argument to boolean objects. Another significant differences between Pharo and other programming languages are Phaor’s live programming environment and its snapshot base nature. Unlike most programming languages, Pharo provides a live programming environment. In Pharo, developers snapshot the state of their image when they exit the environment, and reload the snapshot when they reenter. This nature of Pharo makes it vulnerable to unrecoverable changes in the system by a mutation testing tool unintentionally. MuTalk1 is a mutation testing framework for programs written in Smalltalk. The original mutation operators in MuTalk includes some known patterns related to Boolean messages, Magnitude messages, Collection messages, Number messages and Flow control messages [11]. Most of the original operators interchange a known messages with other know messages. For example, one of operators replaces ifTrue: messages with ifFalse:. Other operators may remove the function return operator, remove exception handling blocks, replace a block with an empty block, or replace the ifTrue: receiver object with true/false objects. 1 https://github.com/pavel-krivanek/mutalk 2.2. Mutation Operators Learned from previous works and other mutation testing frameworks2 , we added the following new mutation operators to MuTalk3 . The list is sorted from the most coarse-grained to the finer operators: • Extreme transformation. We adopted an extreme transformation operator [12, 13] that stips the whole body of the test method. In Pharo, these stipped methods always return their object (^ self). We use this operator as the most coarse-grained mutation that verifies whether the tests are sensitive to removing all statements from a covered method or not. • Disabling invocations. As we explained earlier, every action in Pharo is achieved by sending messages. The message #yourself is a special message that returns the object itself. We implemented a mutation operator that replaces the sent message with #yourself to disable an invocation. We use this operator as the second coarse mutation that verifies whether the tests are sensitive to disabling a statement from a covered method or not. • Nullifying the arguments. In this mutation operator, we replace an argument in a message send node with nil. This operator also verifies whether the tests are sensitive to disabling an argument in one of the statements. • Mutating the literals. In this mutation operator, we mutate the literal values. We use a negation for the Boolean constants, an increase/decrease or zero for the numerical constants, and replacing with an empty string or a specific predefined string for the string values. 3. Detecting Infinite Loops After adding the new operators, we witnessed the number of times Pharo freezes has increased so that scarcely an execution finishes. The main freezing reason was entering an infinite loop. Here we explain it using an example. The first code in the Listing 1 shows a method in which the factorial of an integer number is calculated recursively. The next code snippets are mutated versions of this method. In these mutants, when MuTalk runs the test to verify mutation detection, an infinite loop happens because the mutation operator disables the conditional statement. Sometimes, the operating system kills the process by an Out of memory error. factorial: anInt anInt == 1 ifTrue: [ ^ 1 ]. ^ anInt ∗ (self factorial: anInt −1) "Mutant 1: disabled the conditional statement by replacing the message" factorial: anInt (anInt == 1) yourself. ^ anInt ∗ (self factorial: anInt −1) 2 PIT: https://pitest.org/quickstart/mutators/ 3 https://github.com/mabdi/mutalk "Mutant 2: replaced the condition with always false" factorial: anInt false ifTrue: [ ^ 1 ]. ^ anInt ∗ (self factorial: anInt −1) "Mutant 3: removed return operator" factorial: anInt anInt == 1 ifTrue: [ 1 ]. ^ anInt ∗ (self factorial: anInt −1) Listing 1: Examples of an infinate loop after mutation testing. In a language like Java, the mutation testing framework and the test runner run in two different processes. As a result, the test runner process fails with a StackOverFlow error in a similar mutation and is detected effortlessly by mutation testing. However, the story is different in Pharo because it is a live programming environment. The mutation testing framework and the test runner run in a shared process called Pharo image. So, an infinite loop for the test runner means the whole process losses its availability. We explained this problem in [14]. To solve this problem, we need a mechanism similar to StackOverflow error in Pharo. We added an auxiliary statement at the beginning of the mutated method that counts the number of its executions and throws an exception that fails the test if it reaches the defined threshold. We exploited the technique used in the class Halt in Pharo internals for its implementation. Although this technique significantly decreased the number of freezings, the process still may crash or freeze for other reasons. We leave recovering from other crashes as future work. Listing 2: Auxiliary exception for avading infinate loops. factorial: anInt RecursionError onCount: 1024. "I will go off if executed 1024 times" (anInt == 1) yourself. ^ anInt ∗ (self factorial: anInt −1) 4. Zero-touch MuTalk For using MuTalk, developers should perform some tedious tasks, including installing the tool on their Pharo image, initializing it, running it over their programs, and waiting a considerable time to obtain the results. These burdensome steps may hinder MuTalk from being used regularly. In this part, we propose a zero-touch mutation testing solution to automate the unnecessary involvement of developers. Recently, mutation testing has been employed at scale in Google by integrating it into the build system and using a diff-based probabilistic approach to reduce the number of mutants [15]. Then in the code-review process, alive mutants are shown to developers, and they decide to kill or ignore them. In this part, we try to setup a similar process for Pharo’s open-source projects. Figure 1 illustrates the proposed hierarchical approach for running MuTalk in the CI/CD build servers. This framework is also comparable to DevOps [16] frameworks. DevOps provides agility in continuous software delivery by an iterative approach based on automation and Figure 1: Hierarchical zero-touch mutation testing or MutationTestingOps for Pharo collaboration. Similarly, we can define Mutation testing Ops (MutOps) as a continuous mutation analysis based on automation and collaboration. Mutation testing is a time-consuming process. For a mutation testing analysis in a reasonable time, we reduce the mutation testing surface by (1) employing a diff-based mutation testing works [17, 15] that only considers the changed part in the repository and (2) using a hierarchical analysis to exclude some part of code before a full feature mutation testing. The continuous mutation testing workflow is triggered when a new code is pushed to the repository. It runs a hierarchical analysis on the selected portion: 1. Firstly, it runs a code coverage tool to find the uncovered parts (report number 1: un- covered methods). If a method is not covered, all its mutants will survive, so we do not need to run mutation testing on it. So, we exclude all uncovered parts from the following analysis. 2. Then, a light mutation testing is executed (report number 2: undetected extreme trans- formations). In our implementation, we only use the extreme transformation operator. Similarly, if an extreme mutation on a method is not detected, we exclude it from the next analysis. 3. In the third step, a more detailed mutation testing, including all remained operators, is executed on the parts detected by the previous step, and report number 3 is formed. Based on the RIPR model [6, 7, 8, 9, 10], a test method can kill a mutant if it reaches the mutant (reachability); the program state is different from the state in the original version at that point (infection); the infected change is propagated to the state of the test (propagation); finally, the change is revealed by an assertion statement (reveal). To help developers to kill the mutant manually, we provide two types of coverage status for alive mutants: the list of tests covering each alive mutant and the list of tests having a propagated change (report number 4). The tests covering a mutant are start points for manual investigations on how to kill a mutant. A method with a propagated change is also interesting for developers because it says that they can kill the mutant only by adding an oracle statement to assert the state change caused by the mutation. We developed a GitHub-Actions workflow4 that runs MuTalk, and exports the reports as json outputs. The outputs are sent to the mutants explorer API (See Section 4.1) using GitHub’s authenticated account token. We use GitHub-Actions because most of Pharo’s projects currently are hosted on GitHub, and it is freely available for all open-source projects. 4.1. Mutants Explorer Since interpreting the reports generated in Section 4 may be cumbersome, we designed a web-based mutant explorer5 . The explorer keeps the history of all builds (similar to coveralls) and visualizes mutants and their coverage status. Furthermore, it is interactive and allows developers to assess the mutants and decide whether they should be killed or ignored. If they decide a mutant to be killed, the explorer adds an item to a GitHub issue related to this build in the repository. 4 https://github.com/mabdi/smalltalk-SmallBank/blob/master/.github/workflows/mutalkCI.yml 5 https://github.com/harolato/mutation-testing-coverage Figure 2: A mutant view and its generated issue Figure 2 shows an example issue to remind the developer how to kill the mutant manually. The left figure is a mutant shown to the developer in which the mutated part is displayed as a diff view on top. Then test methods covering this method are listed with an RIPR indicator. This indicator has three levels: • If none of the levels are active, it means that the test does not reach the mutant. The tests with this degree of coverage do not help developer in killing the mutant manually, so they are excluded from the user interface. • If only the first level is activated, it shows that the test method reaches the mutant. • If there are two active levels, the test reaches the mutants and the change in the program state is propagated to the test state. • If all of levels are activated, it means that the test is killed by this test method. The mutant explorer hides the killed mutants by default. It is noteworthy that we have three levels in our proof-of-concept because we skip infection level for simplicity. In this example, we see that testWithdraw not only covers the method (the first green block), but the state change from this mutant is propagated to its context (second green block). Using this report, developers understand that they can add an assertion statement to this test method to verify the method’s return value withdraw: and kill the mutant. They can click the FIX button to add an issue (right figure) to the GitHub repository. Using GitHub’s REST APIs and the user’s token obtained with oAuth, the web interface creates an issue per build and appends all items to fix. Developers can refer to this issue later and amplify their tests manually by adding new test methods or updating their existing tests. 5. Conclusion and Future work In this paper, we propose an approach for creating a zero-touch mutation testing (or Mutation- TestingOps) framework with: (1) adding new mutation testing operators to MuTalk and use an approach to identify the infinite loops and evade freezings; (2) developing a zero-touch mutation testing to automate burdensome tasks by implementing a GitHub-Actions workflow that loads the project under test and MuTalk, and runs a mutation testing process; (3) the outputs are sent to a mutant explorer in which the history of mutations is recorded and allows developers to assess mutants and mark them as to be fixed. The assessments are collected in a GitHub issue that developers can refer to in the future to amplify the tests manually. In future work, the system will be run in practice, and a user study will be conducted to evaluate it. Acknowledgments This work is supported by (a) the Fonds de la Recherche Scientifique-FNRS and the Fonds Wetenschap- pelijk Onderzoek - Vlaanderen (FWO) under EOS Project 30446992 SECO-ASSIST (b) Flanders Make vzw, the strategic research centre for the manufacturing industry. Mutant explorer is developed by Haroldas Latonas. References [1] M. Papadakis, M. Kintis, J. Zhang, Y. Jia, Y. L. Traon, M. Harman, Chapter six - mutation test- ing advances: An analysis and survey, volume 112 of Advances in Computers, Elsevier, 2019, pp. 275–378. URL: https://www.sciencedirect.com/science/article/pii/S0065245818300305. doi:https://doi.org/10.1016/bs.adcom.2018.03.015. [2] O. Nierstrasz, S. Ducasse, D. Pollet, Pharo by Example, Square Bracket Associates, c/o Oscar Nierstrasz, 2010. [3] A. Bergel, D. Cassou, S. Ducasse, J. Laval, Deep Into Pharo, Square Bracket Associates, 2013. URL: http://books.pharo.org/deep-into-pharo/. [4] M. Abdi, H. Rocha, S. Demeyer, A. Bergel, Small-amp: Test amplification in a dynamically typed language, Empirical Software Engineering 27 (2022) 128. URL: https://doi.org/10. 1007/s10664-022-10169-8. doi:10.1007/s10664-022-10169-8. [5] B. Danglot, O. L. Vera-Pérez, B. Baudry, M. Monperrus, Automatic test improvement with dspot: a study with ten mature open-source projects, Empirical Software Engineering, Springer Verlag (2019). [6] N. Li, J. Offutt, Test oracle strategies for model-based testing, IEEE Transactions on Software Engineering 43 (2017) 372–395. doi:10.1109/TSE.2016.2597136. [7] O. L. Vera-Pérez, B. Danglot, M. Monperrus, B. Baudry, Suggestions on test suite improve- ments with automatic infection and propagation analysis, arXiv preprint arXiv:1909.04770 (2019). [8] L. J. Morell, A theory of fault-based testing, IEEE Transactions on Software Engineering 16 (1990) 844–857. [9] R. A. DeMillo, A. J. Offutt, et al., Constraint-based automatic test data generation, IEEE Transactions on Software Engineering 17 (1991) 900–910. [10] J. M. Voas, Pie: A dynamic failure-based technique, IEEE Transactions on software Engineering 18 (1992) 717. [11] H. Wilkinson, N. Chillo, G. Brunstein, Mutation testing, 2009. European Smalltalk User Group (ESUG 09). Brest, France. http://www.esug.org/data/ESUG2009/Friday/Mutation_ Testing.pdf. [12] R. Niedermayr, E. Juergens, S. Wagner, Will my tests tell me if i break this code?, in: 2016 IEEE/ACM International Workshop on Continuous Software Evolution and Delivery (CSED), IEEE, 2016, pp. 23–29. [13] O. L. Vera-Pérez, B. Danglot, M. Monperrus, B. Baudry, A comprehensive study of pseudo- tested methods, Empirical Software Engineering 24 (2019) 1195–1225. URL: https://doi. org/10.1007/s10664-018-9653-2. doi:10.1007/s10664-018-9653-2. [14] M. Abdi, H. Rocha, S. Demeyer, Reproducible crashes: Fuzzing pharo by mutating the test methods, in: International Workshop on Smalltalk Technologies, IWST, 2020. [15] G. Petrović, M. Ivanković, State of mutation testing at google, in: Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP ’18, Association for Computing Machinery, New York, NY, USA, 2018, p. 163–171. URL: https://doi.org/10.1145/3183519.3183521. doi:10.1145/3183519.3183521. [16] L. Leite, C. Rocha, F. Kon, D. Milojicic, P. Meirelles, A survey of devops concepts and challenges, ACM Computing Surveys (CSUR) 52 (2019) 1–35. [17] W. Ma, T. Laurent, M. Ojdanić, T. T. Chekam, A. Ventresque, M. Papadakis, Commit-aware mutation testing, in: 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), IEEE, 2020, pp. 394–405.