SmallEvoTest: Genetically Created Unit Tests for Smalltalk

Alexandre Bergel1, Geraldine Galindo-Gutiérrez2, Alison Fernandez-Blanco3 and Juan-Pablo Sandoval-Alcocer3

1 RelationalAI, Switzerland
2 CICEI, Universidad Católica Boliviana “San Pablo”
3 Department of Computer Science, School of Engineering, Pontificia Universidad Católica de Chile

Abstract
Evolutionary test generation techniques have emerged in recent years as a popular approach to enhancing the testing of software systems. However, while these techniques have proven efficient for programming languages that support static type annotations, dynamically typed programming languages have received little attention from the automatic test generation community. This paper introduces an approach for automatically generating fully executable unit tests for dynamically typed programming languages. In particular, our approach is tuned for dynamically typed, class-based programming languages, and it is implemented on the Pharo and GToolkit platforms. To address the absence of static type annotations, our approach uses a type profiling mechanism and employs a genetic algorithm to drive the evolution of the unit tests.

Keywords
Automatic Test Suite Generation, Genetic Algorithms, Pharo Programming Language

IWST 2023: International Workshop on Smalltalk Technologies, August 29-31, 2023, Lyon, France
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Automatic Test Suite Generation (ATSG) consists of creating executable unit tests for a particular class. ATSG produces unit tests that exercise the methods of a given target class. Such generated tests complement manually written tests by (i) focusing on untested branches or code portions or (ii) exercising corner-case scenarios. ATSG has been gaining popularity thanks to EvoSuite1 and Randoop2 for Java [1, 2, 3, 4]. EvoSuite treats automatic test generation as a mathematical optimization process driven by an evolutionary algorithm and a fitness function [5, 6]. In particular, the evolution of the unit tests being generated is designed to maximize the branch coverage of the class under test [7].

This short paper presents SmallEvoTest, a tool for Pharo that automatically creates unit tests for a particular class. Like EvoSuite and Randoop, SmallEvoTest does not require a training dataset and does not use any large language model such as ChatGPT. SmallEvoTest is available online3 under the MIT license.

1 https://www.evosuite.org/
2 https://randoop.github.io/randoop/
3 https://github.com/bergel/GeneticallyCreatedTests

Outline. The paper is organized as follows. Section 2 gives a running example of SmallEvoTest; Section 3 highlights a number of design aspects of our tool; Section 4 lists studies and tools related to this paper; Section 5 concludes and outlines our future work.

2. SmallEvoTest In A Nutshell

SmallEvoTest is relatively easy to configure and use.
To illustrate this, let’s consider the class GCPoint, which is defined as follows:

Object subclass: #GCPoint
	instanceVariableNames: 'x y'

GCPoint>>initialize
	super initialize.
	x := 0.
	y := 0

GCPoint>>add: anotherPoint
	↑ GCPoint new x: x + anotherPoint x y: y + anotherPoint y; yourself

GCPoint>>negated
	↑ GCPoint new x: x negated y: y negated; yourself

GCPoint>>x: xValue y: yValue
	x := xValue.
	y := yValue.

GCPoint>>x
	↑ x

GCPoint>>y
	↑ y

This simple class mimics the standard Point class, and we use this example throughout this paper. Using SmallEvoTest, unit tests for this class can be generated by executing the following code:

SmallEvoTest new
	targetClass: GCPoint;
	generateTestNamed: #GCPointTest;
	numberOfTestsToBeCreated: 15;
	nbOfStatements: 8;
	executionScenario: [ (GCPoint new x: 3 y: 10) add: (GCPoint new x: 1 y: 12) ];
	run.

The class SmallEvoTest expects as arguments the target class (the GCPoint class for which we want to generate unit tests), the name of the test case to be generated (GCPointTest), and an execution scenario block exercising the target class. The execution scenario block is meant to provide hints about the argument types. In this example, the scenario block invokes x:y: and add: with some arguments. Note that the result of the scenario is not used.

As a result, the class GCPointTest is created and contains 15 test methods, each with 8 statements (excluding the assertions). Here is an example of a generated test method:

GCPointTest>>testGENERATED10
	| v1 v2 v3 v4 v5 v6 v7 v8 |
	v1 := GCPoint new.
	v2 := 4.
	v3 := v1 x: v2 y: v2.
	v4 := v3 negated.
	v5 := GCPoint new.
	v6 := v1 y.
	v7 := v3 negated.
	v8 := v5 add: v3.
	self assert: v4 printString equals: 'GCPoint(-4,-4)'.
	self assert: v6 equals: (4).
	self assert: v7 printString equals: 'GCPoint(-4,-4)'.
	self assert: v8 printString equals: 'GCPoint(4,4)'.

This test has been produced by genetic algorithms, and the generation process was guided by the objective of maximizing the number of executed methods of the target class. Each of the fifteen generated test methods has eight statements (the assignments of v1 to v8) and a number of assertions. Note that SmallEvoTest uses a set of hyperparameters, including the number of tests to be generated and the number of (non-assertion) statements to be contained in a test.

As the invocation of SmallEvoTest illustrates, three essential parameters must be provided to generate tests. First, the class to be tested is specified using targetClass:. Generated tests will directly exercise the methods defined in this class. The result of the test generation is kept as test methods in a class named GCPointTest. The code provided as a block to executionScenario: is meant to exercise the class under test and is solely used to extract argument types.

A central aspect of SmallEvoTest is its use of a type profiling technique to infer the possible types to be provided. In our example above, the scenario invokes (i) x:y: with two integers and (ii) add: with another point. Type information is useful to produce and use object examples during the test generation. Such examples provide the necessary type information when generating and composing statements through genetic operations (a sketch of such a type repository is given after the list below). In particular:

• the fact that x:y: takes two integers leads to the creation of the statement v2, from which v3 is produced, and
• since add: takes another point as a parameter, the statement v8 is produced, using v3 as an argument.
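To give a concrete, albeit simplified, picture of this mechanism, the following sketch shows one possible shape of a type repository populated while the execution scenario runs. The class GCTypeProfile and its methods are illustrative assumptions and do not necessarily mirror SmallEvoTest’s actual internals; in particular, how message sends are intercepted during the scenario (e.g., through Pharo’s reflective facilities such as MetaLink) is elided:

Object subclass: #GCTypeProfile
	instanceVariableNames: 'argumentTypes'

GCTypeProfile>>initialize
	super initialize.
	"Map each profiled selector to the argument-class tuples observed
	while running the execution scenario."
	argumentTypes := Dictionary new

GCTypeProfile>>recordSelector: aSymbol arguments: aCollection
	"Accumulate, per selector, the concrete classes of the arguments
	of one profiled message send."
	(argumentTypes at: aSymbol ifAbsentPut: [ OrderedCollection new ])
		add: (aCollection collect: [ :each | each class ])

GCTypeProfile>>argumentClassesFor: aSymbol
	"Answer the argument-class tuples observed for aSymbol, or an
	empty collection if the selector was never profiled."
	↑ argumentTypes at: aSymbol ifAbsent: [ #() ]

Profiling the scenario of our example would record, for instance, that x:y: was observed with two SmallInteger arguments and add: with a GCPoint; the generator can then query this repository whenever it needs a receiver or argument of a suitable type.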
The next section describes some of the design aspects we had to consider when generating unit tests for Smalltalk.

3. Design of SmallEvoTest

3.1. Background: Genetic Algorithms

Genetic algorithms are search and optimization algorithms inspired by the principles of biological evolution. In genetic algorithms, a population is made of individuals, and each individual has a chromosome. A chromosome is a linear sequence of values. Genetic algorithms are commonly employed to mathematically optimize a function $f(x) = y$, i.e., to find a value of $x$ that maximizes $y$. The variable $x$ is a datapoint in a multi-dimensional domain, and the variable $y$ is a number. The function $f$ is called the fitness function; it indicates how fit the individual $x$ is. The function $f$ may model an arbitrarily complex operation, such as generating a unit test (obtained from the variable $x$) and measuring the coverage of the target class (the variable $y$).

Evolution with genetic algorithms happens by randomly selecting fit individuals from a given population and breeding these individuals through genetic operations to produce a new population. With each generation, the population produces individuals with higher fitness values and thus becomes fitter, increasing the likelihood of finding the optimal solution (i.e., the highest possible value of $y$).

3.2. Genetic Encoding

Applying genetic algorithms to produce a test implies that the content of the test must be adequately encoded as a chromosome. We denote a test as $x$ and refer to the number of covered methods of the target class as $y$. By optimizing $f(x) = y$, genetic algorithms will search for a test $x$ with high code coverage.

The test testGENERATED10 given above consists of two parts: (i) an initialization part made of object creations and message sends, and (ii) assertions. As produced by SmallEvoTest, assertions do not contribute to increasing the test coverage; as such, we exclude assertion generation from the genetic encoding and treat it separately4.

4 Note that this decision was also taken in EvoSuite.

The unit test $x$ is a value in the space $S^N$, where $S$ corresponds to the domain of statements and $N$ is the length of the test to be generated, expressed as a number of statements. We consider two kinds of statements: object creations and message sends. For example, if we arbitrarily set $N = 8$ (i.e., each generated test method has eight statements, as in the example above), then $x$ is encoded as $(s_1, s_2, \ldots, s_8)$, in which each $s_i$ is either an object creation or a message send. In the example given above, $s_1$ corresponds to the statement v1 := GCPoint new, an object creation, while $s_3$, $s_4$, $s_6$, $s_7$, and $s_8$ correspond to message sends.

3.3. Mutation and Crossover

Genetic algorithms employ two biologically inspired operators: mutation and crossover. A mutation consists in replacing a statement with another: a message send can be replaced by a different message send (e.g., v4 := v3 negated is replaced by v4 := v3 add: v1) or by an object creation. A crossover replaces a segment of an individual with a segment from another individual. Consider two tests $x = (s_1, s_2, s_3, s_4, s_5)$ and $x' = (s'_1, s'_2, s'_3, s'_4, s'_5)$; a possible result is $\mathit{crossover}(x, x') = (s_1, s_2, s'_3, s'_4, s'_5)$, assuming a cut point at the third statement.

After each genetic operation, variables used as message arguments may have to be readjusted to satisfy type requirements. For example, if $s'_3$ was originally v3 := v2 add: v1, then it expects both v2 and v1 to be GCPoints. However, in the result of the crossover, v1 (defined in $s_1$) and v2 (defined in $s_2$) may have different types. The receiver and arguments of a message may therefore have to be replaced by variables meeting the type requirements.
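To make these operators concrete, the following is a minimal sketch of one-point crossover and statement mutation over statement sequences, assuming tests are represented as sequenceable collections of statements. The block names are hypothetical, statement generation itself is elided, and the type readjustment described above is left out:

| crossover mutate |
"One-point crossover: keep the first cut statements of test1 and
append the remaining statements of test2."
crossover := [ :test1 :test2 :cut |
	(test1 first: cut) , (test2 allButFirst: cut) ].

"Mutation: replace the statement at a random position with a freshly
generated statement (how newStatement is produced is not shown)."
mutate := [ :test :newStatement |
	| position |
	position := test size atRandom.
	test copy
		at: position put: newStatement;
		yourself ].

For instance, evaluating crossover value: #(s1 s2 s3 s4 s5) value: #(t1 t2 t3 t4 t5) value: 2 answers #(s1 s2 t3 t4 t5), matching the cut point at the third statement in the example above.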
3.4. Generating Assertions

During the source code generation, assertions are appended to the statement source code. The test’s statements are executed in a local environment, and assertions are produced by capturing simple equality relations on the values of the variables. In the current version of SmallEvoTest, assertions are produced for leaf variables only, i.e., variables not used as an argument or receiver of other statements.
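As a rough sketch of this step, the snippet below turns the values observed for leaf variables into assert:equals: source lines. The environment dictionary, the choice of leaf variables, and the printString-based comparison for non-numeric objects are illustrative assumptions; GCPoint is assumed to define a printOn: method producing strings such as 'GCPoint(-4,-4)', as suggested by the generated test shown in Section 2:

| environment leafVariables assertions |
"environment maps variable names to the values obtained by executing
the generated statements; leafVariables are the variables never used
afterwards as a receiver or argument."
environment := {
	'v4' -> (GCPoint new x: -4 y: -4).
	'v6' -> 4 } asDictionary.
leafVariables := #('v4' 'v6').

assertions := leafVariables collect: [ :name |
	| value |
	value := environment at: name.
	value isNumber
		ifTrue: [ 'self assert: ' , name , ' equals: ' , value printString ]
		ifFalse: [ 'self assert: ' , name , ' printString equals: ' ,
			value printString printString ] ].

For the running example, this yields the lines self assert: v6 equals: 4 and self assert: v4 printString equals: 'GCPoint(-4,-4)', which correspond to the assertions of testGENERATED10.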
4. Related Work

In recent years, ATSG has gained popularity with the introduction of new or improved generation tools [8, 9]. An example of this growth is the SBST Tool Contest, whose Java unit testing category reached its 10th edition in 2022. Two of its participants, EvoSuite and Randoop, have been awarded for several years [10, 11].

EvoSuite. Using a genetic algorithm to produce unit tests was pioneered by EvoSuite5 [5]. EvoSuite evolves unit tests in a fashion similar to ours and targets the Java programming language. Its evolutionary search is guided by multiple coverage criteria (e.g., branch distance, mutation testing) [14]. Studies have shown that its latest search algorithm, DynaMOSA (Dynamic Many-Objective Sorting Algorithm) [15, 16, 17], produces short tests with higher coverage than previous algorithms (e.g., MOSA [18], WSA [19]). SmallEvoTest reuses some of the ideas of EvoSuite, such as test evolution, individual encoding, and test generation. However, SmallEvoTest additionally provides an explicit repository for type information populated from a code example.

Randoop. A popular alternative to EvoSuite is Randoop6, a Java unit test generator that uses feedback-directed random generation: statements are created by picking a method call at random and reusing previous statements as arguments [13]. The result of executing each new statement is then verified by the tool.

Pynguin. Beyond test generation for Java, Lukasczyk et al. recently presented Pynguin (Python General Unit Test Generator) [20, 21]. This tool uses evolutionary algorithms to explore the challenge of test generation for dynamically typed languages. Unlike with statically typed languages, test generation for languages such as Python or Pharo faces the problem of missing type information.

Similar to the previous tools, our work focuses on test generation using evolutionary search. However, we focus on Pharo, a dynamically typed language, and use a running example to collect the type information used in the generation process.

5 https://www.evosuite.org
6 https://randoop.github.io/randoop/

5. Conclusion and Future Work

This paper presents SmallEvoTest, a tool to automatically generate unit tests for an arbitrary Pharo class. SmallEvoTest relies on a code example to extract argument type information and uses a just-in-time example collecting technique to combine method invocations and generate assertions. SmallEvoTest is a proposal for a foundation for automatic test generation, and our effort will be followed up along several lines:

• Conducting case studies: Conducting case studies on representative classes of prominent Pharo systems is an obvious next step. This will help us illustrate some limitations of our approach and identify the actions to take to generate unit tests for large classes.

• Improving assertions: Many aspects of SmallEvoTest are based on immediate decisions taken from ad-hoc examples. In particular, the generation of assertions can be significantly improved. Incorporating assertions about collections or structural similarities seems a reasonable step forward.

• Abstract template: The objective of the generated tests is to cover a particular part of the code given a particular budget, expressed in terms of test methods and statements. As such, the generated tests differ from manually written tests. As future work, we plan to incorporate an abstract template as a way to better structure the generated tests. Abstract templates are meant to generate tests that follow a particular structure, e.g., accessors must be invoked to properly initialize the object before invoking methods with business logic.

SmallEvoTest is available under the MIT License for the Pharo and GToolkit platforms.

Acknowledgments

Juan Pablo Sandoval Alcocer thanks ANID FONDECYT Iniciacion Folio 11220885 for supporting this article.

References

[1] A. Bacchelli, P. Ciancarini, D. Rossi, On the effectiveness of manual and automatic unit test generation, in: 2008 The Third International Conference on Software Engineering Advances, 2008, pp. 252–257. doi:10.1109/ICSEA.2008.66.
[2] G. Fraser, M. Staats, P. McMinn, A. Arcuri, F. Padberg, Does automated unit test generation really help software testers? A controlled empirical study, ACM Transactions on Software Engineering and Methodology (TOSEM) 24 (2015) 1–49.
[3] J. S. Kracht, J. Z. Petrovic, K. R. Walcott-Justice, Empirically evaluating the quality of automatically generated and manually written test suites, in: 2014 14th International Conference on Quality Software, 2014, pp. 256–265. doi:10.1109/QSIC.2014.33.
[4] J. M. Rojas, G. Fraser, A. Arcuri, Automated unit test generation during software development: A controlled experiment and think-aloud observations, in: Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, Association for Computing Machinery, New York, NY, USA, 2015, pp. 338–349. doi:10.1145/2771783.2771801.
[5] G. Fraser, A. Arcuri, EvoSuite: Automatic test suite generation for object-oriented software, in: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE ’11, Association for Computing Machinery, New York, NY, USA, 2011, pp. 416–419. doi:10.1145/2025113.2025179.
[6] A. Panichella, J. Campos, G. Fraser, EvoSuite at the SBST 2020 tool competition, in: Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops, ICSEW’20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 549–552. doi:10.1145/3387940.3392266.
[7] G. Fraser, A. Arcuri, Whole test suite generation, IEEE Transactions on Software Engineering 39 (2012) 276–291.
[8] P. Tonella, Evolutionary testing of classes, in: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA ’04, Association for Computing Machinery, New York, NY, USA, 2004, pp. 119–128. doi:10.1145/1007512.1007528.
[9] A. Sakti, G. Pesant, Y.-G. Guéhéneuc, Instance generator and problem representation to improve object oriented code coverage, IEEE Transactions on Software Engineering 41 (2015) 294–313. doi:10.1109/TSE.2014.2363479.
[10] S. Panichella, A. Gambi, F. Zampetti, V. Riccio, SBST tool competition 2021, in: 2021 IEEE/ACM 14th International Workshop on Search-Based Software Testing (SBST), 2021, pp. 20–27. doi:10.1109/SBST52555.2021.00011.
[11] A. Gambi, G. Jahangirova, V. Riccio, F. Zampetti, SBST tool competition 2022, in: 2022 IEEE/ACM 15th International Workshop on Search-Based Software Testing (SBST), 2022, pp. 25–32. doi:10.1145/3526072.3527538.
[12] G. Fraser, A. Arcuri, EvoSuite: automatic test suite generation for object-oriented software, in: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE ’11, ACM, New York, NY, USA, 2011, pp. 416–419. doi:10.1145/2025113.2025179.
[13] C. Pacheco, S. K. Lahiri, M. D. Ernst, T. Ball, Feedback-directed random test generation, in: Proceedings of the 29th International Conference on Software Engineering, ICSE ’07, IEEE Computer Society, USA, 2007, pp. 75–84. doi:10.1109/ICSE.2007.37.
[14] S. Vogl, S. Schweikl, G. Fraser, A. Arcuri, J. Campos, A. Panichella, EvoSuite at the SBST 2021 tool competition, in: 2021 IEEE/ACM 14th International Workshop on Search-Based Software Testing (SBST), 2021, pp. 28–29. doi:10.1109/SBST52555.2021.00012.
[15] A. Panichella, F. M. Kifetew, P. Tonella, Automated test case generation as a many-objective optimisation problem with dynamic selection of the targets, IEEE Transactions on Software Engineering 44 (2018) 122–158. doi:10.1109/TSE.2017.2663435.
[16] J. Campos, Y. Ge, N. Albunian, G. Fraser, M. Eler, A. Arcuri, An empirical evaluation of evolutionary algorithms for unit test suite generation, Information and Software Technology 104 (2018) 207–235. doi:10.1016/j.infsof.2018.08.010.
[17] A. Panichella, S. Panichella, G. Fraser, A. Sawant, V. Hellendoorn, Test smells 20 years later: Detectability, validity, and reliability, Empirical Software Engineering 27 (2022). doi:10.1007/s10664-022-10207-5.
[18] A. Panichella, F. M. Kifetew, P. Tonella, Automated test case generation as a many-objective optimisation problem with dynamic selection of the targets, IEEE Transactions on Software Engineering 44 (2017) 122–158.
[19] J. M. Rojas, M. Vivanti, A. Arcuri, G. Fraser, A detailed investigation of the effectiveness of whole test suite generation, Empirical Software Engineering 22 (2017). doi:10.1007/s10664-015-9424-2.
[20] S. Lukasczyk, F. Kroiß, G. Fraser, Automated unit test generation for Python, in: Proceedings of the 12th Symposium on Search-Based Software Engineering (SSBSE 2020, Bari, Italy, October 7–8), volume 12420 of Lecture Notes in Computer Science, Springer, 2020, pp. 9–24. doi:10.1007/978-3-030-59762-7_2.
[21] S. Lukasczyk, F. Kroiß, G. Fraser, An empirical study of automated unit test generation for Python, CoRR abs/2111.05003 (2021). arXiv:2111.05003.