ModelDefenders: A novel gamified mutation testing game for model-driven engineering

Felix Cammaerts¹, Monique Snoeck¹
¹ Research Center for Information System Engineering, KU Leuven, Belgium

Abstract

Recently, there has been a surge in Model-Driven Engineering (MDE), where code is automatically generated from a model. While this has certainly enabled non-technical people to take on some programming tasks, it does not necessarily make them good testers or good modelers. In mutation testing, syntactic variations (mutants) are created from the source code and run against a test suite. Mutants that pass all the test cases in the suite are called alive, while mutants that fail are called dead. Good testers are able to develop test suites that kill all mutants. This can also be applied to MDE, where the mutants are created on the models used for code generation. This paper presents a gamified approach for mutation testing on models and discusses the specific challenges and caveats encountered when defining mutants and setting up such a gamified approach.

Keywords

Mutation testing, MDE, Education

1. Introduction

Mutation testing is a code-based testing technique in which syntactic deviations of the system under test (SUT) are generated, under the assumption that programmers write near-correct code. The mutants are run against the test suite used for the SUT. Mutants that pass all test cases are said to be alive, while mutants that fail one or more test cases are said to be dead. The ratio of live mutants gives an indication of how well the SUT has been tested. Live mutants can be used as feedback for the programmer: they differ from the SUT yet still pass all test cases, indicating either incomplete testing of the SUT or required changes to the code. Mutation testing can also be applied to MDE, in two different ways.
A first approach is to create mutants on the generated code to check whether the transformations work correctly [1]. A second approach is to create mutants on the models: deviations in the modelling constructs of a model cause different outputs in the generated source code [2]. Since our focus is on improving the modelling skills of non-technical students, we concentrate on the latter approach in this paper, with the goal of proposing a gamified educational tool.

Companion Proceedings of the 16th IFIP WG 8.1 Working Conference on the Practice of Enterprise Modeling and the 13th Enterprise Design and Engineering Working Conference, November 28 – December 1, 2023, Vienna, Austria.
felix.cammaerts@kuleuven.be (F. Cammaerts); monique.snoeck@kuleuven.be (M. Snoeck)
https://www.kuleuven.be/wieiswie/nl/person/00143708 (F. Cammaerts); https://www.kuleuven.be/wieiswie/nl/person/00012755 (M. Snoeck)
ORCID: 0000-0002-0037-3865 (F. Cammaerts); 0000-0002-3824-3214 (M. Snoeck)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

This paper presents ModelDefenders, a gamified approach to introducing mutation testing in MDE. ModelDefenders is intended to be used as an educational tool in which students learn to become better testers and modellers by evaluating and creating mutants on models.

2. Related work

A similar gamified approach to the one introduced in this paper has been developed for code-based mutation testing: CodeDefenders [3], which was the main source of inspiration for this paper. CodeDefenders uses the same game mechanics that we will use, and it also aims to teach novice testers how to adequately test a software program. Research has shown that CodeDefenders is well received by students and has positive learning effects, with students performing steadily better [4].
It has also been found that the test suites and mutants developed within the game are stronger than those produced by automated tools [5]. For the implementation of ModelDefenders, the MERODE MDE approach was chosen [6]. The method is supported by a modelling tool that provides different levels of support for developing models [7] and a companion prototyper that allows students to experiment with their models [8], which includes a feature that provides students with feedback on their manual actions [9, 10]. The availability of a code generator makes MERODE a good basis for implementing a model defenders game. MERODE is actively taught in modelling courses at two universities, which also provides opportunities for experimental evaluation.

This paper presents the dynamics of the ModelDefenders tool, specifically for the artefacts used in the MERODE MDE approach. It explains how test cases and mutants can be defined for those artefacts, and how these test cases and mutants can be used to engage students in the practice of modelling and testing in a gamified approach. We attempt to formulate an answer to the following research questions:

RQ1. How can test cases be defined on artefacts used within the MERODE MDE approach (i.e. finite state machines and class diagrams)?
RQ2. How can syntactic changes (mutants) be defined on artefacts within the MERODE MDE approach (i.e. finite state machines and class diagrams)?
RQ3. How can these test cases and mutants provide a gamified approach to teach students the practice of software testing and modelling?

3. Defining test cases

In code-based testing approaches, test cases are usually written in the same programming language as the SUT. In MDE, the modelling language constructs are usually only used to develop the model, without the ability to define test cases using the same modelling constructs. It is therefore necessary to properly define how test cases should be specified for the different artefacts in MDE.
The artefacts used in the MERODE MDE approach are a class diagram (CD) and statecharts that model the dynamic behaviour of the object types (OTs). MERODE uses a subset of statecharts, namely finite state machines (FSMs).

Figure 1: Finite state machine of the Patient model.

Figure 2: Class diagram of the Tuxedo model. The Tuxedo object type has a parameter color and the Person object type has a parameter name.

3.1. Finite state machines

When defining a test case for an FSM, an execution sequence and an expected result should be provided. The execution sequence is a series of events that are executed sequentially on the FSM; all events in the test case should be present in the FSM's alphabet. The result of the test case is a state of the FSM, or the error state if the execution sequence cannot be fully executed on the FSM. In the Patient model (Figure 1), the sequence of events MEcrPatient, upgrade, upgrade, downgrade would place the patient in the lowPriority state. The sequence MEcrPatient, upgrade, operate, however, cannot be fully executed, as a patient cannot be operated on in the lowPriority state; the expected result is therefore the error state.

3.2. Class diagram

Similarly, when defining a test case on a CD, an execution sequence and an expected result should be provided. However, in a CD the sequencing constraints are less explicit than in an FSM, as there are no explicitly modelled states. Nevertheless, the sequence of instantiating and removing objects in the CD can be considered an execution sequence. For example, an object p1 of Person must first be created before it can be removed. The expected outcome of such a sequence of events is either success or failure: success if the entire sequence of events can be executed according to the multiplicities and relationships of the CD, failure if one of the steps in the sequence is not possible at that point in the sequence, for example, deleting an object before it exists.
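Returning to the FSM test cases of Section 3.1, their semantics can be sketched as a small interpreter. The sketch below is a hypothetical illustration, not part of the MERODE tooling; the Patient FSM is partially reconstructed from the examples in the text, and the state names initial and registered are assumptions of this sketch.

```python
# Sketch of FSM test-case execution (Section 3.1): a test case is an
# event sequence plus an expected result (a state of the FSM, or "error").
ERROR = "error"

def run_test_case(transitions, initial, events):
    """Execute the events in order; return the final state, or "error"
    as soon as an event cannot fire in the current state."""
    state = initial
    for event in events:
        if (state, event) not in transitions:
            return ERROR
        state = transitions[(state, event)]
    return state

# Patient-like FSM reconstructed from the paper's examples; the state
# names "initial" and "registered" are assumptions of this sketch.
patient = {
    ("initial", "MEcrPatient"): "registered",
    ("registered", "upgrade"): "lowPriority",
    ("lowPriority", "upgrade"): "mediumPriority",
    ("mediumPriority", "upgrade"): "highPriority",
    ("highPriority", "downgrade"): "mediumPriority",
    ("mediumPriority", "downgrade"): "lowPriority",
}

print(run_test_case(patient, "initial",
                    ["MEcrPatient", "upgrade", "upgrade", "downgrade"]))  # lowPriority
print(run_test_case(patient, "initial",
                    ["MEcrPatient", "upgrade", "operate"]))               # error
```

The two runs reproduce the Patient examples above: the first sequence ends in lowPriority, and the second hits the error state because operate cannot fire in lowPriority.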
Specific to MERODE is that the relationships between object types express an existential dependency (ED) relationship, where one of the objects is the master and the other is the dependent. ED means that the master object must exist before the dependent object can be created, and that the master object cannot be deleted until all its dependent objects have been deleted. In addition, a dependent can only be related to one master object throughout its life. For example, consider the Tuxedo model (Figure 2). Here Rental is dependent on Tuxedo and Person. This means that before a Rental can be created, there must already be a Tuxedo and a Person to which the Rental object can be associated.

When instantiating several objects of one OT, additional object pointers need to be provided to the events to identify the objects that are the subject of the action. The minimum information needed is the identification of the object that is impacted by the action, as well as the identification of related objects via object pointers when a new object is created. Object pointers are given between square brackets [], and parameters between round brackets (). An execution sequence for a CD might look like this: crTuxedo[](Red): t1, crPerson[]('Felix'): p1, crRental[p1, t1]: r1. This test case instantiates a Tuxedo t1, a Person p1 and a Rental r1 that is dependent on Tuxedo t1 and Person p1. The expected result of this execution sequence would be success. Conversely, the execution sequence crTuxedo[](Red): t1, crRental[t1, p1]: r1, crPerson[]('Felix'): p1 would fail, as the Person object must be created before the Rental object can be created. When taking the parameters of the object types into account as well, it is important to check whether the data type of each given parameter matches the data type declared in the model under test (MUT). If a parameter does not match, the test case is considered invalid.

4.
Defining mutants

Mutation testing involves making small syntactic changes to the source code to create mutants, such as changing < to ≤. These syntactic deviations are run against the test suite, and mutants that manage to pass all the test cases indicate bad or missing test cases in the suite. When defining such mutants in code-based approaches, it is important to note that these deviations usually keep the skeleton of the code the same (for example, using the same method calls, with classes keeping the same relationships to other classes). Similarly, when defining mutants for mutation testing of model-based approaches, it is important to clearly distinguish between what is considered the skeleton of the model and which parts can be mutated. This section therefore provides an overview of possible mutations.

4.1. Finite state machines

When creating a mutant for an FSM, the states of the MUT must remain unchanged. Mutations can be modelled by changing the labels of transitions, and thus the events that would trigger those transitions. It is also possible to add and remove transitions. In addition, the following constraints should be observed to ensure that the test cases remain executable on the mutant: (1) the mutated FSM must not contain any nondeterminism, as this would make the outcome of a test case nondeterministic; (2) the mutated FSM must retain the state names of the MUT, as failing to do so would make it impossible to correctly verify the outcome of the test cases on the mutant.

4.2. Class diagram

When creating a mutant for a CD, the OTs of the MUT must remain unchanged. Mutations can be modelled by changing the multiplicities of the relationships between OTs. It is also possible to add and remove relationships and to change the data types of the parameters.
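Stepping back to Section 4.1, the two constraints on FSM mutants (determinism, unchanged state names) can be checked mechanically. A minimal sketch, assuming an FSM is represented as a set of (source, event, target) triples; the function and variable names are this sketch's own, not part of any MERODE API:

```python
def fsm_states(fsm):
    """Collect all state names occurring in a set of (source, event, target) triples."""
    states = set()
    for source, _event, target in fsm:
        states.update((source, target))
    return states

def is_valid_fsm_mutant(mut, mutant):
    """Constraint (1): the mutant must be deterministic, i.e. at most one
    target per (state, event) pair. Constraint (2): the mutant must keep
    exactly the state names of the model under test."""
    targets = {}
    for source, event, target in mutant:
        if targets.setdefault((source, event), target) != target:
            return False  # two different targets for the same (state, event)
    return fsm_states(mutant) == fsm_states(mut)

mut = {("a", "x", "b"), ("b", "y", "a")}
relabelled = {("a", "y", "b"), ("b", "y", "a")}               # valid: only a label changed
nondet = {("a", "x", "b"), ("a", "x", "a"), ("b", "y", "a")}  # violates constraint (1)

print(is_valid_fsm_mutant(mut, relabelled))  # True
print(is_valid_fsm_mutant(mut, nondet))      # False
```

A mutant that renames a state (e.g. replacing b by c everywhere) would likewise be rejected, since the sets of state names no longer match.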
In addition, when modelling a mutant for a CD, the following constraints should be observed to ensure that the test cases remain executable on the mutant: (1) no cyclic dependencies may be introduced in the mutant; (2) the mutated CD must retain the names of the OTs as modelled in the MUT; (3) the number of parameters and their positions must remain the same.

5. Gamification

To encourage students to develop mutants and test cases under the constraints mentioned above, a gamification approach can be used. This gamification approach is baptised ModelDefenders. In ModelDefenders, students are assigned one of two roles: attacker or defender. Defenders are tasked with creating test cases that consist of a sequence of events and an expected outcome (Section 3). Attackers are tasked with creating mutants on the models (CD and FSMs), according to the aforementioned rules (Section 4). As in mutation testing, the mutants (from the attacker) are run against the test cases (from the defender). A mutant that passes all test cases is said to be alive; a mutant that fails at least one test case is said to be killed. If a mutant survives, the attacker gains one point, while if the defender's test cases have successfully killed a mutant, the defender gains one point.

Figure 3: Attacker view when attacking an FSM as model under test.

5.1. Attackers

The attacker is given the MUT, which is either a CD or an FSM. Figure 3 shows an example of an FSM under test (top left). Below it is an overview of the test cases developed by the defender. The attacker can click "View" to see a complete test case, showing the full sequence of events from the defender and the expected outcome. At the top right, the attacker is given an editing area in which to modify the MUT into a mutant. Once the attacker has finished modelling the mutant, clicking "Attack" runs the current test cases against the mutant.
If one of the test cases has a different outcome than the expected outcome on the mutant, the mutant is dead; if all the test cases have the expected outcome, the mutant is alive. On the bottom right, the attacker is given an overview of the mutants previously used as attacks, including which are dead and which are alive. If a mutant is dead, the attacker can click "View Killing Test" to see which of the tests the mutant failed; clicking "View" shows the mutant itself.

Figure 4: Defender view when defending an FSM as model under test.

5.2. Defenders

The defender is also given the MUT, which is either a CD or an FSM, in the top-left corner (see Figure 4). The defender can define test cases for this model in the top-right pane. Here, the defender can add each of the possible events of the FSM one by one and specify the expected outcome state (i.e. a state of the FSM or an error). Once a test case is fully defined, the defender can add it to the test suite. Before the test case is actually added to the suite, it is checked whether the expected outcome of the test case matches the outcome that the MUT would actually produce. If this is not the case, the test case is not added to the suite and the user is informed that the test case has been incorrectly defined. Once the defender feels that the test suite is fully developed, it can be used to defend against the mutants modelled by the attacker. These mutants are shown at the bottom left, each labelled as dead or alive. The defender can inspect the mutants to help define further test cases for the test suite. If a mutant is dead, the defender can also click "View Killing Test" to see which of the test cases killed it.

5.3. Running test cases against a mutant

To check whether a mutant defined by an attacker is dead or alive, the defender's test cases are run against the mutant. If all the test cases have the same outcome on the mutant as on the MUT, the mutant is alive.
If at least one test case has a different outcome on the mutant than on the MUT, the mutant is dead. For FSMs, it is straightforward to check the outcome of a test case on the mutant: the sequence of events is run on the mutant, and the state of the mutant after executing the last event of the test case is the outcome of the test case on the mutant. This result can be compared with the result on the MUT. As soon as one of the events in the sequence cannot be executed in the current state of the mutant, the resulting state of the mutant becomes the error state.

Although the constraints imposed on FSM mutants mean there is no explicit focus on common mistakes made when modelling FSMs, such as liveness aspects, these are still implicitly present. Namely, if the MUT is valid (i.e. contains no backward or forward inaccessible states and thus no liveness problems), a well-defined test suite will be able to detect any mutant that does introduce such mistakes. For example, in Figure 1, a mutant that omits the upgrade transition from mediumPriority to highPriority would be killed by the test case MEcrPatient, upgrade, upgrade, upgrade with highPriority as expected outcome. This should help a defender understand that the accessibility of each state should be tested; in this case this is done indirectly, by defining a test case with an expected outcome state of the FSM. For an attacker, it shows that the absence of such test cases allows mutants to be designed that do not adhere to the liveness properties of FSMs.

For CDs, it is not enough to look only at the order of the events; the parameters, object pointers and associations present in the MUT and the mutant should also be considered. For example, consider the Tuxedo model in Figure 2. A tuxedo can be associated with 0 or 1 rentals, while a person can be associated with 0 or many rentals.
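Before turning to the CD case, the FSM dead-or-alive check just described can be sketched as follows: run each test case on both models and compare the outcomes. As before, the FSM encoding and the partially reconstructed Patient chain (the state names initial and registered are assumptions) are illustrative only.

```python
ERROR = "error"

def run(transitions, initial, events):
    """Return the state reached after the event sequence, or "error"
    if some event cannot fire in the current state."""
    state = initial
    for event in events:
        if (state, event) not in transitions:
            return ERROR
        state = transitions[(state, event)]
    return state

def mutant_is_dead(mut, mutant, initial, test_suite):
    """A mutant is dead iff at least one test case has a different
    outcome on the mutant than on the model under test."""
    return any(run(mut, initial, tc) != run(mutant, initial, tc)
               for tc in test_suite)

patient = {
    ("initial", "MEcrPatient"): "registered",   # state name assumed
    ("registered", "upgrade"): "lowPriority",
    ("lowPriority", "upgrade"): "mediumPriority",
    ("mediumPriority", "upgrade"): "highPriority",
}
# Mutant that omits the upgrade transition from mediumPriority to highPriority:
mutant = {k: v for k, v in patient.items()
          if k != ("mediumPriority", "upgrade")}

suite = [["MEcrPatient", "upgrade", "upgrade", "upgrade"]]
print(mutant_is_dead(patient, mutant, "initial", suite))  # True: MUT reaches highPriority, mutant errors
print(mutant_is_dead(patient, mutant, "initial",
                     [["MEcrPatient", "upgrade"]]))       # False: the mutant survives this weaker suite
```

This mirrors the Figure 1 example above: the three-upgrade test case kills the mutant, while a suite that never exercises the highPriority state lets it survive.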
Considering only the order of events, a test case crTuxedo[](Red): t1, crTuxedo[](Blue): t2, crPerson[]('Felix'): p1, crRental[t1, p1]: r1, crRental[t2, p1]: r2 with expected outcome success would be able to kill a mutant in which the relationship between Person and Rental has been changed to 0 to 1, since the last crRental is not possible on that mutant. However, this test case would not be able to detect a mutant in which the relationship between Tuxedo and Rental has been omitted: for that mutant, the sequence of events crTuxedo[](Red): t1, crTuxedo[](Blue): t2, crPerson[]('Felix'): p1, crRental[t1, p1]: r1, crRental[t2, p1]: r2 would also succeed. To take the object pointers into account, one should look at each link to objects present in an event of a test case. Take, for example, the event crRental[t1, p1]: r1. This event creates two links, one between t1 and r1 and another between p1 and r1. For each pair of OTs involved in these links (Tuxedo-Rental and Person-Rental), the following rules should be checked, where X is the master OT and Y is the dependent OT:

• If there is a direct relationship between X and Y in both the MUT and the mutant, check against the multiplicity in the mutant whether a new object of type Y can be created.
• If there is a direct relationship between X and Y in the MUT but not in the mutant, no check for this specific relationship is needed; the other relationships still need to be checked.
• If there is no direct relationship between X and Y in the MUT but there is one in the mutant, check whether an object of type X already exists at that point in the event sequence and whether a new object of type Y can be instantiated on the mutant, considering the multiplicity of the relationship.

Concerning parameters, it is sufficient to check whether, at each object instantiation, the parameters used in the test case have the same data types as the parameters in the MUT at the same positions.
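The relation-checking rules above can be sketched as follows. Relationships are encoded here as (master type, dependent type) → maximum number of dependents per master; the encoding and all names are assumptions of this sketch, not part of the MERODE tooling.

```python
import math

def create_allowed(dep_type, masters, mut_rels, mutant_rels, existing, counts):
    """Check whether a create event for an object of `dep_type`, with object
    pointers `masters` = [(object id, master type), ...], is allowed on the
    mutant. `existing` is the set of objects created so far; `counts` maps a
    master object to its current number of `dep_type` dependents."""
    for obj, master_type in masters:
        key = (master_type, dep_type)
        if key in mut_rels and key not in mutant_rels:
            continue  # rule 2: relationship dropped in the mutant, no check needed
        if key in mutant_rels:
            if obj not in existing:                     # master must already exist
                return False
            if counts.get(obj, 0) >= mutant_rels[key]:  # rules 1 and 3: multiplicity
                return False
    return True

mut_rels = {("Tuxedo", "Rental"): 1, ("Person", "Rental"): math.inf}
# Hypothetical mutant A: Person-Rental multiplicity restricted to at most one rental.
mutant_a = {("Tuxedo", "Rental"): 1, ("Person", "Rental"): 1}
# Hypothetical mutant B: the Tuxedo-Rental relationship omitted.
mutant_b = {("Person", "Rental"): math.inf}

# State after crTuxedo: t1, crTuxedo: t2, crPerson: p1, crRental[t1, p1]: r1
existing = {"t1", "t2", "p1", "r1"}
counts = {"t1": 1, "p1": 1}

# The next event, crRental[t2, p1]: r2, points at masters t2 and p1:
masters = [("t2", "Tuxedo"), ("p1", "Person")]
print(create_allowed("Rental", masters, mut_rels, mutant_a, existing, counts))  # False: test kills mutant A
print(create_allowed("Rental", masters, mut_rels, mutant_b, existing, counts))  # True: mutant B survives
```

Mutant A rejects the second rental for p1 (multiplicity exceeded), so the test case's expected success diverges and kills it; mutant B skips the Tuxedo-Rental check entirely and survives, as described above.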
If the data types are the same, even though the actual values might differ, the mutant survives the test case; if the data types differ, the test case kills the mutant. We acknowledge that most CDs are based on UML and do not restrict relationships to existential dependency relationships. Nonetheless, the way of defining mutants for CDs and defining test cases remains the same: the rules for comparing a test case against a mutant then apply to any X and Y that are directly related to each other.

6. Evaluation

ModelDefenders is currently under development. Mockups have been created in Figma¹ to understand the possible user interactions with the tool. These mockups are currently being translated into a web application. Once the development of ModelDefenders is complete, it can be evaluated. Firstly, the usability of the tool will be evaluated. Secondly, it will be assessed whether ModelDefenders motivates students to test more. Finally, we will assess whether this increased 'exploration' of models in turn increases students' understanding of modelling.

Acknowledgments

This paper is funded by the ENACTEST Erasmus+ project, number 101055874.

References

[1] A. Gonzalez, C. Luna, G. Bressan, Mutation testing for Java based on model-driven development, in: 2018 XLIV Latin American Computer Conference (CLEI), IEEE, 2018, pp. 1–10.
[2] F. Belli, C. J. Budnik, A. Hollmann, T. Tuglular, W. E. Wong, Model-based mutation testing—approach and case studies, Science of Computer Programming 120 (2016) 25–48.
[3] J. M. Rojas, G. Fraser, Teaching mutation testing using gamification, in: European Conference on Software Engineering Education (ECSEE), 2016.
[4] G. Fraser, A. Gambi, M. Kreis, J. M. Rojas, Gamifying a software testing course with Code Defenders, in: Proceedings of the 50th ACM Technical Symposium on Computer Science Education, 2019, pp. 571–577.
[5] J. M. Rojas, T. D. White, B. S. Clegg, G.
Fraser, Code Defenders: crowdsourcing effective tests and subtle mutants with a mutation testing game, in: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), IEEE, 2017, pp. 677–688.
[6] M. Snoeck, Enterprise Information Systems Engineering: The MERODE Approach, Springer, 2014.
[7] M. Snoeck, MERLIN: An intelligent tool for creating domain models, in: Research Challenges in Information Science: 14th International Conference, RCIS 2020, Limassol, Cyprus, September 23–25, 2020, Proceedings 14, Springer, 2020, pp. 549–555.
[8] G. Sedrakyan, S. Poelmans, M. Snoeck, Assessing the influence of feedback-inclusive rapid prototyping on understanding the semantics of parallel UML statecharts by novice modellers, Information and Software Technology 82 (2017) 159–172.
[9] B. Marín, S. Alarcón, G. Giachetti, M. Snoeck, TesCaV: An approach for learning model-based testing and coverage in practice, in: Research Challenges in Information Science: 14th International Conference, RCIS 2020, Limassol, Cyprus, September 23–25, 2020, Proceedings 14, Springer, 2020, pp. 302–317.
[10] F. Cammaerts, C. Verbruggen, M. Snoeck, Investigating the effectiveness of model-based testing on testing skill acquisition, in: IFIP Working Conference on the Practice of Enterprise Modeling, Springer, 2022, pp. 3–17.

¹ https://www.figma.com/file/lJ6HVBH0hmJGz2RSTMQMc6/ModelDefenders?type=design&node-id=0%3A1&mode=design&t=siCzSe9kKoYsK4oa-1