Applications of Unit Tests in Computer Science
             and Software Engineering Education

          Atanas Semerdzhiev[0000-0002-7760-1619], Petar Armyanov[0000-0002-4903-8945],
           Trifon Trifonov[0000-0002-2247-1968] and Kalin Georgiev[0000-0002-6283-1040]

        Department of Computer Informatics, Faculty of Mathematics and Informatics,
                     Sofia University “St. Kliment Ohridski”, Bulgaria
     {asemerdzhiev, parmyanov, triffon, kalin}@fmi.uni-sofia.bg


      Abstract. The article discusses the authors’ experience with incorporating unit tests
      in the learning materials and examples used in Computer Science and Software
      Engineering courses and applying them in the grading process. This approach
      served several distinct purposes: (1) To teach students how to control and improve
      the quality of their solutions to training problems; (2) To provide additional means
      of grading students’ submissions; (3) To automate tasks performed by the teaching
      team; and (4) To improve the quality of learning materials provided to the students.
      The article describes how unit tests were integrated in the courses led by the authors,
      what conclusions were reached, and what best practices were established as a result.


      Keywords: Education, Unit Test, Sofia University, Computer Science, Software
      Engineering.


1   Introduction
In the past 5 years, the teaching teams led by the authors of this article, worked
towards integrating unit testing into courses they teach. This was done not only
as a part of the curriculum (i.e., including unit tests as a topic being taught), but
also to improve the quality of the work and to automate certain tasks. To be more
precise, unit tests were used: (1) To teach students how to control and improve
the quality of their solutions to training problems; (2) To provide additional
means of grading students’ submissions; (3) To automate tasks performed by the
teaching team; and (4) To improve the quality of learning materials provided to
the students.
     The main factors that motivated this change were:
     • the authors’ belief that unit tests are an important part of modern software
         development and should be incorporated as a part of the curriculum;
     • feedback from companies in the IT sector (among other prospective em-
         ployers of the students), which stated that students are lacking knowledge
         in this area, which is recognized by the companies as highly important;

 Copyright © 2021 for this paper by its authors. Use permitted under
 Creative Commons License Attribution 4.0 International (CC BY 4.0).
     • feedback from the students, who asked questions about the concept and
        the mechanics of unit tests and expressed general interest to gain knowl-
        edge and skills in the area; and
     • internal needs of the teaching teams.
     The present study describes how unit tests were integrated in the courses
led by the authors, what conclusions were reached, and what best practices were
established as a result. The introduction of unit tests as teaching tools has been
previously experimented and reported on by other authors as well [1].

2   Use of unit tests
IEEE’s Guide to the Software Engineering Body of Knowledge (SWEBOK)
version 3.0 describes unit testing as follows: “Unit testing verifies the functioning
in isolation of software elements that are separately testable. Depending on the
context, these could be the individual subprograms or a larger component made of
highly cohesive units. Typically, unit testing occurs with access to the code being
tested and with the support of debugging tools. The programmers, who wrote the
code typically, but not always, conduct unit testing.” [2]. Unit tests have become
an industry standard – an indispensable part of software development and are
tightly integrated in any modern software development processes, as described,
for example, in [3] and [4].
     Unit testing practices are traditionally introduced to students as part of a ded-
icated intermediate or advanced course on software testing. The common percep-
tion is that the concept is too complicated for introductory students to grasp and
effectively apply. It is clear that there is a certain cognitive cost to be paid when
introducing an additional concept at an early stage. We argue that this cost can be
offset by applying the concept of unit tests in multiple contexts, thus increasing
the gains and the students’ confidence in its applicability.
     The goal of our study was to examine how unit tests can be introduced into
introductory programming courses. Our methodology was to introduce unit test-
ing throughout the entire educational process and demonstrate that it is a multi-
faceted concept applicable not only to student grading, but as a hinting assistant
during an exam, as a validation tool for course and examination materials, and as
program analysis tool, among others.
     In this section, we describe in more detail how our teams applied unit tests in
a variety of educational activities.

2.1 As a part of the curriculum
An established beneficial routine in the teaching of classes in Computer Science and
Software Engineering (CS/SE) focused on software development is to introduce
students, often by example, to as many best practices as possible. This helps

                                         64
them, especially during the early learning stages, to develop a habit of discipline
in programming, conforming to the contemporary expectations for quality in
software development. For the students from the Information Systems program
in our faculty, it is even more important to achieve knowledge for different types
of testing, including unit testing, as they have courses, related to higher level
of software design, modeling and testing in the third and fourth year. In such
a way, we are attempting to stimulate and develop behavioral competencies,
such as, for example, recognizing development of unit tests as a best practice
of a seasoned professional, and being able to explain clearly its advantages, and
utilizing unit tests as means for intra-team and inter-team communication of the
intended purpose of the code [5]. Accordingly, we address the topic of unit testing
at an early stage and aim to develop a culture where the code and the tests are
considered as one whole – the software unit is only complete when the unit tests
for it are also fully developed.
     This introduction can be challenging at the beginning of the undergraduate
program, especially in the first semester of the first year. One typical obstacle is
that unit-testing frameworks typically rely on advanced mechanisms, which the
students have not been introduced to at that stage of their education. For example,
a typical C/C++ unit test framework relies on the preprocessor and some basic
understanding of linkage, translation units and how to organize a solution into
multiple projects (for example, cf. [6,7]). It may be argued that students should
simply consider the unit-testing framework as a “black box”, whose inner work-
ings they do not need to comprehend. In practice, advanced C/C++ knowledge
may be required to utilize fully and properly a unit test framework properly, as
well as to be able to resolve non-obvious unit test errors occurring at compilation
time or at runtime.
     That said, we have found it quite possible to teach students to start using
unit tests even in the first semester. Our methodology introduced unit tests in a
simplified and controlled manner, which may sacrifice efficiency of the unit tests
for the sake of simplicity. A parallel can be drawn with how first semester classes
in C++ introduce students to input and output using the iostream library. At
the very beginning, students do not have knowledge of classes, inheritance, op-
erator overloading, or ADL. However, the abstraction allows usage std::cout
and std::cin for output and input and they learn to do so quite well, despite
the fact, they will understand the details of the implementation much later, and
many will never become aware of the full implementation. In other words, one
can learn how to use a well-designed programming construct properly, at least on
a certain level, even if they do not yet understand how it is implemented.
     From our experience, the introduction of unit testing early on has many posi-
tive effects. Students learn how to properly design units of code and gain a better
understanding of some of the processes and requirements of real-world develop-

                                        65
ment. They understand the benefits and the importance of program decomposi-
tion by observing them in practice. Finally yet importantly, they acquire a certain
degree of confidence, because they feel they are gaining real-world skills and that
they are growing as IT professionals.
     Of course, as in actual software development, the benefits of unit tests come
at a cost. Firstly, the teaching team must be prepared to work with unit testing in a
training scenario. Obviously, at a minimum they must have a good understanding
of the topic themselves, so that they can relate it to their students. The less evident
fact is that a significant amount of additional work arises for the team:
     • Unit tests developed by the students must be reviewed and appropriate
         feedback should be provided from the teaching team.
     • For each assignment, it should be considered what aspects of unit testing
         should be included in it (for example should the students develop unit
         tests, what part of the grade comes from the unit tests, etc.).
     • Incorporating unit tests in the standard curriculum is not a straightforward
         endeavor and the team needs to make a conscious and focus effort to
         identify how to incorporate them in its daily teaching activities.
     The second aspect of the cost of introducing unit tests is that, just as in real-
world development, unit test development may consume as much time as the
development of a given code unit, or even more. Thus, when assigning tasks to
the students, it should be ascertained they have enough time to develop both the
code units and the tests. As a result, often the team needs to resort to two tactics
– either increase the time limit for assignments, and/or simplify the tasks, so that
the development of code units becomes simpler, leaving more time to develop
unit tests.
     Additionally, tasks given to students need to be carefully selected and
thought through. As unit tests themselves become a part of the curriculum, there
may be tasks that focus solely on the tests. For example, the students may receive
an assignment that requires them to develop one or more units of code, which
are not challenging in themselves, however covering them with unit tests might
be. Respectively, for such assignments it is the unit tests that are being examined
and graded, and not the actual solution. For “hybrid” tasks, where both the solu-
tion and the accompanying unit tests are graded, a careful advance consideration
needs to be made as to what code units has to be developed and what it will take
to cover them with unit tests. Last, but not least, unit tests should not be turned
into a religion: exceptions could also be made where appropriate. For example, if
a class needs to work with the file system, it may not be easy to subject it to unit
tests for students in the first year of their studies. At this stage, they may not yet
be able to deal with topics such as mocking, stubs, etc. Therefore, unit tests for
the problematic areas may be simplified, omitted, or given as an extra credit for
advanced students.

                                          66
2.2 As an aid for students during exams
In the courses, which are in the focus of this paper, students are often subjected
to a particular type of exam. It is of relatively short duration, usually between
1 and 3 hours. At the beginning of the exam, the students receive one or more
problems they are required to provide a solution. For example, they may need to
create a computer program as a solution to the exam problems. In other situations,
students are not required to implement an entire program, but a single class or a
single function or a combination of such program units. Each student solves the
exam on their own, using a computer with preinstalled IDE, compiler, etc. At the
end of the exam, the students submit their work in the form of one or more source
files. Submissions are uploaded to our faculty’s e-learning system, which is an
instance of the popular LMS Moodle [8].
      In such exams, we have found it helpful to provide the students with a com-
prehensive set of unit tests that cover the major use cases of the code units they
are required to develop. Students are free to add unit tests of their own, if they see
fit (this may be useful if they choose to develop additional code units, apart from
the ones mandated by the exam text).
      The unit tests are provided simultaneously with the problem statement of the
exam. Revealing them in the days before the exam may compromise it: based on
the unit tests it is often straightforward to determine what the exam problem is.
This may allow the students to prepare their solution in the days before the exam
has started, using external help, and only be able to reproduce it verbatim during
the actual exam. This defeats its purpose, as the exam checks the ability of the
students to solve a problem in a limited period, while the teaching staff can moni-
tor their work and can make sure they do so independently and do not receive aid
from third parties.
      It is important to note that while the ideal conditions for conducting such an
examination is in a controlled environment with a university-provisioned ma-
chine on which the student has restricted rights, the authors have often been com-
pelled to work in a less suitable examination environment. As an example, some
examinations need to be performed simultaneously for a large number of students
during a time slot with limited availability of university-provisioned machines or
under the conditions of restricted or unavailable Internet access. One potential
solution for such scenarios is to allow students to use their own computers and
distribute the unit tests in advance as an encrypted archive. The students are re-
quired to download the archive in advance and are provided with the decryption
key only at the exam. For increased security, we prefer ZIP archives encrypted
with the industry-standard AES-256 algorithm as opposed to the legacy and less
secure ZipCrypto algorithm [9]. One drawback is the lack of native support for
AES-256 in Windows, macOS, and Linux, which can be overcome by utilizing


                                         67
the open source 7-zip archiver [10]. For testing purposes, students may be pro-
vided with a test encrypted ZIP archive with a known password so that they can
test their ability to decrypt successfully such an archive.
      The unit tests are provided for several reasons. Firstly, they serve as an aid to
students and can help focus their attention to specific corner cases, which they may
otherwise omit. In addition, they will know well before the exam is graded whether
their solution works correctly or not, based on the outcome from running the tests.
Secondly, they help the students work in an environment, which is closer to that in
real-world development, where they will have all kinds of tests to help them reduce
the risk of introducing bugs in their code. Thirdly, it also aids our goal to help stu-
dents build a habit of using unit tests and think of them as “assistants” in solving
their problem. In this case, the tests are provided by another person (members of the
teaching staff) and the students need to use them properly.
      Since the exam is of short duration, the teaching staff do not want the stu-
dents to waste time trying to configure their projects to run the unit tests (unless
that activity is a part of the exam, of course). For this reason, an empty template is
provided to the students. Such template could even be provided in advance of the
exam, as it reveals nothing of substance about the actual examination. The tem-
plate consists of one or more files, which the students download. It may consist
not only of source code files, but also project configuration files that bundle all of
the solution together and can also contain an empty “placeholder” file for the unit
tests. In the day of the exam, the students simply overwrite it with the actual set
of unit tests they are provided.
      Depending on the language being used, the template should be organized in a
way to make it possible to develop the solution without the unit tests interfering.
For example, in a C/C++ project, a Visual Studio solution with two separate proj-
ects inside it may be provided, or a CMakeLists.txt file with two separate
targets, etc.
      A variation of this technique is to provide only a subset of all unit tests that
would normally be used to cover the entire solution. It can further be extended
to two more sub-variations. On the one hand, the students may be required to
complete the set of unit tests to provide optimal coverage. In this case, the unit
tests they develop are too subject to evaluation. The second option is to make the
development of additional unit tests optional. In this case, the students can choose
not to invest time in developing unit tests at the higher risk of missing significant
errors in their solution.

2.3 As a grading aid
When grading exams, the teaching teams often face several issues summarized
below:


                                          68
     • There are repetitive checks that need to be performed on each submission.
         For example, if the task requires the students to write a function f, it may
         be required to test how f behaves when called with certain inputs (does
         it throw an exception, does it calculate proper results, etc.). As another
         example: students may be required to implement a class for which the
         exam text describes its contract. This class may need to be tested in mul-
         tiple scenarios. If executed manually each time, these checks consume a
         lot of time.
     • The checks need to be performed with all submissions and no checks
         must be left out to ensure objective and equal grading. In addition, each
         check must be executed fully, faithfully, and under the same conditions.
         When ran manually, it is easy for a person to forget a certain check, or to
         degrade its quality.
     • The checks should be graded in the same manner for each submission to
         ensure fairness. When executed manually, it is easy to miscalculate the
         number of passed and failed checks.
     As can be easily seen, all issues described above could be addressed by
utilizing unit tests. In fact, the rationale is the very same as for any software
development team, which in itself is instructive for the students. However, one
potential setback here must be stated. In regular software development scenarios,
at a given point in time, the source code may fail to build, or may raise runtime
errors, or not implement all the requirements of a given class’ contract. Over time
all those issues are typically overcome, or, if for some reason it is decided that
some of the requirements will be omitted from scope, then the unit tests need to
be changed accordingly. In short, both the code and the unit tests are not seen as
static, but as evolving as the needs addressed by the software evolve. In contrast,
during an examination, this is not the case. The students work on their solution
for a limited period and after they submit it, it cannot be modified anymore. Ad-
ditionally, both the requirements and the unit tests are provided to students in the
beginning of the examination. They remain static throughout the course of the
exam. Thus, a student may submit a solution, which implements only a part of
the required units/functionality, and which contains bugs and does not build cor-
rectly. It may be impossible to run the original unit tests against it (if it does not
build), or there may be such a bug inside it, which causes most tests to fail, due
to runtime errors, while at the same time there may be valid solutions to certain
parts of the exam text, which should be graded positively.
     For the reasons described above, it is necessary to view unit tests as a com-
plement to, rather than a core part of, the examination. In particular, all solutions
submitted need to be reviewed and graded manually and appropriate feedback to
be given to the students, akin to a peer code review in a regular software develop-
ment process. The unit test results are thus seen as assisting tools and providing

                                         69
additional information to the reviewer with minimal effort. Also, for submissions
which do not implement all required units and/or contain bugs (and thus cause the
tests to fail), the teacher should decide whether to adapt the unit tests, so that they
can run for the given submission, or to leave them out altogether, if this requires
more effort than grading the submission manually.

2.4 As means to improve the quality of teaching materials
Often, when a teacher supplies learning materials to their students, they are bound
to contain errors and omissions. Sometimes such errors are easy to spot and may
be considered as a mere inconvenience to the students. However, they may also
be subtle and manifest in such a way as to change the meaning of the original
document. In such a case, they may cause the students to internalize falsehoods.
Another problem, albeit hugely dependent on the cultural context, is how those
errors affect the reputation of the teacher. In some workplaces, if the materials
supplied by a teacher, contain errors, they may be perceived as a professional
with poor skills and practice. In our experience, this issue may have a significant
impact on the self-esteem and confidence of the teacher and cause considerable
amount of stress.
     One particular type of teaching materials, that are seriously affected by the
issues described above, are those used in exams. For example, in certain pro-
gramming exams, students are given a specific problem and they are required
to solve it in a limited period (e.g., a couple of hours). After the exam, they are
provided a sample solution to the exam. If this solution contains errors (i.e.,
compilation errors, bugs, etc.), this raises a question about the validity of the
exam. The thinking usually is along the following lines: “If the teaching staff
cannot provide a proper solution, when they have more time on their hands and
a greater experience and expertise than the students, is it reasonable to expect
that the students can solve the problem flawlessly in a couple of hours?” Of
course, if the situation is such that there are significant errors in the solution,
or the exam was inappropriately difficult, this line of thinking may be justi-
fied. However, even a simple and inconsequential oversight may provoke such
statements. There are situations when the students with a biased viewpoint and
are caused by emotions such as resentment and anger, ignited after receiving
a poor grade. In such cases, it is difficult to rationalize that in a programming
exam simple oversights are not the underlying cause for a poor grade. Students
may fail to appreciate that, in fact, educators understand that under the stressful
conditions of an exam it is easy to make errors and, as a result, are tolerant of
such oversights. Furthermore, the students need to be taught that human pro-
gramming errors are an integral part in the discipline of software development
and, consequently, need to be managed by appropriate methods and techniques


                                          70
instead of simply being punished by a poor grade, or, similarly, by a financial
deduction for a professional. It should be made clear and explained that it is the
more significant errors, that speak of poor understanding of the contents of the
curriculum, that are indicated with lower scores.
     Based on the above, it is easy to see how unit tests may be beneficial. For
example, when providing source code examples that accompany the lectures, the
teacher can use unit tests to provide some level of guarantee that they work cor-
rectly. In programming exams, a working solution can be developed before the
exam. This serves two purposes. Firstly, it helps the teacher validate the task that
will be given to the students. They can see how large (as volume of source code)
the solution will be; whether it requires the development of additional units of
code, which were not foreseen, while writing the exam text; whether the solution
involves topics not yet covered in the classes, etc. Secondly, this solution may
be released to students immediately after the exam. To validate this solution, the
teacher may cover it with unit tests. This will not only help them fix any errors
and oversights, which could be introduced in the solution, but it will also greatly
reduce the stress experienced by the teaching staff, as the proper test coverage
will give them a peace of mind and increase their confidence in the correctness
of the supplied solution. Finally, the solutions will help validate the unit tests
themselves. Invalid or incorrect unit tests may cause students even greater dis-
tress than an erroneous reference solution. Even if they are considered comple-
mentary material, our experience shows that unit test validity is as important to
the students as the correctness and clarity of the statements of the examination
problems.

3   Practical considerations
    This section contains certain best practices that were discovered by our
teaching teams in the process of integrating unit tests in our work.

3.1 Exam preparation
Step 1: Develop the exam assignment.
      While the text of the exam is being developed, in parallel, a solution to it
is also being written. Of course, it is also possible to achieve this in a waterfall
style – first develop the exam text and after that the solution, but it seems more
natural and easier, when they are done in parallel, or even if the solution precedes
the text of the exam. This may seem strange at first, but here is a viewpoint, which
helps understand the underlying principle. In reality, the text of the exam is only a
physical externalization of a concept, a set of requirements that the solution must
fulfill. If this concept is kept only in our heads, it is easy to omit one or more of
its aspects. By externalizing it as a fully working, assembled software system, we

                                         71
can consider it in its entirety and be able to describe better it. On the other hand,
if we first develop the solution and only then begin to describe it, there is an in-
creased risk of missing one or more of the important points.
     During this process, unit tests may be written in parallel too, but often this
leads to a more complicated workflow. They can also be implemented in a test-
driven development (TDD) style [11], or after the solution is fully developed.
This depends a lot, on what the teaching staff is comfortable.
     During this process, it may be discovered that the exam task is too compli-
cated or too easy, that it contains one or more elements that should be removed,
that it does not address properly the curriculum of the course, etc. If this turns out
to be the case, there is an excellent and timely opportunity for the problem to be
adjusted appropriately.

Step 2: Reevaluate the scope of the assignment.
     After the solution is completed and properly covered by unit tests and they
all pass, it may be considered valid. At this point, it should be considered if the
amount of source code that needs to be written is proper for the time limit of the
exam, whether the required knowledge and skills correspond to what the students
have been taught in the course, etc. The statement of the exam problems should
also be given a final revision. Our experience shows that during this final revi-
sion process it is crucially important that the three exam components: problem
statement, unit tests, and reference solutions, are all kept in sync, and any change
made to one of them is immediately propagated to the other to avoid confusion
and misalignment.

Step 3: Determine whether to supply the solution to students after the exam.
     Depending on the situation at hand, it should be decided whether to supply
the solution to the students and if so, when (i.e., immediately after the exam, or
after a given period, etc.). For example, in certain cases, the students are allowed
to attempt a given exam multiple times. Typically, a given time interval must pass
before students are granted another attempt (a week, a month, etc.). If one sup-
plies the solution immediately after the first attempt, this may make it pointless to
give an option for a second try.

Step 4: Determine whether to supply the unit tests to students after the exam.
     It should also be considered whether to supply the unit tests as part of the
supplied solution. This may be an excellent opportunity for the students to learn
how to cover their solution with unit tests, how to organize them, etc. However, it
also means that the teaching staff needs to spend additional time organizing (and
probably documenting) the unit tests, so they are in an appropriate form, which
can be presented to the students.

                                         72
Step 5: Determine whether to supply the unit tests to students during the
exam.
     Another thing to consider is whether to supply the unit tests to the students
during the exam and if so – whether to supply the entire suite of tests, or just a
proper subset of it. It should also be considered whether the students will be re-
quired to develop unit tests themselves and if so, how this will affect evaluation.
In this case, the teaching staff, which oversees the exam, may need to be quali-
fied to provide aid and/or to answer any questions that the students may have,
related to the unit tests. This however is not always necessary, nor desired. For
example, in certain exams in our university, the staff overseeing the examination
is not allowed to discuss the topic of the examination and/or to give any advice
or guidance.

3.2 Exam grading
If the teaching team has followed the practices described in the previous section,
they will have a suite of unit tests that can be used to check the submissions.
However, there are some important things to consider.
     Let us assume the exam statement requires the students to develop a given set
of code units. For example, they may be required to implement specific classes,
based on a description of their interfaces. A given student may fail to implement
all units. In such a case, there will be unit tests that refer to units not present in the
source code and this will most likely prevent us from running a successful build.
A possible solution is for the examiner to provide simple, “dummy” definitions
for such units, which fail all tests. For example, if students are required to imple-
ment a certain function, but it is missing in a certain submission, the examiner
can add a version of the function, which does nothing, but fail. Usually, unit test
frameworks have a dedicated function/macro for that purpose, or the programmer
can simply write an “assert false” statement in the code.
     Secondly, it is possible there are errors in the submission, which prevent the
program from building properly. In such a case, the examiner can comment out
the problematic code and supply a “dummy” unit, as described above.
     Thirdly, a very common (at least in our experience) mistake made by stu-
dents is to misspell a code unit. For example, the exam text may ask them to
implement a function called sortArray, but they may call it sort_array,
or SortArray in their code. This will also cause the unit tests to fail. An easy
solution may be to add a definition, which links the identifier used in the unit tests
to the one the student has written. For example, the examiner may define a macro,
which replaces sortArray with the identifier used by the student, or they may
define a new function sortArray, which simply delegates to the function de-
veloped by the student.


                                           73
     All of the above, however, may cause a significant amount of work for the
examiner and for each submission; it should be considered if the effort spent,
supplying dummy units outweighs the benefits of using unit tests to grade the
submission.
     To alleviate this issue, it may be beneficial to design the exam with unit
testing in mind. For example, students may be required to organize their code
in separate files/modules, so that code units are divided into subsets that can be
tested independently. Of course, this depends a lot on the exam and such an ap-
proach may not be possible.
     In our view, it is important to state explicitly that unit tests should be viewed
as an aid, and they can in no way replace proper examination of the submitted
solution by a human.
     • The examiner should always check the source code, regardless of wheth-
         er it passes all checks or not.
     • The final grade should never be automatically assigned, based on how
         many tests have passed successfully, but should be determined by the
         examiner after reviewing the solution. If any grade suggestion is being
         automatically calculated, all team members must have a clear understand-
         ing that it is intermediate and subject to review and change.
     • The grading process should be clear and transparent to students. We rec-
         ommend that the teaching team assure explicitly students that each sub-
         mission is carefully reviewed and graded by the members of the team.
     It should also be stated that there are limits to what can be checked with unit
tests. Obviously, they cannot be used to evaluate the coding style, and may not
be able to detect bad practices. It is also neither feasible, nor necessary to check
for every possible error that can be introduced in each code unit. On the other
hand, submissions usually contain all sorts of errors – memory leaks, improperly
organized code, poor architecture, bad complexity, etc. A certain balance must be
found: again, an excellent teaching opportunity for students, as such a balance is
important in actual software development as well.
     Finally, it may be beneficial for the team to keep the unit tests used to check
submissions in a centralized place and to improve them as grading proceeds. For
example, one examiner may find that many submissions contain a certain type of
error, which is not detected by the unit tests. It is very likely that the error may be
present in the submissions checked by the other members of the team. Thus, the
examiner may implement one or more tests that detect the error and push them
back to the central repository. The other examiners can pull the improved unit
tests and run them again against the submissions they are processing.


                                          74
4   Related work
The application of unit tests to programming education has been explored
by multiple authors in a variety of contexts. Automated testing of student
submissions in programming courses for the purposes of grading has a long
history spanning over 50 years and have been studied extensively [12,13]. Most
of the case studies rely on end-to-end tests, which test the entire solution on
various inputs and validate against an expected set of outputs. Such an approach
aids the development of a mindset in the students for considering a wide spectrum
of input possibilities for their program in order to ensure obtaining a maximum
grade. It also teaches them that the correctness of their solution can be evaluated
automatically given the appropriate set of tests, which is something they can
do themselves before submitting their programs for grading. With end-to-end
tests, incremental grading is typically achieved by developing an appropriate set
of tests, which aim to capture corner and error cases (e.g., small inputs, null
or invalid inputs, large inputs) or potential logical flaws (e.g., by falsifying a
potentially incorrect assumption about the input). This approach, however, rarely
focuses on testing individual units of the solution, and a simple omission in the
program (e.g., off-by-1 error) can lead to a zero score.
     The discipline of writing unit test has typically been taught in intermediate
and advanced programming courses dedicated to software testing and production
programming, as opposed to introductory programming courses (cf. [14,15]). It is
only a recent trend that software-testing practices are increasingly introduced in in-
troductory courses. A useful systematic survey of research related to the application
of unit test to programming education [16] offers similar observations to ours: the
benefits of introducing unit tests early on can extend significantly beyond grading,
but also into curriculum, familiarizing students with testing tools and processes, im-
proving course materials, and developing perceptions and behaviors toward testing.
     Some authors, such as Howles [17] and Edwards [18] argued as early as
2003 that fostering a software quality culture should start from the very early
stages of programming education. This idea has been taken further by Janzen and
Saieidian [19], who coined the team test-driven learning, inspired by test-driven
development, where students learn by starting from the unit tests before design-
ing their solution. The authors later attempted to evaluate the effectiveness of this
approach with a quantitative approach with moderately positive results, although
the small amount of data was insufficient to draw broad conclusions [20]. The
authors discuss similar benefits to what we have noticed in terms of teaching the
underlying concepts of software quality practices and the change in perception of
students where unit tests are concerned.
     Some authors note the danger of introducing undue cognitive load on intro-
ductory-level students by adding concepts such as unit testing and test-driven


                                         75
development early on [21]. Their proposed solution is the development of a sim-
plified macro language for defining unit tests in function comments as opposed to
applying a fully-fledged unit-testing framework. Our approach is slightly differ-
ent: we insist on using an actual unit-testing framework without the application of
intermediate tools, but we carefully select a framework by considering simplicity
of usage and apply only a limited set of capabilities, which we found sufficient
for an introductory level course.

5    Conclusion
Based on several years of experience in utilizing unit tests in teaching and
examinations in CS/SE/IS courses, the authors believe that their early introduction
is highly beneficial for students. In addition to the purely practical benefits of
providing a degree of safety and confidence in the correctness of the developed
code, be it examination solutions or teaching materials, the exposure of students
to this practice helps them achieve a first-hand appreciation of its advantages
and drawbacks. We have found that it helps them build their own style of
approaching unit testing in parallel with finding the programming style that they
feel most comfortable. Last, but not least, students appreciate obtaining a level of
familiarity with the unit testing practice, which is a valued and beneficial practice
in commercial software development, especially of widely used information
systems.
     As part of future discussion, we will aim at mapping the introduction of unit
tests to specific learning outcomes (cf. [5]) in order to isolate better their contri-
bution to the overall learning process.

6    Acknowledgements
The authors gratefully acknowledge financial support by Sofia University “St.
Kliment Ohridski” grant 80-10-173/05.04.2021.

References
1. Peláez C. (2016). Unit testing as a teaching tool in higher education. In SHS Web of Conferences
   (Vol. 26, p. 01107). EDP Sciences.
2. IEEE Computer Society (2014) Guide to the Software Engineering Body of Knowledge (SWE-
   BOK) Version 3.0. Chapter 4 Software Testing pp.4-2
3. Robert M. (2020) Clean Agile. Chapter 5: Technical practices; section “Test-driven-develop-
   ment”. Pearson.
4. Robert Martin (2009) Clean Code. Chapter 9: Unit Tests. Prentice Hall.
5. Kanabar V., & Kaloyanova, K. (2017). Identifying and embedding behavioral competencies in
   Information Systems courses.
6. Catch2 Tutorial. https://github.com/catchorg/Catch2/blob/devel/docs/tutorial.md, last accessed
   2021/05/05.


                                                76
7. Googletest Primer. https://github.com/google/googletest/blob/master/docs/primer.md, last ac-
    cessed 2021/05/05.
8. Moodle. https://moodle.org/, last accessed 2021/05/05.
9. Stay, M. (2001, April). ZIP attacks with reduced known plaintext. In International Workshop on
    Fast Software Encryption (pp. 125-134). Springer, Berlin, Heidelberg.
10. 7-zip, https://www.7-zip.org/, last accessed 2021/05/05.
11. Beck, K. (2003). Test-driven development: by example. Addison-Wesley Professional.
12. Kirsti M Ala-Mutka (2005) A Survey of Automated Assessment Approaches for Programming
    Assignments, Computer Science Education, 15:2, 83-102, DOI: 10.1080/08993400500150747.
13. Daly C., Livingstone D., and Orwell J. (2005) Automatic test-based assessment of programming:
    A review. J. Educ. Resour. Comput. 5, 3, 4–es. DOI:https://doi.org/10.1145/1163405.1163409.
14. Garousi V., and A. Mathur (2010) Current State of the Software Testing Education in North
    American Academia and Some Recommendations for the New Educators, 23rd IEEE Con-
    ference on Software Engineering Education and Training, 2010, pp. 89-96, doi: 10.1109/
    CSEET.2010.29.
15. Allen E., Cartwright R., & Reis C. (2003). Production programming in the classroom. ACM
    SIGCSE Bulletin, 35(1), 89. doi:10.1145/792548.611940.
16. Passos Scatalon L., Jeffrey C., Carver, Rogério E. G., and Francine Barbosa E. (2019) Soft-
    ware Testing in Introductory Programming Courses: A Systematic Mapping Study. In Pro-
    ceedings of the 50th ACM Technical Symposium on Computer Science Education (SIGCSE
    ‘19). Association for Computing Machinery, New York, NY, USA, 421–427. DOI:https://doi.
    org/10.1145/3287324.3287384.
17. Howles T. (2003). Fostering the growth of a software quality culture. ACM SIGCSE Bulle-
    tin,35(2), 45 – 47.
18. Edwards S. H. (2003). Rethinking computer science education from a test-first perspective.
    Companion of the 18th Annual ACM SIGPLAN Conference on Object-Oriented Programming,
    Systems, Languages, and Applications - OOPSLA ’03. doi:10.1145/949344.949390.
19. Janzen D. S., & Saiedian H. (2006). Test-driven learning. ACM SIGCSE Bulletin, 38(1), 254.
    doi:10.1145/1124706.1121419.
20. Janzen D., & Saiedian H. (2008). Test-driven learning in early programming courses. ACM
    SIGCSE Bulletin, 40(1), 532. doi:10.1145/1352322.1352315.
21. Lappalainen V., Itkonen J., Isomöttönen V., & Kollanus S. (2010). ComTest. Proceedings of the
    Fifteenth Annual Conference on Innovation and Technology in Computer Science Education -
    ITiCSE ’10. doi:10.1145/1822090.1822110.


                                               77