Stager: Simplifying the Manual Assessment of Programming Exercises

Christopher Laß, Stephan Krusche, Nadine von Frankenberg, Bernd Brügge
Technische Universität München
christopher.lass@tum.de, krusche@in.tum.de, nadine.frankenberg@in.tum.de, bruegge@in.tum.de

Abstract

Assessing programming exercises requires time and effort from instructors, especially in large courses with many students. Automated assessment systems reduce the effort, but impose a certain solution through test cases. This can limit the creativity of students and lead to a reduced learning experience. To verify code quality or evaluate creative programming tasks, the manual review of code submissions is necessary. However, the process of downloading the students' code, identifying their contributions, and assessing their solution can require many repetitive manual steps.

In this paper, we present Stager, a tool designed to support code reviewers by reducing the time to prepare and conduct manual assessments. Stager downloads multiple submissions and adds the student's name to the corresponding folder and project, so that reviewers can better distinguish between different submissions. It filters out late submissions and applies coding style standards to prevent white space related issues. Stager combines all changes of one student into a single commit, so that reviewers can identify the student's solution more quickly.

Stager is an open source, programming language agnostic tool with an automated build pipeline for cross-platform executables. It can be used for a variety of computer science courses. We used Stager in a software engineering undergraduate course with 1600 students and 45 teaching assistants in three separate programming exercises. We found that Stager improves the code correction experience and reduces the overall assessment effort.

1 Introduction

The number of students in university courses is increasing. The number of new undergraduate students at our computer science department increased by 81 % between 2013 (1110 students) and 2017 (2005 students) (https://www.tum.de/die-tum/die-universitaet/die-tum-in-zahlen/studium). Practical programming exercises are essential in computer science education and help students acquire important skills in software development [Staubitz et al., 2015]. However, the manual assessment of programming exercises in large courses can take a considerable amount of time and effort. Automated assessment systems (also called auto-graders) aim at flexibility and scalability in large courses and allow instructors to integrate exercises into lectures [Krusche et al., 2017b]. These systems utilize, among others, version control systems (VCS) to store the code solutions of students in repositories, and test cases that are executed on a continuous integration server to assess the solution to a programming exercise automatically [Heckman and King, 2018; Krusche and Seitz, 2018].

While automated assessment systems significantly reduce manual assessment effort, they have drawbacks. Predefined test cases cannot cover all possible solutions and therefore impose a certain solution on the students. Some students are limited in their programming skills, while others can exploit the test cases by repetitive trial-and-error submissions. Especially first-year students who are new to programming often experience problems when trying to formulate their solution and thoughts as an executable computer program [Robins et al., 2003]. Such submissions can be overly complicated, and assessment systems cannot (yet) provide enough useful feedback in that regard. Furthermore, some programming exercises cannot be assessed automatically. The automated grading of creative assignments with open problem statements is hardly possible because different solutions exist [Knobelsdorf and Romeike, 2008; Krusche et al., 2017a]. An example for such an assignment is to implement a creative collision strategy in a 2D racing game.
Automated test cases might be able to validate a collision, but are incapable of assessing the creativity or code quality of the solution. As a result, manual assessment can be beneficial, even in large courses that have fully implemented automated grading solutions.

However, the process of manually assessing multiple students' solutions requires repeated manual steps. Tasks such as finding the next student's repository, downloading the source code, and renaming folders and projects for standardization can be time-consuming and error-prone. Determining a student's contribution is challenging when the exercise builds upon a provided code template and when the students use multiple commits in their code repository. Then it becomes difficult to separate the provided template from the final solution.

In this paper, we present Stager, a tool that is designed to support the manual assessment of programming exercises. Reviewers, e.g. teaching assistants or instructors, can automate the manual steps that are necessary to prepare the students' code repositories, for instance downloading all repositories at once, and thereby reduce the manual assessment time. The idea for Stager evolved during an undergraduate university course with 1600 students and 45 teaching assistants. An initial implementation was used for three separate programming assignments.

The remainder of the paper is organized as follows. We describe related work focusing on existing automated assessment solutions and the limitations of automated assessment approaches in Section 2. In Section 3, we cover Stager's approach to automating the recurring manual steps during the correction of programming exercises.
We describe design decisions, the exercise workflow with Stager, the configuration possibilities of the tool, and the concrete tasks of Stager, e.g. the Download repositories task. We analyze the improved code assessment experience of the teaching assistants by means of an experience report in Section 4, where we also present the results of a quantitative analysis of Stager's use in three programming exercises. Section 5 concludes the paper and provides directions for future work.

2 Related Work

Several automated assessment system approaches for programming assignments exist [Heckman and King, 2018; Knobelsdorf and Romeike, 2008; Krusche and Seitz, 2018; Pieterse, 2013]. Advantages include a decrease in the workload of course instructors and timely feedback for students [Pieterse, 2013]. Automated systems work well to grade programming assignments consistently and evaluate specific aspects, e.g. the functionality [McCracken et al., 2001] or efficiency of a system [Jackson and Usher, 1997]. However, they miss the benefit of personal feedback which a manual grading approach could provide. The test cases used by such systems cannot assess the code quality and "elegance" of the solution [Poženel et al., 2015].

Building a robust automated assessment system amounts to a heavy workload, whereby the definition of the test cases is (usually) the most time consuming activity [Cerioli and Cinelli, 2008]. This workload is amplified when designing tasks with some degree of freedom of solutions [Chen, 2004].
The degree of freedom of solutions indicates the difficulty of the exercise [Striewe and Goedicke, 2013], meaning that a difficult exercise has more possible solutions and therefore an increased workload to design the automated assessment system. Depending on the class size, it can therefore be less time consuming to manually assess solutions than to design the automated assessment system [Ala-Mutka, 2005].

Further, students can become distracted by automated feedback. For instance, students may be tempted to fix only the failing tests instead of focusing on the assignment [Heckman and King, 2018]. Automated assessment systems also circumvent the detection of frequent mistakes or misunderstandings among students, although the understanding and resolution of common errors is an essential learning experience for students. Semi-automated systems combine the mentioned aspects by providing automated grading as well as manual feedback. Such systems offer personalized feedback to some extent, for instance the instructor can annotate a static assessment [Gerdes et al., 2017]. Other systems give the student instant feedback if the student's solution is correct. If it is not, the instructor reviews each solution and can give additional feedback if required [Insa and Silva, 2015].

Many systems focus on the grading itself, but not on the process the instructor has to follow to obtain the students' solutions. Some commercially available systems and tools that are used in computer science (CS) courses offer features that aim at simplifying this process. In 2000, Jackson proposed an approach that pre-processes student submissions (sent via e-mail) by removing irrelevant information or unpacking files [Jackson, 2000]. For submissions via repositories, pull requests (also called merge requests) in GitHub (https://github.com), GitLab (https://gitlab.com), or Bitbucket (https://bitbucket.org) allow students to commit their changes into separate branches. After requesting the code to be merged into the main branch, i.e. a submission, reviewers can highlight the student's contribution as a difference to the template code and provide feedback by requesting changes. While pull requests can also be integrated with continuous integration systems, e.g. using TravisCI (https://travis-ci.org) to detect compile errors and to run automated tests, reviewers might still need to download the source code and execute it to verify that all requirements of the problem statement have been solved.

GitLab introduced a "Squash and Merge" option which "applies all of the changes in a merge request as a single commit, and then merges that commit using the merge method set for the project" (https://docs.gitlab.com/ee/user/project/merge_requests/squash_and_merge.html). This cleans up the commit history and can make it easier to identify the contribution of one particular student. Tools and services such as Gerrit (https://www.gerritcodereview.com) support code reviews that enable the reviewer to see the code difference, and provide the option to leave in-line comments. However, such tools primarily focus on continuous feedback rather than assessing a student's solution.

3 Stager's Approach

This section presents an approach that automates manual steps during the correction of programming exercises in order to prepare student repositories for easier assessment. We show how code reviewers can use Stager. Furthermore, we explain the different tasks that are automatically executed by Stager.

Figure 1 illustrates the exercise workflow including the manual assessment with the help of Stager as a UML activity diagram. As precondition for this workflow, every student must have their own repository with the code template for the exercise in a VCS (multiple tools automate this step, e.g. ArTEMiS or GitHub Classroom). After the students complete the exercise, they commit and push their solutions to the VCS (action 1.3). Before reviewers start to work, they need to configure Stager (action 2.1). Then, they trigger Stager to process different tasks (actions 3.1 ... 3.6), such as Download repositories or Normalize code style. Finally, the reviewer can manually assess the pre-processed submissions and give qualitative feedback (actions 4.2 and 5.) to the students in any arbitrary form (e.g. uploading the feedback into an exercise management system such as Moodle, https://moodle.org).

Figure 1: Exercise workflow with Stager: students complete the exercise and upload their solutions to a VCS; the reviewer configures and triggers Stager and afterwards manually assesses the prepared repositories and gives qualitative feedback to the students. The activity diagram contains three swimlanes: Student (1.1 Receive exercise and code template, 1.2 Solve exercise, 1.3 Commit and push solution, 5. Receive qualitative feedback), Reviewer (2.1 Configure Stager, 2.2 Trigger Stager, 4.1 Manually assess prepared repositories, 4.2 Give qualitative feedback), and Stager (3.1 Download repositories, 3.2 Rename folders, 3.3 Filter late submissions, 3.4 Rename projects, 3.5 Normalize code style, 3.6 Combine commits).

The action 2.1 Configure Stager of the reviewer is described in Section 3.1. Stager's actions are described as tasks in Section 3.2. The numbering in Section 3.2 aligns with the corresponding actions in Figure 1.

3.1 Stager's Setup

Stager is free, open source, and available under the MIT license (https://github.com/arubacao/stager). It is platform independent and programming language agnostic, making Stager universally applicable. It is written in the Go programming language (https://golang.org) and makes use of the distributed version control system git (https://git-scm.com). Cross-platform executables can be downloaded from the automatic build pipeline or compiled from the source code.

Stager's configuration is separated into two files, students.csv and config.json, based on how frequently the settings change. The list of students in students.csv might not change during the course duration, while config.json changes for every exercise. The configuration procedure must be completed after the code template is finished and before Stager is executed. Neither Stager nor its configuration adds any preconditions or constraints on the students. The following settings can be edited:

1. Credentials: Remote git repositories can be accessed via the SSH or HTTP protocols [Lawrance et al., 2013]. For HTTP, the JSON keys username and password have to be set with valid credentials and access rights to the VCS.
For SSH, Stager uses the operating system's global SSH settings and therefore does not require further configuration.

2. Latest commit hash of a programming exercise template: The programming exercises that are distributed to the students build upon a given code template. The SHA hash of the latest commit of the code template, meaning the latest code changes the reviewer included, must be set for the JSON key squash_after. This setting is required for Stager to distinguish between the code given by the reviewer and the code written by the student. This configuration option is used by the task Combine commits and is further elaborated in Section 3.2.

3. Deadline for homework submission: Students have to submit their homework in a given time-frame. For example, the homework must be submitted by Sunday midnight because the programming exercises will be discussed in class on Monday morning. However, VCSs have limitations when it comes to time-based repository access. As described in more detail in Section 3.2, the task Filter late submissions makes it possible to overcome these limitations. The deadline for students submitting their homework is set with the JSON key deadline. The standard datetime format YYYY-MM-DD HH:MM:SS must be used. For example, 2018-08-31 23:59:59 is valid.

4. Remote repository URL schema: Each student has a personal repository that can be accessed with a unique URL. A general URL schema can be derived from these unique URLs, where the students' identifiers are substituted by a placeholder. For example, for the repository URL (1) of student 10001, the derived general URL schema is (2). If the repositories are accessed using HTTP as in the example, two additional placeholders must be set for the reviewer's credentials (3). The resulting schema is set for the key url.

https://repo.uni/cs101/exercise01-10001.git (1)
https://repo.uni/cs101/exercise01-%s.git (2)
https://%s:%s@repo.uni/cs101/exercise01-%s.git (3)
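Taken together, a config.json for this example course could look as follows. This is a sketch based on the JSON keys described above; the commit hash and the credentials are placeholder values, and the exact file layout may differ in the current Stager version:

{
  "username": "reviewer",
  "password": "secret",
  "squash_after": "1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b",
  "deadline": "2018-08-31 23:59:59",
  "url": "https://%s:%s@repo.uni/cs101/exercise01-%s.git"
}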
5. List of students: In addition to the mentioned settings, Stager requires a list of the students the reviewer wants to assess. The students' names and identifiers are defined in the students.csv file with the format shown in Listing 1. All people and courses mentioned in this paper are placeholder names and do not exist in reality.

Listing 1: Sample students.csv

name,id
John Doe,10001
Jane Roe,10002

After configuration, the Stager executable, config.json, and students.csv are placed in a dedicated and empty folder. Stager can then be executed via a double click or from the terminal. Listing 2 illustrates this workflow. After Stager terminates, the students' repositories are locally available and prepared by the tasks described in the following Section 3.2.

Listing 2: Folder setup and execution of Stager

$ cd ~/cs101/assessment3
$ ls
config.json stager students.csv
$ ./stager

3.2 Stager's Tasks

Stager provides an extendable framework which makes it easy to add or remove tasks according to the reviewer's requirements. Tasks are functions that modify the repository or its contents and have a single purpose. For example, the Rename folders task appends the student's name to the corresponding folder. Stager is composed of multiple tasks (shown in the Stager swimlane in Figure 1) that adhere to certain rules and are performed sequentially during the tool's execution. The implementation allows a clear distinction of tasks, such that each task addresses a separate purpose. Therefore, it is easy to add new tasks or remove existing ones, conceptually and implementation-wise, in the future. For example, when the reviewer does not need a certain task, only one line of code within the array of tasks has to be removed. Furthermore, tasks must be idempotent, meaning that multiple executions of the task lead to the same output. Even though tasks are independent, they are processed sequentially, i.e. the order of the tasks is relevant. For instance, repositories first have to be downloaded before other tasks have local file access.

The goal of Stager is to simplify the manual assessment of programming exercises by modifying source code, files, and repositories. Repetitive manual steps that are required for the reviewer to start the assessment should be reduced or eliminated by Stager. We identified the following relevant tasks, listed according to the order of execution; a sketch of the resulting task pipeline follows the list, and each task is described in detail afterwards:

1. Download repositories
2. Filter late submissions
3. Rename folders
4. Rename projects
5. Normalize code style
6. Combine commits
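To illustrate the framework, the following Go sketch shows how such a pipeline of single-purpose, idempotent tasks could be structured. The type and function names are illustrative and the signatures are simplified (the real tasks also receive the configuration); this is not Stager's actual source code:

package main

import "log"

// A task is a function with a single purpose that modifies a locally
// cloned student repository. Tasks must be idempotent: executing them
// multiple times leads to the same output.
type task func(repoPath string) error

// Placeholder implementations of the six tasks described below.
func downloadRepository(repoPath string) error    { return nil }
func filterLateSubmissions(repoPath string) error { return nil }
func renameFolder(repoPath string) error          { return nil }
func renameProject(repoPath string) error         { return nil }
func normalizeCodeStyle(repoPath string) error    { return nil }
func combineCommits(repoPath string) error        { return nil }

func main() {
    // The tasks are processed sequentially in this order. Removing one
    // line from the slice removes the corresponding step.
    pipeline := []task{
        downloadRepository,
        filterLateSubmissions,
        renameFolder,
        renameProject,
        normalizeCodeStyle,
        combineCommits,
    }
    repo := "./exercise01-10001" // in practice, one clone per student
    for _, t := range pipeline {
        if err := t(repo); err != nil {
            log.Fatalf("task failed for %s: %v", repo, err)
        }
    }
}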
1. Download repositories: In order to better determine the software quality and verify that all requirements of the problem statement have been solved by the students' submissions, it is necessary for the reviewer to compile and execute their homework source code locally. Hence, the repositories must be available on the reviewer's computer. The initial task clones all repositories of the predefined students, as-is and all at once, to a given folder on the reviewer's computer. This first task takes potentially existing local repositories into account and overwrites them. It ensures that each local repository is in sync with the remote repository and in a clean state. The following tasks modify files and therefore require write access to the repositories. These modifications can only be performed when the repositories are locally available. Consequently, the Download repositories task must run first.

2. Filter late submissions: Homework submissions are tied to a hard deadline. With web-based VCSs like Bitbucket or GitLab, it is hardly possible to block student commits after a given deadline. Students could exploit this situation and extend their time to finish the exercise, as shown in Figure 2. The Filter late submissions task analyzes the commit timestamps and sets the repository to its state at the deadline pre-configured in config.json. Commits after the deadline are no longer considered. This way, the time-based limitations of web-based VCSs are bypassed. However, this procedure is not fully forgery-proof, since commit timestamps can be manipulated. File changes made by tasks prior to this one would be stripped out, since the repository is reset to its state at the pre-configured deadline. Therefore, the Filter late submissions task must be executed before any other task modifies files.

Figure 2: Filter late homework submissions by excluding commits after the homework submission deadline. The two commits above the red line are after the deadline, while the two commits below the red line are before the deadline.
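Using git, the core of this task can be expressed in two commands: find the newest commit before the deadline and reset the repository to it. The following Go sketch illustrates the idea; it is a minimal sketch assuming git is installed, and Stager's actual implementation may differ:

package main

import (
    "fmt"
    "os/exec"
    "strings"
)

// filterLateSubmissions resets a repository to the last commit made
// before the configured deadline, discarding all later commits.
// Note: commit timestamps can be forged, as discussed above.
func filterLateSubmissions(repoPath, deadline string) error {
    // Find the newest commit with a commit date before the deadline.
    out, err := exec.Command("git", "-C", repoPath,
        "rev-list", "-1", "--before="+deadline, "HEAD").Output()
    if err != nil {
        return err
    }
    hash := strings.TrimSpace(string(out))
    if hash == "" {
        return fmt.Errorf("no commit before deadline in %s", repoPath)
    }
    // Set HEAD and the working tree to the state at the deadline.
    return exec.Command("git", "-C", repoPath, "reset", "--hard", hash).Run()
}

func main() {
    if err := filterLateSubmissions("./exercise01-10001", "2018-08-31 23:59:59"); err != nil {
        fmt.Println(err)
    }
}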
3. Rename folders: Depending on the naming convention, only the student's identifier is used for the repository name. The resulting folders can be hard to keep apart and to associate with the correct student. For obfuscation and identity protection this is reasonable, but it is counterproductive on the reviewer's local system, since it is easier to identify a student by their name than by their id. Once the repositories are locally available, the Rename folders task appends the student's name to the corresponding folder, as illustrated in Figure 3.

Figure 3: Append names to folders to better distinguish between students. Without the names john_doe and jane_roe, it would be difficult to identify which folder belongs to which student.

4. Rename projects: As a precondition of Stager, each student must have their own repository for each published exercise. The content of these repositories is initially identical. As a result, the project names are also identical for all students. This leads to the problem that reviewers could only import one project at a time into Eclipse in order to review and execute the code. Renaming all projects manually is time-consuming and error-prone. Analogous to the Rename folders task, a student's name is prepended to the corresponding project name. This makes it possible to distinguish between students within source code editors or integrated development environments (IDEs), e.g. Eclipse (https://www.eclipse.org), and allows multiple projects to be imported at the same time (Figure 4). Eclipse, for instance, does not allow importing multiple projects with identical names, which makes it impossible to compare multiple solutions without renaming the projects.

Figure 4: Prepend student names to projects so that the submissions of multiple students can be imported into Eclipse and reviewed at the same time. Jane Roe and John Doe are prepended to the project names. Otherwise, the reviewer could only import one Eclipse project at a time.

5. Normalize code style: The encoding and code style of the provided code template and the final student's contribution should be consistent. Windows and Unix-based systems use different line breaks for code files by default: Windows uses carriage return and line feed "\r\n" as a line ending, whereas Unix-based systems use just line feed "\n". Also, IDEs might automatically enforce a different code style standard than desired. As illustrated in Figure 5, this can lead to non-relevant changes and obscured code differences in commits, thereby making it harder to assess the submission. To avoid these non-relevant file changes by the student, Stager invokes a linter that automatically normalizes the code to the same standards as the initial template. This means that all white space related changes, e.g. line breaks, empty spaces, and tabs, are removed, so that the reviewer does not need to analyze them. Each programming language has its own linting strategies, utilizing existing tools like eslint (https://github.com/eslint/eslint) for JavaScript or checkstyle (https://github.com/checkstyle/checkstyle) for Java. This hides pure white space and encoding changes and allows code reviewers to focus on the actual contributions by the students.

Figure 5: There is no visual change in the two code blocks in this figure. However, non-visible line breaks cause the comparison tool to show these lines. This can make it time-consuming for the reviewer to identify relevant changes.
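The simplest of these normalizations, converting Windows line endings to Unix line endings, could look as follows in Go. This is an illustrative sketch restricted to Java files; as described above, Stager delegates language-specific style normalization to existing linters such as checkstyle:

package main

import (
    "bytes"
    "log"
    "os"
    "path/filepath"
)

// normalizeLineEndings rewrites all Java source files in a repository
// to Unix line endings so that pure white space differences disappear
// from the diff against the template.
func normalizeLineEndings(repoPath string) error {
    return filepath.Walk(repoPath, func(path string, info os.FileInfo, err error) error {
        if err != nil || info.IsDir() || filepath.Ext(path) != ".java" {
            return err // skip directories and non-Java files
        }
        data, err := os.ReadFile(path)
        if err != nil {
            return err
        }
        // Replace Windows "\r\n" line endings with Unix "\n".
        unix := bytes.ReplaceAll(data, []byte("\r\n"), []byte("\n"))
        return os.WriteFile(path, unix, info.Mode())
    })
}

func main() {
    if err := normalizeLineEndings("./exercise01-10001"); err != nil {
        log.Fatal(err)
    }
}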
As a result, it is 7 Model Transformations and Refactorings easy for the reviewer to quickly identify the student’s 8 Pattern-Based Development contribution and to decide if the solution is correct. 9 Lifecycle Modeling In addition to the existing branches with the complete 10 Software Configuration Management commit history, Stager adds the combined commit into 11 Testing a separate branch. Thus, information is only added 12 Project Management and not removed from the repository and the reviewer 13 Repetitorium could still see the whole commit history. Web-based VCSs like GitHub also offer a squash feature, however, Table 1: The course Introduction to Software Engineer- the reviewer would have to trigger it manually for ing lasts 13 weeks. each repository. 1600 students were registered for the course in Figure 6 illustrates this process with an example 2018. One lecturer and three exercise instructors student John Doe and an Instructor. The Instructor were involved in the organization of the course. 45 provides a code template. John Doe works on the teaching assistants were responsible for holding 74 given exercise. Over a period of one day, John submits exercise group sessions per week. Teaching assistants his work separated across multiple commits. As seen were mainly bachelor students in the fourth semester, in the bottom right corner of Figure 6, one assignment who successfully completed the same course in the was to Add new car types to the game. Since John previous year. submitted multiple code changes and removed the The course design is based on interaction and as- “TODO” lines within the code, the reviewer would sumes active participation from students. The interac- have to actively scan all nine commits to identify tive parts include in-class exercises, in-class quizzes, John’s solution. Stager solves this time-consuming and exercise sessions. Students need to bring their process by combining all student commits into one laptops to the class and to exercise sessions. Stu- single commit that includes all changes by John. This dents can earn bonus points for completing in-class single commit is selected in the top of Figure 6. The and homework exercises successfully. They can use reviewer can see every file that has been modified by these bonus points to improve their final exam grade. the student and quickly identify, whether John has completed the assignment correctly. 16 The German title is “Einführung in die Softwaretechnik”. V. Thurner, O. Radfelder, K. Vosseberg (Hrsg.): SEUH 2019 39 Stager: Simplifying the Manual Assessment of Programming Exercises Christopher Laß, Stephan Krusche, Nadine von Frankenberg und Bernd Bruegge, TU München Figure 6: Student commits are combined into one discrete change set: the commit at the top highlighted in blue. This commit displays the difference between a provided code template by the instructor and the submitted solution by the student. All commits of the student John Doe are still available. For instance, if they score more than 90 % of the to- different design patterns to make the game extensible tal exercise points, their grade in the final exam is for new requirements. improved by 1.0. This possibility motivates the stu- To submit their solutions, the students commit their dents to participate in the in-class exercises and in changes to a version control system. This automati- the homework exercises. 
4 Experience Report

The following experience report describes the lecture-based course Introduction to Software Engineering (EIST, from the German title "Einführung in die Softwaretechnik") in which we used Stager to improve the manual assessment of programming exercises. EIST is a second semester bachelor's course with a heterogeneous group of students including computer science, business informatics, and business students.

The course assumes that students have successfully completed an introductory course in computer science (e.g. CS1) and are familiar with object-oriented programming in Java. The course's learning goals are that students are able to apply relevant concepts and methods in all phases of software engineering projects, including analysis, design, implementation, testing, and delivery. Further, students know the most important terms and concepts and can apply them in modeling and programming tasks. They are aware of the problems and issues that generally have to be considered in software engineering projects. Table 1 shows the schedule and the content of the course.

Week | Content
1    | Introduction
2    | Model-Based Software Engineering
3    | Requirements Elicitation and Analysis
4    | System Design I
5    | System Design II
6    | Object Design
7    | Model Transformations and Refactorings
8    | Pattern-Based Development
9    | Lifecycle Modeling
10   | Software Configuration Management
11   | Testing
12   | Project Management
13   | Repetitorium

Table 1: The course Introduction to Software Engineering lasts 13 weeks.

1600 students were registered for the course in 2018. One lecturer and three exercise instructors were involved in the organization of the course. 45 teaching assistants were responsible for holding 74 exercise group sessions per week. Teaching assistants were mainly bachelor students in the fourth semester who had successfully completed the same course in the previous year.

The course design is based on interaction and assumes active participation from students. The interactive parts include in-class exercises, in-class quizzes, and exercise sessions. Students need to bring their laptops to the class and to exercise sessions. Students can earn bonus points for completing in-class and homework exercises successfully. They can use these bonus points to improve their final exam grade. For instance, if they score more than 90 % of the total exercise points, their grade in the final exam is improved by 1.0. This possibility motivates the students to participate in the in-class exercises and in the homework exercises. In-class exercises consist of quizzes (similar to the quiz exercises described in [Krusche et al., 2017c]), modeling, and programming exercises. Homework exercises include modeling, text, and programming exercises.

4.1 Programming Exercises

Between 600 and 1200 students actively participated in each programming exercise throughout the semester, as shown in Figure 7 and Figure 8. In each exercise, the students had to write new source code or adjust existing code based on a given problem statement. All students worked on the existing template code of an exercise in their individual git repository. The exercises were based on a 2D racing game called Bumpers. In the game, cars collide with each other and each collision has a winner. The course is designed so that each week's exercises focus on a different part of Bumpers in accordance with the lecture's content, e.g. in week 8, "Pattern-Based Development", exercises include the implementation of different design patterns to make the game extensible for new requirements.

To submit their solutions, the students commit their changes to a version control system. This automatically triggers test cases on a continuous integration server to verify the given solution. After the submission of their solution, students automatically see the test results as individual feedback and can improve their solution according to this feedback.

Figure 7: Number of students who submitted solutions to homework programming exercises (H01, H02, H07, H08, H11).

Figure 8: Number of students who submitted solutions to in-class programming exercises (L02, L07, L08, L10, L11).
However, not all aspects of a problem statement can be automatically tested. Either it is difficult to test a certain aspect of a solution, for instance complex behavior tests, or the problem statement provides a high degree of freedom, which makes it difficult to write test cases, e.g. for open or visionary questions. The following three homework programming exercises required manual assessment by the teaching assistants. The second and third exercises were graded semi-automatically.

1. Collision Detection: The task was to implement a creative collision detection algorithm for cars in Bumpers. The students were given executable template code and had to extend it with a new class that included their solution. This exercise required manual correction to test whether the new collision algorithm performed as intended. Additionally, the most creative solutions were awarded and shown in class.

2. Serialization of Code: The students had to instantiate objects from two classes in Java. The main task was to serialize and deserialize these objects using JSON. An automated assessment system was used to test the input and output of the serialization. However, the students wrote their own serialization code, so their solutions varied, e.g. in the naming of the objects or methods. This required the teaching assistants to assess the implementations manually.

3. Adapter Pattern: Based on a code template, the assignment was to extend the 2D car racing game Bumpers with legacy code using the adapter pattern. The legacy code for an existing analog speedometer panel was provided separately. An automated assessment system graded the students' solutions. In addition, the teaching assistants had to verify that the speedometer panel was shown in the game user interface and displayed the velocity correctly.

4.2 Results

In order to determine how many manual steps during a homework assessment can be automated by Stager, we conducted a quantitative analysis for these three programming exercises. In the quantitative analysis we focused on:

1. Number of commits per student
2. Number of commits after the exercise deadline
3. Source code changes where only white spaces have been added or removed

Table 2 displays an overview of the number of participating students for each exercise together with submission metrics.

Metric                                                              | 1. Collision Detection | 2. Serialization of Code | 3. Adapter Pattern
Total submission count                                              | 1104                   | 657                      | 794
Total commit count                                                  | 1998                   | 3880                     | 2447
Average amount of commits per student                               | 1.81                   | 5.91                     | 3.08
Total commits after exercise deadline                               | 34                     | 8                        | 7
Total submission count with at least one white space related change | 125                    | 118                      | 183

Table 2: Quantitative analysis of submission metrics for three programming exercises of the course.

The number of commits per student varies from 1.81 to 5.91 on average. Stager's Combine commits task combines student commits into one single commit so that reviewers can immediately distinguish between the provided code template and the code submitted by the student. There are 34, 8, and 7 late submissions, respectively, for the observed exercises. Stager automatically filters commits that are contributed after the defined exercise deadline. Between 118 and 183 students submitted at least one commit in which they only changed white spaces. While reviewing the student contributions, white space related changes are visually distracting to the reviewer (see Figure 5), since these changes are not relevant to the exercise.

In informal discussions, seven teaching assistants reported that Stager reduced their reviewing effort significantly. The workflow without Stager required the teaching assistants to first filter the repositories by student, then check the commit dates and times, clone or download the code, and fix potential white space problems in order to be able to assess the actual submission. Depending on the number of exercise sessions, teaching assistants had to perform this manual workflow for up to 50 student submissions. Further, the repository names only include the students' identifiers, not names, so that mix-ups could occur when importing the solutions into an IDE.

4.3 Discussion

While using Stager, we identified four main advantages: (1) Combining commits is particularly helpful to review all changes of one student at a glance. This allows the reviewer to immediately identify whether the student has understood the problem statement and implemented a proper solution. (2) Renaming the projects simplifies the assessment and comparison of multiple solutions. The reviewer can import multiple solutions at the same time with one click into an IDE. It also increases the confidence of the reviewers that the assessment is associated with the correct student. (3) While most students respect the deadline of an exercise, some students have committed changes after the deadline. It would be possible to remove write permissions for all student git repositories at the given deadline, but this might be hard to realize. Enforcing the deadlines in Stager is easier and filters the cases where students try to circumvent the deadline. (4) Stager only depends on the use of git repositories for programming exercises, and other instructors can use it without adaptations in their courses, e.g. in GitHub Classroom (https://classroom.github.com) or other git environments. As Stager is open source, other instructors can adapt it to their own needs.

While Stager is easy to use as a standalone tool, reviewers need to configure it for each exercise as described in Section 3.1.
It would further simplify the configuration if Stager were integrated into the exercise management system in which the instructor sets up the programming exercise. Then Stager would automatically know the submission deadline, the latest commit of the instructor in the code template, and the remote repository URL. This would make the use of Stager easier and more seamless.

4.4 Limitations

Our experience report only included three exercises that used Stager for code reviews. It would be interesting to analyze the concrete time-savings with a comparison and to use Stager throughout the whole course. While we have first indications, we did not evaluate whether the quality of the reviews improved through the use of Stager.

In addition, Stager's implementation currently has the following limitations: (1) Reviewers have to manually search for each student repository's key the first time they use Stager, before being able to use Stager for the remaining steps. The previously mentioned integration of Stager into an exercise management system would overcome this step. (2) For every exercise, the config.json file has to be changed accordingly with the deadline, URL schema, and commit of the instructor. This could also be adapted to be automatically included when creating exercises by means of an exercise management system.
(3) Reviewers have to install Stager on their computer and start it via a double-click or the command line interface. A web-based solution or a plugin for an IDE (e.g. Eclipse) into which the reviewers import the code would provide a more user-friendly experience.

5 Conclusion

Manual code reviews are important for the learning experience of students. While automatic tests can find typical problems and check whether code works as intended, they cannot find all problems, code smells, and implementation issues. Automatic assessment imposes certain solutions on the students and might limit their creativity. Stager supports code reviewers by automating steps in the manual assessment of programming exercises to reduce the effort for the preparation and conduction of code reviews. Stager downloads multiple students' submissions, renames folders and projects, filters out late submissions, and fixes typical white space problems. All commits of one student are combined into one discrete change set that is easier to review. Code reviewers can better distinguish between the submissions of multiple students and identify students' contributions more quickly.

Our experience in a course with 1600 students and 45 teaching assistants shows that Stager reduced the reviewing effort and time for teaching assistants. The reviewers used the saved time to write better reviews and give more detailed feedback to the students. This improved the students' learning. A quantitative analysis of three programming exercises shows that Stager identifies several late submissions and fixes many white space issues.

Stager is free, open source, and available under the MIT license (https://github.com/arubacao/stager), so that other instructors can use it in their courses. We will continue the development and aim to integrate the tool into the automated assessment system ArTEMiS [Krusche and Seitz, 2018]. Our future work also includes the integration of code quality metrics to support the actual code assessment. This could make it easier for reviewers to spot code quality issues in the students' solutions and could be included, e.g. as a text file, in the feedback pipeline.

In addition, we would like to evaluate the quality of the code reviews when using Stager compared to purely manual reviews with respect to the completeness, helpfulness, and understandability of the review. Depending on the results of this evaluation, we could integrate strategies to semi-automatically propose common code review feedback. Automatic suggestions would further reduce the effort of reviewers while still allowing them to tailor these suggestions to the concrete situation.

References

[Ala-Mutka 2005] Ala-Mutka, Kirsti M.: A Survey of Automated Assessment Approaches for Programming Assignments. In: Computer Science Education 15, pages 83–102, 2005.

[Cerioli and Cinelli 2008] Cerioli, Maura; Cinelli, Pierpaolo: GRASP: Grading and Rating ASsistant Professor. In: Proceedings of the Informatics Education Europe III Conference, 2008.

[Chen 2004] Chen, P. M.: An automated feedback system for computer organization projects. In: IEEE Transactions on Education 47, pages 232–240, 2004.

[Gerdes et al. 2017] Gerdes, Alex; Heeren, Bastiaan; Jeuring, Johan; van Binsbergen, L. T.: Ask-Elle: an Adaptable Programming Tutor for Haskell Giving Automated Feedback. In: International Journal of Artificial Intelligence in Education 27, pages 65–100, 2017.

[Heckman and King 2018] Heckman, Sarah; King, Jason: Developing Software Engineering Skills Using Real Tools for Automated Grading. In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education, pages 794–799, 2018.

[Insa and Silva 2015] Insa, David; Silva, Josep: Semi-Automatic Assessment of Unrestrained Java Code: A Library, a DSL, and a Workbench to Assess Exams and Exercises.
In: Proceedings of the Conference on Innovation and Technology in Computer Science Education, pages 39–44, 2015.

[Jackson 2000] Jackson, David: A semi-automated approach to online assessment. In: SIGCSE Bulletin 32, pages 164–167, 2000.

[Jackson and Usher 1997] Jackson, David; Usher, Michelle: Grading Student Programs Using ASSYST. In: Proceedings of the 28th Technical Symposium on Computer Science Education, pages 335–339, 1997.

[Knobelsdorf and Romeike 2008] Knobelsdorf, Maria; Romeike, Ralf: Creativity As a Pathway to Computer Science. In: Proceedings of the 13th Annual Conference on Innovation and Technology in Computer Science Education, pages 286–290, 2008.

[Krusche et al. 2017a] Krusche, Stephan; Bruegge, Bernd; Camilleri, Irina; Krinkin, Kirill; Seitz, Andreas; Wöbker, Cecil: Chaordic Learning: A Case Study. In: Proceedings of the 39th International Conference on Software Engineering: Software Engineering Education and Training Track, pages 87–96, IEEE, 2017.

[Krusche and Seitz 2018] Krusche, Stephan; Seitz, Andreas: ArTEMiS: An Automatic Assessment Management System for Interactive Learning. In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education, pages 284–289, 2018.

[Krusche et al. 2017b] Krusche, Stephan; Seitz, Andreas; Börstler, Jürgen; Bruegge, Bernd: Interactive Learning: Increasing Student Participation through Shorter Exercise Cycles. In: Proceedings of the 19th Australasian Computing Education Conference, pages 17–26, 2017.

[Krusche et al. 2017c] Krusche, Stephan; von Frankenberg, Nadine; Afifi, Sami: Experiences of a Software Engineering Course based on Interactive Learning. In: Tagungsband des 15. Workshops "Software Engineering im Unterricht der Hochschulen", pages 32–40, 2017.

[Lawrance et al. 2013] Lawrance, Joseph; Jung, Seikyung; Wiseman, Charles: Git on the Cloud in the Classroom. In: Proceedings of the 44th ACM Technical Symposium on Computer Science Education, pages 639–644, 2013.

[McCracken et al. 2001] McCracken, Michael; Almstrum, Vicki; Diaz, Danny; Guzdial, Mark; Hagan, Dianne; Kolikant, Yifat Ben-David; Laxer, Cary; Thomas, Lynda; Utting, Ian; Wilusz, Tadeusz: A Multi-national, Multi-institutional Study of Assessment of Programming Skills of First-year CS Students. In: Working Group Reports on Innovation and Technology in Computer Science Education, pages 125–180, 2001.

[Pieterse 2013] Pieterse, Vreda: Automated Assessment of Programming Assignments. In: Proceedings of the 3rd Computer Science Education Research Conference, pages 45–56, 2013.

[Poženel et al. 2015] Poženel, Marko; Fürst, Luka; Mahnič, Viljan: Introduction of the automated assessment of homework assignments in a university-level programming course. In: 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, pages 761–766, IEEE, 2015.

[Robins et al. 2003] Robins, Anthony; Rountree, Janet; Rountree, Nathan: Learning and teaching programming: A review and discussion. In: Computer Science Education 13, pages 137–172, 2003.

[Staubitz et al. 2015] Staubitz, Thomas; Klement, Hauke; Renz, Jan; Teusner, Ralf; Meinel, Christoph: Towards practical programming exercises and automated assessment in Massive Open Online Courses. In: Teaching, Assessment, and Learning for Engineering, pages 23–30, IEEE, 2015.

[Striewe and Goedicke 2013] Striewe, Michael; Goedicke, Michael: Analyse von Programmieraufgaben durch Softwareproduktmetriken. In: Tagungsband des 13. Workshops "Software Engineering im Unterricht der Hochschulen", pages 59–68, 2013.