                             Teaching Programming at Scale
 Angelika Kaplan, Jan Keim, Yves R. Schneider, Maximilian Walter, Dominik Werle, Anne Koziolek, Ralf Reussner
                                               Karlsruhe Institute of Technology (KIT)
                                                        Karlsruhe, Germany
             {angelika.kaplan, jan.keim, yves.schneider, maximilian.walter, dominik.werle, koziolek, reussner}@kit.edu


Abstract—Teaching programming is a difficult task and there are many different challenges teachers face. Beyond considerations about the right choice of teaching content and presentation in lectures, scaling practical parts of courses and the examination and grading to course sizes of around 1,000 students is particularly challenging. We believe programming is a skill that needs to be trained practically, which creates additional challenges, especially at this scale. In this paper, we outline learning goals for our undergraduate programming course and the structure for the course we derived from these goals. We report on the challenges we see when teaching programming at scale and how we try to overcome them. For example, one central challenge is how to grade a high number of students in a good, transparent, and efficient way. We report on our approach that includes automated tests as well as tool support for manual code review. Over the years, we experienced different issues and learned valuable lessons. We present corresponding key takeaways that we derived from our experiences.

Index Terms—Programming, Object Oriented Programming, Software testing, Teaching, Computer aided analysis

I. INTRODUCTION

Teaching programming is a challenging task. Different programming concepts need to be explained in an appropriate way for students to grasp them, their general application, and their concrete realization in a chosen programming language. In universities and schools, the additional task of grading usually comes up. Grading is essential in our education systems and is equally challenging. At the Karlsruhe Institute of Technology (KIT), we teach object-oriented programming to undergraduates in their first semester using Java, with learning objectives stating that students:
• know the basic structures and details of the Java programming language, foremost control structures, simple data structures, and how to handle objects,
• can implement non-trivial algorithms and can apply basic principles of programming and software engineering, and
• are able to create executable Java programs of medium size that withstand automated quality assurance, including automated tests and enforcement of code conventions.

Our course follows the learning-by-doing principle for teaching programming, which emphasizes the role of programming as a skill that requires practical training. This view is also supported by research on programming education [1]–[3]. This is why we employ practical programming assignments instead of a handwritten exam or similar examination techniques. Therefore, one of our learning objectives is that students should be able to write 500 to 1,000 lines of code based on a complex and precise specification.

Based on our learning objectives, we have three major goals for the assessment of the students' solutions. First, the correctness of the program is important to us. Second, we want the students to program in a good and clean object-oriented manner. Counter-examples for that include god classes (programs with all functionality and logic in one class, cf. [4, p. 136ff]), high coupling between classes, and low cohesion. Third, students need to submit self-made programs for assignments. No code written by another student or person is allowed in the submitted programs, including any kind of code-copying from others or similar kinds of plagiarism.

For smaller course sizes, achieving these goals is challenging and already requires some effort. However, the number of students joining our programming course increased by roughly 45% over the last five years. At present, we have about 1,000 students attending the lectures, and almost all of them participate in the practical exercises. Around 500 students take part in the exam. When scaling up to this number of students, additional challenges arise, such as assessing and grading in an efficient, good, transparent, and fair way. These properties arise from the following factors: We have only limited time and personnel for grading, so we have to grade efficiently. At the same time, we want to grade in a good way, which means that students can learn and improve from their mistakes. The whole grading process should therefore be clear and transparent for students to understand. In addition, the grading should be fair in the sense that submissions of similar quality get similar grades. Moreover, we have to make sure that students submit their own solutions without cheating.

In the following, we present our approach and the efforts to tackle the different challenges and share our experiences.

II. COURSE STRUCTURE

Our programming course consists of two parts: 1) a lecture that teaches the theoretical knowledge for Java development and 2) a set of practical exercises. The practical part consists of five exercise sheets and weekly tutorials, which are held by student teaching assistants. The tutorials take place in smaller groups of about 25 students, teach the practical application of concepts from the lecture, and present the solutions to the exercise sheets. Our student teaching assistants rate the submitted exercise sheets regarding functional properties and coding style. The course participants have to submit their



solutions digitally via the Praktomat system, a submission system for programming assignments. The structure of our programming course is partly based on previous programming courses at KIT, like the structure described in [5], [6]. As previously stated, we do not have group submissions but place a strong emphasis on individual submissions. At the end of the semester, the course is concluded with two tasks that determine the grade for the course. Each task can be solved with about 1,000 lines of code [5]. Students only qualify for the final tasks if they scored over 50% of the points over all exercise sheets. The solutions for the final tasks are again submitted digitally via the Praktomat system, just like the exercise sheets during the semester. However, grading of these tasks is not done by tutors.

A. Communication Infrastructure with Students

During the semester, we provide different information sources for course participants. Besides the lecture and the tutorials, we also have other forms of communication. First, we provide a wiki where we document the rules for grading to provide transparency in this regard. In the wiki, students can also find a beginner's tutorial for Java. Second, we provide forums where students can ask questions about the lecture and the exercises. To ensure fairness and equality, questions regarding the content of the exercises are only answered in the forum. Here, students have the option to write either under a pseudonym or under their real name. The main idea is that students answer each other's questions (cf. Section IV). The teaching staff only answers questions that are not answered by students. In the last semester, the student forum had about 300 threads with about 1,000 individual posts. Additionally, we provide a separate private forum for our student teaching assistants where they can exchange questions and information about their tutorials. Besides the forum, we answered around 1,000 e-mails from students regarding organizational matters.

B. How to Cope with Cheating

As all submissions are done digitally, including the final exams, students might be able to buy or copy solutions from other students. This is a well-known problem for exercises that are submitted digitally [7], [8], and we have experienced this issue before. Therefore, it is necessary to cope with this kind of cheating. We address this using two different methods:

First, every student needs to pass a special exercise. This exercise is organized similarly to a classical written exam. Students come to the exercise, where we check their identity and give them a simple exercise that they need to answer in writing. This exercise tests a minimal set of programming concepts like simple array operations or variable initialization. This still cannot guarantee that no bought solutions are submitted, but it at least guarantees that every student has understood basic programming principles. From our experience, everyone who has understood the basic principles passes this exercise, and the ones who fail lack significant knowledge. We mostly ask questions in the domains of knowledge and comprehension with regard to Bloom's taxonomy [9] that are easy to grade. However, a small part of this special exercise is always targeted to capture whether the student understood the basic principles and can apply them (Application, Analysis, and Synthesis according to Bloom's taxonomy).

Second, to detect solutions shared between multiple participating students, we use the automatic plagiarism checker JPlag [10], [11]. JPlag compares each solution against all solutions in the course using abstract syntax trees. However, before we finally mark a solution as plagiarism, we manually check these cases to filter out possible false positives. False positives often exist in the simpler exercises at the start of the semester, where the number of possible solutions is limited. Despite the fact that we announce the plagiarism checks publicly at the beginning of the semester, we unfortunately detect multiple cases of plagiarism each semester. In the last semester (winter term 2018/2019), almost 10% of the course participants were involved in such cases.
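To illustrate the general idea behind such comparisons in a strongly simplified form (this is not JPlag's actual algorithm, and all names in the sketch are hypothetical), the following Java sketch maps submissions to coarse token streams, so that renamed identifiers and changed literals look alike, and then measures the overlap of the streams:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Strongly simplified illustration of token-based similarity checking;
// real tools like JPlag parse the programs properly and use far more
// robust matching.
public class SimilarityCheck {

    private static final Set<String> KEYWORDS = Set.of("class", "public",
            "private", "if", "else", "for", "while", "return", "new",
            "int", "void");

    // Maps source code to a coarse token stream so that renaming
    // identifiers or changing literals does not change the stream.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String raw : source.split("\\W+")) {
            if (raw.isEmpty()) {
                continue;
            } else if (raw.matches("\\d+")) {
                tokens.add("NUM");   // all number literals look alike
            } else if (KEYWORDS.contains(raw)) {
                tokens.add(raw);     // keywords carry the program structure
            } else {
                tokens.add("ID");    // all identifiers look alike
            }
        }
        return tokens;
    }

    // Jaccard similarity over token 3-grams; 1.0 means identical streams.
    static double similarity(String a, String b) {
        Set<String> gramsA = trigrams(tokenize(a));
        Set<String> gramsB = trigrams(tokenize(b));
        Set<String> union = new HashSet<>(gramsA);
        union.addAll(gramsB);
        gramsA.retainAll(gramsB);    // gramsA now holds the intersection
        return union.isEmpty() ? 0.0 : (double) gramsA.size() / union.size();
    }

    private static Set<String> trigrams(List<String> tokens) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + 3)));
        }
        return grams;
    }
}

A high similarity score for a pair of submissions is then only a signal for manual inspection, mirroring our policy of filtering out false positives by hand.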
III. GRADING STUDENTS

We differentiate between functional requirements and coding style requirements for the grading of programming tasks. Usually, the ratio of the points for functionality to style is 2 to 3. The basis of the functional evaluation is the degree to which a program corresponds to the functionality specified in the task description. This functional evaluation is carried out almost entirely by automated checking of test cases. The basis of the style evaluation is the degree to which a program meets the principles of object-oriented design taught in the lecture. This style evaluation is almost completely carried out by manual code reviews.

The following section explains our grading process for exercise sheets and final assignments. In general, this process is identical for the exercise sheets and the final tasks.
1) After a task is created, the correction scheme for evaluating the coding style is created and test cases for testing functionality are developed.
2) After the submission deadline, further automated functional and style tests can be performed on the submissions.
3) Once these automated tests have been completed, the manual correction is started. The source text files can be edited by the corrector, for example to comment on certain source code lines or to make suggestions for improvements. In addition, the corrector can add general comments to the submission.
4) The correction is published to the students after completion via the Praktomat. The students can take a look at all test cases with their evaluation, all comments regarding the grading, and all changes to the source code.

A. Automated Functional Tests

In almost all cases, functionality is checked automatically through program output. All final tasks and most tutorial tasks include a command line interaction. Student solutions are compiled upon each submission and automatically run against a set of previously defined functional test cases. This automation



allows a much wider range of testing than would be possible manually in realistic time. For the two final tasks, we had, on average, 50 different test cases per task.

Test cases are usually divided into two groups: public and private test cases. The public test cases are visible to students during submission and must pass for a submission to be valid. These test cases give the students feedback on whether they have correctly implemented the most important (basic) functionalities before their final submission. Private test cases, on the other hand, are only visible to students after the correction has been completed. These test cases automatically check additional functionality like edge cases and more elaborate behavior and thus form the basis for grading.

The functional test cases describe the expected command line output for a given input. After the Praktomat compiles the submission, it executes the submission and compares the solution's output to the expected output. The definition of individual test cases is done via a simple text file in which an input and the corresponding output are specified line by line.
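As an illustration of how such a comparison can work, the following Java sketch is a minimal, hypothetical runner (not the actual Praktomat implementation; class and method names are our own invention). It starts the compiled solution in a separate process, feeds it the input lines read from such a test definition file, and compares the standard output line by line:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.List;

// Hypothetical sketch of an output-comparing functional test.
public class OutputTest {

    // Runs the compiled solution, feeds it the given input lines, and
    // checks that its standard output matches the expected lines.
    // (For simplicity, the input is written completely before the
    // output is read, which only works for small interactions.)
    static boolean passes(String mainClass, List<String> inputLines,
            List<String> expectedLines) throws IOException, InterruptedException {
        Process solution = new ProcessBuilder("java", mainClass)
                .redirectErrorStream(true)   // merge stderr into stdout
                .start();

        // Feed the test input to the solution's standard input.
        try (PrintWriter stdin = new PrintWriter(
                new OutputStreamWriter(solution.getOutputStream()))) {
            inputLines.forEach(stdin::println);
        }

        // Compare the solution's output to the expected output line by line.
        try (BufferedReader stdout = new BufferedReader(
                new InputStreamReader(solution.getInputStream()))) {
            for (String expected : expectedLines) {
                if (!expected.equals(stdout.readLine())) {
                    return false;
                }
            }
        }
        return solution.waitFor() == 0;
    }
}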
B. Grading Coding Style

While we check some coding style properties automatically with Checkstyle [12], such as indentation or mandatory comments, other properties are reviewed manually. Before grading, we create grading guidelines that contain information about the required style and how to apply the guidelines. Because of the high number of solutions and the scope of the exercises, the grading cannot be done by a single person but requires a group. The grading normally takes around 670 person-hours.
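For illustration, a minimal Checkstyle configuration covering exactly these two mentioned aspects could look like the following sketch; our actual rule set is more extensive:

<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    "https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name="Checker">
  <module name="TreeWalker">
    <!-- Enforce consistent indentation. -->
    <module name="Indentation"/>
    <!-- Require Javadoc comments on types and methods. -->
    <module name="JavadocType"/>
    <module name="JavadocMethod"/>
  </module>
</module>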
The UI for grading in the Praktomat is illustrated in Figure 1. On the right side, the correctors can see the solution of the student as well as the results of the automated tests. They can access the detailed results by clicking on each test. To attest the solution, they can switch to the Attest solution tab.

This process originally caused some trouble: it was cumbersome because correctors had to remember the line for each deduction (the reduction of points) and all grading guidelines. Additionally, the quality and especially the traceability of deductions varied between different correctors. Moreover, it was sometimes hard to understand the reason for a deduction because the reasoning was missing or incomprehensible. Therefore, we developed the Praktomat Enhancement Tool Suite (PETS). This tool is a web overlay written in JavaScript for the existing Praktomat interface. The left frame of Figure 1 shows its parts containing additional information and features. First, it shows the automatically calculated points (final grade) of the student, here 20 points. Afterwards, it shows a list of the structure of the current solution, where the red icon marks the class with the main method of the solution. Below the structure, it shows the automatically calculated functionality points, here 13 points. Then, it shows for each style category (OO-Modeling, Comprehensibility, Style) the current points and a list of buttons. Each button represents a typical defect in students' solutions. For instance, the empty JavaDoc button is used in case an empty JavaDoc exists or no JavaDoc exists at all. In our experience, this is a widespread defect. In case of this defect, a corrector would select the line in the editor and then click the button. This automatically deducts a fixed number of points and produces an explanatory text for the student. The text contains the deduction, the reason for the deduction, and the line of the deduction. In case none of the predefined deductions is applicable, a custom defect button exists. With the custom button, a corrector can type an individual comment and choose an individual deduction. It is also used for further explanation when the general description of a button is not enough. The deduction for each defect is individually configurable, and correctors can reuse individually created comments within the scope of one solution, similarly to the buttons. For the premade buttons, a minimum number of occurrences can be configured before points are deducted, e.g., 5 occurrences of bad identifiers. Buttons can also deduct points directly for the very first occurrence, such as visibility, which means the wrong visibility modifier was used for a class. After the first deduction for a certain type of defect, further occurrences do not produce additional deductions. Moreover, in each grading category there can be no negative points, i.e., the minimum is 0 points. If all possible points are already deducted, further mistakes are only listed but do not change the score. Only when markings are deleted, e.g., when correcting the grading, previously disregarded deductions are checked and applied automatically.
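These deduction rules can be captured concisely. The following Java sketch uses hypothetical names and is not the actual PETS implementation (PETS itself is written in JavaScript; we use Java here for consistency with the course). It recomputes a category score from the current set of markings, which also covers re-applying previously disregarded deductions once markings are deleted:

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the deduction rules for one style category.
public class StyleCategory {

    // A premade button: fixed deduction, applied at most once, and only
    // from the configured number of occurrences onwards.
    record DefectType(String name, int deduction, int occurrenceThreshold) { }

    private final int maxPoints;
    private final Map<DefectType, Integer> occurrences = new HashMap<>();

    StyleCategory(int maxPoints) {
        this.maxPoints = maxPoints;
    }

    // Called whenever a corrector marks a defect in the solution.
    void mark(DefectType defect) {
        occurrences.merge(defect, 1, Integer::sum);
    }

    // Called when a marking is deleted, e.g., when correcting the grading.
    void unmark(DefectType defect) {
        occurrences.computeIfPresent(defect, (d, n) -> n > 1 ? n - 1 : null);
    }

    // The score is recomputed from all current markings, so deductions
    // that were disregarded earlier apply automatically once markings
    // change, and the category score never drops below zero.
    int points() {
        int deducted = 0;
        for (Map.Entry<DefectType, Integer> entry : occurrences.entrySet()) {
            if (entry.getValue() >= entry.getKey().occurrenceThreshold()) {
                deducted += entry.getKey().deduction();  // at most once per type
            }
        }
        return Math.max(0, maxPoints - deducted);
    }
}

With this recomputation, a bad-identifiers type configured with threshold 5 deducts exactly once from the fifth marked occurrence on, while a visibility type with threshold 1 deducts immediately.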
IV. KEY TAKEAWAYS

In this section, we review our practice and experience in our programming course with regard to (a) teaching concepts and strategies in lecture and practical training, (b) grading, and (c) communication.

As stated before, one of our ongoing and overarching goals is to optimize the lecture in terms of teaching quality and effort reduction through (semi-)automation for a big course size in the aforementioned categories (a)-(c). The key takeaways in each category are statements from our point of view. Statements marked with (*) are confirmed by students via, e.g., the course evaluations we conduct each semester.

a) Teaching Concepts and Strategies: As we believe in the learning-by-doing principle, we use teaching strategies aligned to that principle. To organize the course structure regarding teaching content, we provide a semester schedule that lists the lecture units and an advance organizer (cf. [13]). Each lecture unit is outlined with the corresponding learning objectives, formulated according to Bloom's taxonomy [9]. We also use student-activating teaching methods during the lecture to support efficient teaching and learning. For this, we have experimented with online response tools but found replies in class by hand signs more interactive and less distracting for the students. Tutorials, which take place every week in small groups, on the one hand support repeating the lecture content and on the other hand prepare students for the practical tasks. To achieve a more effective learning experience, the learning objectives should also be made clear in the practical task sheets.



                                 Figure 1: Attestation view with the PETS extension on the left side


Key Takeaways. Formulating learning objectives and the curricula brings many benefits: It limits and clearly determines the teaching content and thus also improves planning. Additionally, it creates a shared understanding about the expectations between lecturer and students and serves as a criterion for external and self-assessment (*). Efficient teaching and learning during the lecture can be achieved through student-activating teaching methods (*). Advance organizers are suitable for novice as well as more experienced programming students (*). Practical tasks encourage the learning-by-doing principle. In particular, implementing games is popular among students (*).

Open Issues. One of the open issues is the selection of the main teaching paradigm to use, i.e., objects first or algorithms first. From the perspective of a novice programming student, we cannot decide which paradigm is more suitable. For the upcoming winter term, we plan to establish the objects-first paradigm. At the beginning of the semester, we plan to use a visualization of objects and their behaviors by using an animation environment for demonstration purposes (cf. [14]).

b) Grading: During the semester, we have different phases of grading, but in this part we focus on the grading of the final exam. At the end of the programming lecture, students should be able to write 500 to 1,000 lines of code based on a given specification. For grading, we differentiate between functional requirements and coding style requirements. Coding style is a solid part of the programming lecture and is also compactly described in our Ilias wiki (cf. Communication below). While grading with regard to functional requirements is done automatically, we introduced PETS as an enhancement tool for grading coding style. PETS allows us to build a coding style catalog by defining categories for grading purposes with template comments based on an informal coding style description. Additionally, graders have the opportunity to write so-called custom comments for exceptional cases when a coding style convention is not yet covered.

Key Takeaways. A transparent and systematic grading scheme helps in reducing (negative) feedback from students as well as overhead concerning inspections. (Semi-)automatic grading techniques not only reduce effort (e.g., by avoiding repetition of the same kind of feedback) but also have the potential to ensure fair grading in mass programming lectures.

Overall Conclusion. The alignment between teaching and examination is an ongoing optimization process. Besides effort reduction with (semi-)automatic grading, teaching quality needs continuous improvement by adopting new findings in terms of didactic concepts.



c) Communication: Given the size of the novice programming course, organizing the communication of organizational information, teaching content, and teaching material is a major task. We use Ilias [15], an open-source learning management system, for this. Our Ilias course for the programming lecture is organized as follows: a wiki for defining a programming style catalog, a download area to provide materials for the lecture, the tutorials, and the practical and final exams, and forums for important announcements and as a platform for questions from students. Concerning the discussion forums (one for the lecture, one for the practical tasks), we provide a set of strict rules and guidelines for interaction. For example, we use naming conventions for the titles of threads for every post concerning practical tasks. In our experience, this reduces the number of redundant questions and answers.

Key Takeaways. Learning management systems (LMS) as a central platform for all participants (i.e., teachers, including student teaching assistants, and students) are an effective way to keep the communication process manageable and transparent. Using an open-source LMS like Ilias also helps in managing students and their learning activities as well as in organizing a virtual learning environment (*). The e-learning course should be designed in a way that all requirements are met in the best possible way for teachers and students. In our case, regarding the novice programming course with many users, we consider transparent communication and knowledge provision a key element. Therefore, we use Ilias features like file exchange, internal mail for important announcements to users, a discussion and announcement forum, and wiki functions.

Open Issues. We highly welcome the participation of students in our discussion forums on Ilias by answering questions from other students. We believe this also increases collaborative learning by building virtual learning groups. However, it is still an open issue how to support knowledge exchange among students and to encourage them even further to participate in answering discussion forum posts. In the future, we will focus on this part in more detail, and we plan to experiment with gamification mechanisms.

V. CONCLUSION

Teaching programming in lectures with around 1,000 attendees brings up some tough challenges, especially for grading. In this paper, we showed different challenges that we saw for our programming course. We then reported how we organize the course and perform grading at a scale of around 1,000 students. Automated and semi-automated techniques enable fair grading in mass programming lectures in terms of objectivity and consistency. Derived from our experiences, we listed key takeaways. Next steps for us are primarily about increasing our efficiency by further automating the grading process, especially regarding coding style. Here, we envision an extension that automatically calculates part of the grading based on the metrics provided by tools such as SonarQube [16] or Checkstyle. To ensure correct grading, the grading is verified by a human corrector who decides on the final grade. We expect that this process not only reduces the overall human workload but also increases transparency, as these metrics can be stated clearly and are well-defined and explained. Besides increasing efficiency, an ongoing process is the improvement of the lecture and its paradigms, e.g., the learning goals and teaching objects first.

REFERENCES

[1] M. Piteira and C. Costa, "Learning computer programming: Study of difficulties in learning programming", in Proceedings ISDOC '13, New York, NY, USA: ACM, 2013, pp. 75–80.
[2] E. Lahtinen, K. Ala-Mutka, and H.-M. Järvinen, "A study of the difficulties of novice programmers", in Proceedings ITiCSE '05, Caparica, Portugal: ACM, 2005, pp. 14–18.
[3] K. Ala-Mutka and H.-M. Järvinen, "Assessment process for programming assignments", in IEEE International Conference on Advanced Learning Technologies, 2004, pp. 181–185.
[4] R. C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship. Pearson Education, 2009.
[5] J. Breitner, M. Hecker, and G. Snelting, "Der Grader Praktomat", Automatisierte Bewertung in der Programmierausbildung, 2017.
[6] J. Krinke, M. Störzer, and A. Zeller, "Web-basierte Programmierpraktika mit Praktomat", Softwaretechnik-Trends, vol. 22, no. 3, pp. 51–53, 2002.
[7] J. Sheard, M. Dick, S. Markham, I. Macdonald, and M. Walsh, "Cheating and plagiarism: Perceptions and practices of first year IT students", in Proceedings ITiCSE '02, Aarhus, Denmark: ACM, 2002, pp. 183–187.
[8] S. Cerimagic and M. R. Hasan, "Online exam vigilantes at Australian universities: Student academic fraudulence and the role of universities to counteract", Universal Journal of Educational Research, pp. 929–936, 2019.
[9] B. S. Bloom, "Taxonomy of educational objectives: The classification of educational goals", Cognitive Domain, 1956.
[10] L. Prechelt, G. Malpohl, and M. Philippsen, "Finding plagiarisms among a set of programs with JPlag", J. UCS, vol. 8, no. 11, p. 1016, 2002.
[11] JPlag 2.12.1, Nov. 4, 2019. [Online]. Available: https://github.com/jplag/jplag (visited on 11/04/2019).
[12] Checkstyle 8.18. [Online]. Available: https://checkstyle.sourceforge.io/ (visited on 11/04/2019).
[13] D. P. Ausubel, "In defense of advance organizers: A reply to the critics", Review of Educational Research, vol. 48, no. 2, pp. 251–257, 1978.
[14] D. Boles, Programmieren spielend gelernt mit dem Java-Hamster-Modell. Springer, 1999, vol. 2.
[15] Ilias: The open source learning management system. [Online]. Available: https://www.ilias.de/ (visited on 11/04/2019).
[16] SonarQube. [Online]. Available: https://www.sonarqube.org/ (visited on 11/04/2019).


