Teaching Clean Code

Linus W. Dietz
Department of Informatics, Technical University of Munich, Germany
linus.dietz@tum.de

Johannes Manner and Simon Harrer
Distributed Systems Group, University of Bamberg, Germany
firstname.lastname@uni-bamberg.de

Jörg Lenhard
Department of Mathematics and Computer Science, Karlstad University, Sweden
joerg.lenhard@kau.se

ISEE 2018: 1st Workshop on Innovative Software Engineering Education @ SE18, Ulm, Germany

Abstract: Learning programming is hard – teaching it well is even more challenging. At university, the focus is often on functional correctness and neglects the topic of clean and maintainable code, despite the dire need for developers with this skill set within the software industry. We present a feedback-driven teaching concept for college students in their second to third year that we have applied and refined successfully over a period of more than six years and for which we received the faculty's teaching award. Evaluating the learning process within a semester of student submissions (n=18) with static code analysis tools shows satisfying progress. Identifying the correction of the in-semester programming assignments as the bottleneck for scaling the number of students in the course, we propose using a knowledge base of code examples to decrease the time to feedback and increase feedback quality. From our experience in assessing student code, we have compiled such a knowledge base with the typical issues of Java learners' code in the format of before/after comparisons. By simply referencing the problem to the student, the quality of feedback can be improved, since such comparisons let the student understand the problem and the rationale behind the solution. Further speed-up is achieved by using a curated list of static code analysis checks to help the corrector in identifying violations in the code swiftly. We see this work as a foundational step towards online courses with hundreds of students learning how to write clean code.

I. INTRODUCTION

Many computer science graduates lack programming proficiency when starting their first software engineering job. Programming at university is traditionally taught in lectures followed by practical exercises that usually revolve around theoretical concepts like algorithms, data structures or specialized aspects of programming paradigms (e.g., object-orientation or functional programming). Such courses challenge the students to develop software to prove that they can apply the theoretical concepts. These prototypes, however, are usually discarded after the end of the course. Consequently, students have few incentives to educate themselves in writing code that not only works, but is also of high quality.

Given the high demand for computer scientists with college degrees in industry, many students choose this challenging course of studies. From 2009 onwards, the faculty for Information Systems and Applied Computer Sciences at the University of Bamberg grew from 340 students to 1200 in 2017. It is a huge challenge for lecturers to keep up the quality of teaching, since they inevitably will have less time for providing individual feedback to the students. This is problematic, as teaching programming well is time-consuming [1]. Therefore, e-learning, especially in the form of Massive Open Online Courses (MOOCs) [2], is an emerging teaching method. However, the automated assessment of programming assignments is challenging [3] and most often only focuses on functional correctness instead of code quality. A study among 227 IT professionals finds that the most important skill is the "ability to read, understand and modify programs written by others" [4]. Since low code quality has a direct effect on maintainability, we argue that clean code should be an integral part of the programming education. Consequently, the motivating questions we seek to answer in this paper are:

• How to teach clean code at university?
• How to teach clean code at scale?

We present a didactic concept that describes how to effectively teach programming to undergraduates with an emphasis on writing and reviewing code. Identifying the correction of assignments (i.e., providing high-quality feedback) as the bottleneck to scale the participants in the course, we use a knowledge base [5] and propose automated static code analysis to speed up assessment while preserving the quality of feedback. The interplay between these two components is promising, as it enables teachers with little prior experience to give students the necessary information to improve their coding style.

The driving idea behind this paper is to assess what academia can learn from industry to achieve high code quality in teaching. Therefore, we briefly discuss the concept of code quality and how it is achieved in industry in the next section. On this basis, we present and evaluate our didactic concept based on writing and reviewing code in Section III. In Section IV, we present the knowledge base and a tool for automated static code analysis for code quality in an educational context. We conclude our findings and point out future work in Section V.

II. FOUNDATIONS

From a pragmatic point of view, code quality is strongly linked to understandability: How easy is it for other developers to understand a piece of code and how well can it be extended and reused in other contexts? The concept is referred to in several books, most prominently in Clean Code [6], Code Complete [7], Effective Java [8], The Pragmatic Programmer [9] and Refactoring [10]. From a business perspective it can also be seen as a function of the maintenance costs, which typically amount to 40–80% of the total project costs [11].

To lower such costs, the software industry has introduced many ways to improve the coding process, most of them fitting under the hyped term 'agile' [12]. For instance, Li et al. report an increased software code quality of a team using Scrum in a longitudinal study [13]. Code reviews and feedback play a vital role in agile methods. In pair programming [14] the review is done simultaneously with a partner. Mob programming [15] even extends this to a group of people. Besides pairing, companies also use 'pull requests' with continuous integration (CI), where the proposed patch for the upstream is automatically tested and needs to be signed off by another developer. In addition, companies often use static code analysis in their CI pipeline to ensure that the code adheres to a predefined style.

A number of static code analysis tools are freely available, e.g., Checkstyle¹, PMD² or SpotBugs³. They operate on source or bytecode level and automatically detect common code smells and careless mistakes that are not easy to spot. To use them efficiently, they need to be fine-tuned by an expert. More user-friendly alternatives are online code quality services like SonarQube⁴ or Codacy⁵. They build upon the mentioned static code analysis tools; however, they require in-depth integration into the build pipeline.

¹ Checkstyle, http://checkstyle.sourceforge.net
² PMD, https://pmd.github.io
³ SpotBugs on GitHub, https://github.com/spotbugs/spotbugs
⁴ https://www.sonarqube.org/
⁵ https://www.codacy.com

III. TEACHING CLEAN CODE AT UNIVERSITY

The availability of a mentor who supports the learner is a huge benefit. We argue that high-quality code in industry is created when several people work together and review each other's code, either simultaneously in pair programming sessions or when assessing each other's code in pull requests. From this insight, the following didactic concept has a strong emphasis on reviewing code in various ways.

A. Didactic Concept

[Figure not reproduced. Labels recovered from the diagram: Lecture, Concept, Programming Exercise, Interactive Code Review; Assignment, Programming Assignment, Lecturer Code Review; Oral Exam, Theoretical Concept, Walk-Through Code Review.]

Fig. 1. Didactic Concept Based on Code Reviews

A sketch of the didactic concept is depicted in Fig. 1 and it covers three parts: lecture, assignments, and the examination. Note that at the end of each part, a different type of code review is done. In the following, we detail each part.

1) Lecture: Each course session is an alternation between lecture time and practice. After having received a 10–30 minute introduction to a programming concept, the students are asked to solve a 20–40 minute exercise, in which they are to apply this concept. After finishing the task, at least one solution is presented to the plenum of students, who review and improve upon the solution with help and input from the lecturers. These discussions on improving the code step-by-step are central to the learning outcome, as this is where the students observe the refactoring process, learn about the requirements in the programming assignments, and experience the transformation of a first working solution to clean code.

2) Assignment: During the semester, students work on multiple assignments in groups. The assignments require the application of the previously introduced and practiced programming concepts from the lectures, but put them in a larger scope to address more realistic and complex problems (e.g., programming a reference manager or an issue tracker). The assignments have to be submitted within a given timeframe using Git and are graded by the lecturers. The students receive a detailed textual code review of their solution with a focus on code quality and refactoring opportunities, since they usually get most functional aspects right. Besides the individual feedback, the lecturers also publish a collection of common issues found in the assignments to the course. These common issues serve as a knowledge base for the course and have resulted in a textbook, Java by Comparison [5].

3) Examination: Finally, during examination, both the theoretical concepts of the course itself and clean code skills are evaluated. The students are examined individually in an oral examination. During this examination, a few initial theoretical questions are used to check if the students understood the general concepts. Then, the students are asked to explain how they solved specific aspects in their own assignments' code. This is to evaluate whether they can make the transfer from the theoretical concepts to practical knowledge and also to assess whether they have contributed sufficiently to the group's submissions. Finally, the students are asked to review a small snippet of unknown code regarding bugs and code smells.

B. Application

Since 2011, this concept has been used in two practical programming courses with 3 ECTS for undergraduate computer science students at the University of Bamberg: 'Advanced Java Programming' (AJP) covering XML serialization, testing, and MVC-based GUIs, and 'Introduction to Parallel and Distributed Programming' covering systems communicating through shared memory and message passing on the JVM.

These courses are taught in bi-weekly, four-hour lab sessions. The students need to hand in four two-week assignments solved by groups of three, and pass an individual 15-minute oral examination. In addition, we also provide an optional student help desk and forum support for any course-related questions. Students have rated our courses on average 1.5 (on a Likert scale from 1, very good, to 5, very bad) within the last six years in the university's standardized evaluation form. Furthermore, the lecturers were nominated six times for the faculty's award for excellent teaching and won it once.

C. Assignment Evaluation

In the following, we present our findings from analyzing student code submissions using static code analysis. First, we provide an overview of the most frequent violations and then we analyze the relative change of code quality violations. The analysis is based on a curated list of static code analysis checks described in Section IV-B.

Table I lists the ten most frequent code style violations. The magic number check is an outlier, as students did not encapsulate all numbers into final static fields in the beginning, and it was violated frequently in the last two assignments (testing and GUI programming). The others are typical problems of unprofessional code: bad naming, unnecessary elements, and bad habits such as re-assigning parameters or not preserving the stack trace when re-throwing exceptions. Some, like the usage of the ternary '?' operator, might be opinionated, but we argue that it is nevertheless a good didactic exercise to think about the use of such constructs.

TABLE I
MOST FREQUENT VIOLATIONS

Name of Violation                   Frequency
MagicNumberCheck                         1382
AbbreviationAsWordInNameCheck             452
ParameterAssignmentCheck                  150
PreserveStackTrace                        144
UnnecessaryConstructor                    143
UselessParentheses                        124
VisibilityModifierCheck                   120
ConfusingTernary                          107
HideUtilityClassConstructorCheck           96
SingularField                              93

To obtain an impression of the overall learning process between the beginning and the end of the course, we compare the number of violations between the first and the last assignment. Since the assignments differ in size, we normalize the number of violations to make them comparable. There are several metrics for normalizing code sizes, e.g., the lines of code, the number of methods/classes, etc. We use the number of non-final method parameters per assignment, since lines of code would give the fourth assignment with GUI programming too much weight. The non-final method parameters are better than just counting the number of methods, as this metric also accounts for how complex a method is. Finally, the metric was already computed using the MethodArgumentCouldBeFinal rule and we knew from the manual correction that our students did not mark method parameters as final.

Table II is subdivided into five categories that describe the amount of change that happened to the violations. The first part lists violations that disappeared entirely from the first to the last assignment, followed by violations in which numbers decreased strongly. Then, the table lists violations in which numbers remained stable, followed by increasing and new violations. Due to space constraints, we just show five violations per category. Overall, the result is satisfying: Out of a total of 107 rules, 18 were not violated at all. The number of violations of three rules was stable, and 44 decreased, of which 21 were not violated anymore. Of the 42 that increased, 32 did not occur in the first assignment, so they can be seen as rather specialized.

TABLE II
CHANGE OF VIOLATIONS

Category    Name of Violation                        Normalized Frequency   Change in %
Disappear   CyclomaticComplexityCheck                 43 → 0                -100
            UnnecessaryFinalModifier                  37 → 0                -100
            ModifierOrderCheck                        33 → 0                -100
            EqualsAvoidNullCheck                      23 → 0                -100
            CollapsibleIfStatements                   13 → 0                -100
Decrease    AvoidFieldNameMatchingMethodName         128 → 3                 -98
            NeedBracesCheck                           67 → 2                 -97
            UselessParentheses                       226 → 9                 -96
            AvoidInstantiatingObjectsInLoops          37 → 2                 -95
            PrematureDeclaration                      16 → 1                 -94
Stable      UnnecessaryConstructor                    67 → 63                 -6
            LogicInversion                             3 → 3                   0
            MultipleVariableDeclarationsCheck          6 → 6                   0
            AvoidCatchingNPE                          10 → 10                  0
            MagicNumberCheck                         580 → 633                +9
Increase    SingularField                             27 → 82               +203
            HiddenFieldCheck                          13 → 50               +284
            AbbreviationAsWordInNameCheck             43 → 200              +365
            VariableDeclarationUsageDistanceCheck     13 → 72               +453
            VisibilityModifierCheck                    6 → 54               +800
New         InnerAssignmentCheck                       0 → 8                  +∞
            ClassFanOutComplexityCheck                 0 → 10                 +∞
            SignatureDeclareThrowsException            0 → 17                 +∞
            UncommentedEmptyMethodBody                 0 → 19                 +∞
            CompareObjectsWithEquals                   0 → 19                 +∞

In summary, there is an improvement in the structure of the code in the last assignment: problems regarding the cyclomatic complexity diminish, and only a few variables are prematurely declared. Braces are almost always placed, and most unnecessary parentheses are removed. Furthermore, bad habits, like checking references with equals() and instantiating objects in loops, plummeted.

On the increasing side, we note that there are still issues with naming and declaring variables. The number of fields only used in one method tripled, the number of local variables shadowing a field went up by 284%, and variables were often defined too far from their usage. We partially attribute these issues to the topic of the last assignment, GUI programming, where UI classes can be cluttered due to the UI framework.

D. Validity Concerns

These findings are meant to be understood as a tendency, rather than strong empirical evidence. The reason for this is that the data is from one semester with only 18 groups. Also, we cannot attribute the effects to the teaching concept only, since this is only a case study without a control group of submissions from students who have not participated in the course. Nevertheless, the groups did not know that their assignments would be analyzed with static code analysis, so they did not program to conform with a standard.

IV. TEACHING CLEAN CODE AT SCALE

Giving valuable feedback in programming is time-consuming. Marking about two dozen assignments that students without prior experience can solve within two weeks usually requires about a week of full-time work. We present two approaches to reduce this time and improve the review quality.

A. Knowledge Base

During the assessment of assignments, we found that students make similar mistakes that result in the same feedback. With a knowledge base of issues, one can simply provide links to the issues, thereby saving time and relieving the corrector from repetitive actions. It also leads to shorter code reviews. We propose the following structure for issues in the knowledge base:

Name: A concise name capturing the solution to a code quality problem as an action, for example "Avoid Negations".
Code: Two code snippets within the same context, one containing the highlighted problem and the other one the solution. For example, a snippet containing a negation like !done and a snippet with the refactored solution like inProgress.
Text: Two detailed explanations that make an argument, one explaining the problem and one supporting the solution, e.g., negations are harder to understand and the refactored solution reads much easier.

At first, we created a knowledge base for each assignment in the form of a markdown document named "common issues". We noticed a time reduction in marking itself, more consistent and comparable markings, and, therefore, a reduction in the number of questions regarding the markings and comments. Furthermore, the knowledge base created a common terminology for the discussions in the courses. We have turned these "common issues" per assignment into a larger knowledge base in the form of a book [5], covering the most common "common issues" with high-quality code and text.

B. Automated Didactic Code Review

With a universal knowledge base in place, the correction is essentially reduced to spotting the problematic parts in the code and referring the students to the corresponding items. In practice, this is not trivial, since the corrector must determine the functional correctness before assessing code quality. For this we recommend using a large set of integration tests.

For automating the code quality review, we have developed a static code analysis meta tool⁶ that can be integrated into existing build setups using Gradle. Currently, it scans the code for 107 code quality violations using PMD and Checkstyle and produces a .csv output that contains the violated rule, the identifier of the submission, and the location of the violation (file, line, column). As mentioned before, static code analysis tools need to be fine-tuned to be valuable. We picked these 107 rules based on our knowledge from the course, making them suitable for most learners, without enforcing a specific style of programming or producing too many false positives. Furthermore, the lecturer can adjust them to perfectly fit the needs of the students. Naturally, the rules cannot cover all issues in the code, since some problems are impossible to detect automatically. Nevertheless, the detection of many issues can be improved without much effort on the side of the corrector.

⁶ https://github.com/LinusDietz/QualityReview

V. CONCLUSIONS

This paper is motivated by the question of how universities can learn from software companies to improve their programming education. We presented and evaluated a code review-driven course concept for undergraduates that has been taught at the University of Bamberg and has received a teaching award there. Facing the challenges of scaling the concept to an increasing number of students while keeping up the quality of code reviews, we proposed using static code analysis combined with a book on code quality that is suited for the target group of the learners.

In the future, we aim to empirically evaluate the didactic concept along with the knowledge base and the tool at multiple universities through experiments. Furthermore, we plan to integrate the knowledge base and tool into existing automatic educational code assessment frameworks like ArTEMiS [16].

ACKNOWLEDGMENTS

The authors would like to thank Guido Wirtz for giving us the freedom to develop these courses. Moreover, we thank our student assistants Christian Preißinger, Gabriel Nikol, Michael Träger, Henrik Cech, and Tobias Jakubowitz, who helped us lecture those courses.

REFERENCES

[1] A. Vihavainen, M. Paksula, and M. Luukkainen, "Extreme apprenticeship method in teaching programming for beginners," in Proceedings of the 42nd ACM Technical Symposium on Computer Science Education. New York, NY, USA: ACM, 2011, pp. 93–98.
[2] F. G. Martin, "Will massive open online courses change how we teach?" Communications of the ACM, vol. 55, no. 8, pp. 26–28, Aug. 2012.
[3] T. Staubitz, H. Klement, J. Renz, R. Teusner, and C. Meinel, "Towards practical programming exercises and automated assessment in massive open online courses," in IEEE TALE, 2015, pp. 23–30.
[4] J. Bailey and R. B. Mitchell, "Industry perceptions of the competencies needed by computer programmers: Technical, business, and soft skills," Computer Information Systems, vol. 47, no. 2, pp. 28–33, Jan. 2006.
[5] S. Harrer, J. Lenhard, and L. Dietz, Java by Comparison: Become a Java Craftsman in 70 Examples. Pragmatic Bookshelf, 2018.
[6] R. C. Martin, Clean Code. Prentice Hall, 2009.
[7] S. McConnell, Code Complete. Microsoft Press, 2004.
[8] J. Bloch, Effective Java, 3rd ed. Addison Wesley, Nov. 2017.
[9] A. Hunt and D. Thomas, The Pragmatic Programmer: From Journeyman to Master. Boston, MA, USA: Addison Wesley, 1999.
[10] M. Fowler, Refactoring. Addison Wesley, 1999.
[11] R. Glass, "Frequently forgotten fundamental facts about software engineering," IEEE Software, vol. 18, no. 3, pp. 112–111, May 2001.
[12] P. Abrahamsson, O. Salo, J. Ronkainen, and J. Warsta, "Agile software development methods: Review and analysis," CoRR, vol. abs/1709.08439, 2017.
[13] J. Li, N. B. Moe, and T. Dybå, "Transition from a plan-driven process to scrum: A longitudinal case study on software quality," in Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. ACM Press, 2010, pp. 1–10.
[14] J. T. Nosek, "The case for collaborative programming," Communications of the ACM, vol. 41, no. 3, pp. 105–108, Mar. 1998.
[15] A. Wilson, "Mob programming – what works, what doesn't," in Agile Processes in Software Engineering and Extreme Programming, C. Lassenius, T. Dingsøyr, and M. Paasivaara, Eds. Cham: Springer, 2015, pp. 319–325.
[16] S. Krusche and A. Seitz, "ArTEMiS – an automatic assessment management system for interactive learning," in SIGCSE. ACM, 2018.
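
To make the analysis of Section III-C and the .csv report of Section IV-B more concrete, the following is a minimal Java sketch of how per-rule counts could be derived from such a report, normalized, and compared between two assignments. It is not the tool referenced in the paper. The class name ViolationTrend, the column order rule,submission,file,line,column, the absence of a header row, the sample rows, and the normalization as a plain ratio times a scale factor are all assumptions made for illustration; the paper states only that counts were normalized by the number of non-final method parameters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; hypothetical names, not the tool from the paper. */
final class ViolationTrend {

    /**
     * Counts violations per rule.
     * ASSUMPTION: each report line is "rule,submission,file,line,column",
     * without a header row and without quoted commas.
     */
    static Map<String, Integer> countByRule(List<String> reportLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String reportLine : reportLines) {
            String rule = reportLine.split(",")[0].trim();
            counts.merge(rule, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Normalizes a raw count by the number of non-final method parameters.
     * ASSUMPTION: a plain ratio times a scale factor; the exact formula
     * used for Table II is not given in the text.
     */
    static double normalize(int violations, int nonFinalParameters, double scale) {
        if (nonFinalParameters <= 0) {
            throw new IllegalArgumentException("nonFinalParameters must be positive");
        }
        return violations * scale / nonFinalParameters;
    }

    /** Relative change in percent; infinite for rules violated only in the last assignment. */
    static double changeInPercent(double first, double last) {
        if (first == 0.0) {
            return last == 0.0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return (last - first) / first * 100.0;
    }

    public static void main(String[] args) {
        // Made-up report rows for illustration.
        List<String> report = Arrays.asList(
                "MagicNumberCheck,group01,src/Main.java,12,20",
                "MagicNumberCheck,group01,src/Main.java,30,8",
                "UselessParentheses,group02,src/Gui.java,5,14");
        System.out.println(countByRule(report));

        // Made-up raw count and parameter count.
        System.out.println(normalize(12, 400, 1000.0));

        // Normalized frequencies of AvoidFieldNameMatchingMethodName in Table II: 128 -> 3.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", changeInPercent(128, 3)));
    }
}
```

Because the normalized frequencies printed in Table II are already rounded, recomputing the percentage change from them will not reproduce every table entry exactly. Rules that were not violated in the first assignment have no finite relative change, which is why the sketch returns infinity for them.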