=Paper=
{{Paper
|id=Vol-2308/isee2019paper05
|storemode=property
|title=Code Process Metrics in University Programming Education
|pdfUrl=https://ceur-ws.org/Vol-2308/isee2019paper05.pdf
|volume=Vol-2308
|authors=Linus W. Dietz,Robin Lichtenthäler,Adam Tornhill,Simon Harrer
|dblpUrl=https://dblp.org/rec/conf/se/DietzLTH19
}}
==Code Process Metrics in University Programming Education==
Linus W. Dietz (Department of Informatics, Technical University of Munich, Germany, linus.dietz@tum.de), Robin Lichtenthäler (Distributed Systems Group, University of Bamberg, Germany, robin.lichtenthaeler@uni-bamberg.de), Adam Tornhill (Empear, Sweden, adam.tornhill@empear.com), and Simon Harrer (innoQ Deutschland GmbH, Germany, simon.harrer@innoq.com)

Abstract—Code process metrics have been widely analyzed within large-scale projects in the software industry. Since they reveal much about how programmers collaborate on tasks, they could also provide insights into programming and software engineering education at universities. We therefore investigate two courses taught at the University of Bamberg, Germany, to gain insights into the success factors of student groups. However, a correlation analysis of eight metrics with the students' scores revealed only weak correlations. In a detailed analysis, we examine the trends in the data per assignment and interpret them using our knowledge of code process metrics and the courses. We conclude that the analyzed programming projects were not suitable for code process metrics to manifest themselves, because of their scope and the students' focus on implementing functionality rather than following good software engineering practices. Nevertheless, we give practical advice on interpreting code process metrics of student projects and suggest analyzing projects of larger scope.

I. INTRODUCTION

When teaching programming or practical software engineering courses, lecturers often give students advice on how to manage their group work to be successful. Such advice could be to start early so students don't miss the deadline, or to split up the tasks so everybody learns something. Intuitively, such practices seem appropriate, but do they actually lead to more successful group work? To answer this, objective metrics are needed as evidence. Code process metrics capture the development progress [1], as opposed to static code analysis, which looks only at the outcome [2]. Since they have been successfully used in the software industry [3], they might be useful in programming education to give advice on how to organize the development process of student projects. As a first step towards applying code process metrics in programming education, we want to assess their explanatory power with respect to students' success. We mine and analyze Git repositories of two programming courses to answer our research question: "How meaningful are code process metrics for assessing the quality of student programming assignments?" By this, we hope to gain insights and provide recommendations for lecturers teaching such courses.

II. METHOD

The subjects of analysis are two practical programming courses for undergraduate computer science students at the University of Bamberg: 'Advanced Java Programming' (AJP), covering XML serialization, testing, and GUIs, and 'Introduction to Parallel and Distributed Programming' (PKS), covering systems communicating through shared memory and message passing on the Java Virtual Machine. Students typically take AJP in their third semester and PKS in their fifth.

The courses follow a similar didactic concept that has been continuously evolved since 2011 [2]. During the semester, the students submit four two-week assignments (see Table I), solved in groups of three. The assignments require applying the programming concepts and technologies introduced in the lectures to realistic problems, such as implementing a reference manager or an issue tracker. For each assignment, the groups get a project template with a few predefined interfaces. We provide a Git repository for each group to work with and to submit their solutions.

TABLE I
OVERVIEW OF THE ASSIGNMENTS

Course  #  Technologies
AJP     1  IO and Exceptions
        2  XML mapping with JAXB and a CLI-based UI
        3  JUnit tests and JavaDoc documentation
        4  JavaFX GUI with MVC
PKS     1  Mutexes, Semaphores, BlockingQueue
        2  Executor, ForkJoin, and Java Streams
        3  Client/server with TCP
        4  Actor model with Akka

Since undergraduates in their third term are usually not proficient with version control systems, we also hold a Git tutorial at the beginning of the course, covering how to commit, push, merge, resolve conflicts, and write good commit messages. More advanced topics like working with feature branches or structured commit messages are not in the scope of this introduction.

We grade each assignment in the form of a detailed textual code review and a score between 0 and 20 points. The main part of that score accounts for functional correctness, which we check with the help of unit tests. However, we also evaluate code quality, determined by a thorough code review. To avoid bias from a single lecturer, we established a peer review by the other lecturer. By this, the score should be an objective indicator of the quality of the solution. Over the years, we have built a knowledge base of typical code quality issues, recently culminating in the book Java by Comparison [4], which we use to refer to issues in the textual code review.

A. Data Set

The data analyzed in this paper are the Git repositories of one iteration of AJP (24 groups) and PKS (14 groups) in the academic year 2016. This results in a total of 152 submissions. All groups submitted four assignments, and no group scored less than 10 points in any assignment. An assignment solution consists of all the commits related to the assignment. Each commit includes its message, the changes made, a time stamp, and the author. Each submission had at most three authors; this number was sometimes reduced to two when a student dropped out of the course. Because of their limited experience with Git, the students worked solely on the master branch. Furthermore, since the focus of the courses was not on software engineering skills, the students could freely choose how to collaborate on the assignments, and we enforced no policy regarding collaboration or commit messages.

B. Processing and Metrics

Before mining the raw data for metrics, we performed a data cleaning step. We observed that students used different machines with varying Git configurations for their work, which resulted in multiple email identifiers for a student. Therefore, we inspected the repositories and added .mailmap files (https://www.git-scm.com/docs/git-check-mailmap) to consolidate the different identifiers.
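A .mailmap file maps a contributor's various commit identities to one canonical name and email address, so that all of their commits are attributed to a single author. A hypothetical example (the names and addresses are invented for illustration):

    # Hypothetical .mailmap entries: both commit identities on the right
    # are mapped to the canonical identity on the left.
    Jane Doe <jane.doe@stud.example.edu> <jdoe@laptop.localdomain>
    Jane Doe <jane.doe@stud.example.edu> Jane D. <jane@example.org>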
Then, we did the data mining with the proprietary APIs of CodeScene (https://empear.com/), a tool for predictive analyses and visualizations that prioritize technical debt in large-scale code bases. The tool processes the individual Git repositories together with information about the separate assignments. We customized the analysis to calculate metrics per assignment solution and selected the following metrics:

* Number of commits. The total number of commits related to the specific assignment.
* Mean author commits. The mean number of commits per author.
* Mean commit message length. The mean number of characters in commit messages, excluding merge commits.
* Number of merge commits. The number of merges.
* Number of bug fixes. The number of commits with 'bugfix' or 'fix' in the commit message.
* Number of refactorings. The number of commits with 'refactor' or 'improve' in the commit message.
* Author fragmentation. A metric describing how fragmented the work on single files is across authors [5].
* Days with commits. The number of days with at least one commit in the assignment period.

These metrics cover the most relevant aspects of the process. Unfortunately, we could not consider size, i.e., the number of additions and deletions per commit, because the students imported project skeletons for each assignment and there were dependencies between the assignments. For example, in PKS the very same task had to be solved using different technologies, and the students were encouraged to copy their old solution into the current assignment to compare performance.
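The metrics above were mined with CodeScene's proprietary APIs, so the exact implementation is not public. As an illustration only, most of the count-based metrics can be approximated directly from the Git history. The following is a minimal sketch in Python; the function names, the period parameters, and the use of the commit subject line as a stand-in for the full message are our assumptions, not CodeScene's implementation:

    import subprocess
    from collections import Counter

    def git_log(repo, *extra):
        """Return (author, date, subject) per commit of a local clone."""
        out = subprocess.run(
            ["git", "-C", repo, "log", "--date=short",
             "--pretty=format:%an|%ad|%s", *extra],
            capture_output=True, text=True, check=True).stdout
        return [line.split("|", 2) for line in out.splitlines() if line]

    def process_metrics(repo, since, until):
        """Approximate several code process metrics for one assignment period."""
        rng = [f"--since={since}", f"--until={until}"]
        commits = git_log(repo, *rng)
        merges = git_log(repo, "--merges", *rng)
        non_merges = git_log(repo, "--no-merges", *rng)
        authors = Counter(a for a, _, _ in commits)
        return {
            "number_of_commits": len(commits),
            "mean_author_commits": len(commits) / max(len(authors), 1),
            # Subject line only; merge commits excluded per the metric definition.
            "mean_commit_message_length":
                sum(len(s) for _, _, s in non_merges) / max(len(non_merges), 1),
            "number_of_merge_commits": len(merges),
            # The substring 'fix' also matches 'bugfix'.
            "number_of_bug_fixes": sum("fix" in s.lower() for _, _, s in commits),
            "number_of_refactorings":
                sum(any(k in s.lower() for k in ("refactor", "improve"))
                    for _, _, s in commits),
            "days_with_commits": len({d for _, d, _ in commits}),
        }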
III. RESULTS

[Fig. 1. Distribution of points per course and assignment]

Before investigating the code process metrics, we display the distribution of achieved points per course and assignment in Figure 1. In AJP, the median score of 17 to 18 is quite high and homogeneous over the course; however, there is some variability, with a standard deviation of 2.21. In the more advanced PKS course, the scores show a rising tendency, from a median of only 15 in the first assignment to a median of 19 in the fourth. We assume that this is because students have little prior knowledge of concurrent programming. Additionally, the first assignment deals with low-level threading mechanisms, which require a profound understanding. The students gradually improve their performance over the course by gaining experience, and because the later assignments deal with more convenient concurrency constructs. The standard deviation of points is 1.88.

A. Correlating Code Process Metrics with Points

To analyze the aforementioned code process metrics for correlations with the achieved points, we calculated the pairwise Pearson Correlation Coefficient (PCC) between all features over all solutions, irrespective of the course. Surprisingly, we did not encounter any notable relationship between any of our metrics and points, as can be seen in the 'Overall' column of Table II.

TABLE II
PEARSON CORRELATION COEFFICIENT BETWEEN POINTS AND FEATURES

Feature                      Overall    AJP    PKS
Mean Author Fragmentation       0.01   0.09  -0.25
Mean Commit Message Length      0.20   0.35  -0.06
Mean Author Commits             0.05   0.04  -0.09
Number of Commits               0.04   0.05  -0.16
Number of Merge Commits         0.08   0.10  -0.12
Number of Bug Fixes             0.04   0.09  -0.04
Days With Commits               0.03   0.07  -0.11
Number of Refactorings          0.07   0.07   0.08
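A correlation analysis like the one in Table II can be reproduced with standard tooling, for example SciPy's pearsonr. A small sketch, assuming each solution is a dict of metric values plus its achieved points (the data layout and names are illustrative, not the pipeline used for the paper):

    from scipy.stats import pearsonr

    def correlate_with_points(solutions):
        """solutions: list of dicts, each holding the metric values plus 'points'."""
        points = [s["points"] for s in solutions]
        features = [k for k in solutions[0] if k != "points"]
        # PCC between each feature and the achieved points (one Table II column).
        return {f: pearsonr([s[f] for s in solutions], points)[0]
                for f in features}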
[Fig. 2. The score relative to the number of commits]
[Fig. 3. The score relative to the author fragmentation]

When looking at the two courses separately, however, one sees in AJP a single moderate positive correlation of 0.35 between the commit message length and points. Interestingly, this effect cannot be seen in PKS. There, we mainly see weak negative correlations, which is also surprising, as it seems that, in contrast to AJP, more work does not lead to more points.

More effort does not necessarily mean more points. While it is generally hard to quantify effort directly with our metrics, the combination of the number of commits and the days with commits is the best available proxy. Figure 2 shows the number of commits per course and assignment. The lines drawn on top of the data points are a linear regression model that serves as a visual aid for the detailed trends in the data. Interestingly, there is no consistent trend observable over the assignments or the course. AJP Assignment 1 has a negative trend, indicating that the groups that managed to solve the assignment with fewer commits got higher scores. We attribute this to the prior knowledge of the students at the start of the course. In the next two assignments of AJP, the trend is positive, whereas the number of commits in the last assignment did not have an impact on the grading. In PKS, we see positive trends between both the number of commits and the days with commits and the points in the first three assignments, while the last assignment shows a flat trend. Our interpretation is the following: assignments that require much code to be written by the students benefit from more commits, while the opposite holds for assignments where the framework guides the development. Recall that in Assignment 4 of AJP, the task is to write a GUI using JavaFX, and Assignment 4 of PKS is about using akka.io.

Distributing the work over a longer time span does not increase the points. We assumed that starting early and working on the assignments continuously, thereby accumulating more days with commits, would increase the score. However, this is not the case. In AJP, the PCC between this feature and points was 0.07, whereas in PKS it was even a weak negative value of −0.11. We assume this has to do with the limited temporal scope of only two weeks of work per assignment. The more experienced groups might have finished the assignment in a shorter time period and stopped working when they thought their solution was sufficient.

Working on the same classes is not advisable. Another metric we analyzed was the author fragmentation. It measures whether the Java classes were written by a sole author (zero fragmentation) or collaboratively. In AJP, there was again barely a correlation, at 0.09, whereas in PKS there was a weak negative PCC value of −0.25. This is somewhat in line with findings in the literature, where lower fragmentation indicates higher software quality [5]. When we take a closer look at the assignments in Figure 3, however, the signal is again mixed: PKS Assignments 1 and 4 show a negative dependency, whereas Assignments 2 and 3 are relatively stable.
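The fragmentation metric is taken from D'Ambros et al. [5], and the formula is not restated in this paper. A common formalization from that line of work is the fractal value: for a file touched by C commits, of which c_i stem from author i, the fragmentation is 1 − Σ_i (c_i / C)², which is 0 for a sole author and approaches 1 as the work spreads over many authors. A sketch under that assumption:

    from collections import Counter

    def fragmentation(commit_authors):
        """Fractal-value-style fragmentation for one file.

        commit_authors: one author name per commit touching the file.
        """
        counts = Counter(commit_authors)
        if not counts:
            return 0.0
        total = sum(counts.values())
        return 1.0 - sum((c / total) ** 2 for c in counts.values())

    # fragmentation(["alice"] * 10)   -> 0.0 (sole author)
    # fragmentation(["alice", "bob"]) -> 0.5 (evenly split between two authors)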
Finally, we refrain from analyzing the number of bug fixes and refactorings, because such keywords rarely appeared in the commit messages.

B. Discussion

We found no notable correlations between the analyzed code process metrics and the quality of the assignments as measured via manual grading. This stands in contrast to the literature on software quality and code process metrics in industry [6]. So what makes the assignments different from real-world projects? First of all, the timeframe differs. The assignments in our courses all lasted two weeks, whereas projects in industry span multiple months or even years. Furthermore, students focus solely on developing software, giving little thought to how to run and maintain it. What is more, the students had a good sense of when their assignment met the functional requirements, and they stopped working once they were satisfied with their solution. Thus, equating the assignment scores with the notion of high-quality software is most probably not permissible in our courses.

On the other hand, it might be that code process metrics simply require a certain effort by more developers to be put into the code, which is not achieved by the small student groups during the short assignment period. In industry projects, maintenance work, i.e., bug fixing and refactoring, accounts for a large portion of the commits and of the overall quality of the software. Looking at the commit messages of the student assignments, we see that such efforts were rare. Also, communication problems become more of an issue in larger groups.

Finally, the student groups were quite new to managing group programming tasks, especially in the third-semester course AJP. Since they could organize the development on their own, there were myriad different strategies. We believe that this lack of organizational requirements is a key reason why we don't see clear patterns in the code process metrics.

IV. RELATED WORK

Our approach is a contribution to learning analytics, for which Greller and Drachsler name two basic goals: prediction and reflection [7]. The commit data we analyzed has a coarse granularity compared to other work on programming education reviewed by Ihantola et al. [8], where the level of analysis is typically finer, for example key strokes. Our initial hope was that code process metrics could have some predictive power for student courses. This, however, was not the case, despite several studies relating them to the quality and evolution of software in industry [9]. Nagappan et al. found that the structure of the development organization is a stronger predictor of defects than code metrics from static analysis [6], and Mulder and Zaidman identified several cross-cutting concerns of doing software repository mining [10]. This paper is thus a parallel approach to static code analysis [2] or extensive test suites [11] for the evaluation of student assignments.

The metrics used stem from the work of Greiler et al. [12], D'Ambros et al. [1], and Tornhill [3], [13]. As an example, in industry, the author fragmentation [5] is negatively correlated with code quality. This is supported by Greiler et al. [12], who find that the number of defects increases with the number of minor contributors to a module, and by Tufano et al. [14], who find that the risk of a defect increases with the number of developers who have worked on that part of the code. However, one can also go further and look at the commit metadata to capture design degradation, as Oliva et al. did [15]. Our approach therefore combines learning analytics with insights from industry. Since in realistic projects a developer rarely programs alone, we found that the focus of our analysis should also be groups. This naturally limits us in drawing conclusions about the learning process of an individual student.
V. CONCLUSIONS

While static code analysis has often been investigated in educational settings, code process metrics from Git commits with a focus on groups represent a novel direction. We present an approach for analyzing code process metrics based on Git commits from student assignments. However, from the interpretation of our results, we cannot identify any metric that has a significant correlation with the assignment scores achieved by the students. Does this mean that code process metrics are not useful for teaching programming? From our experience, it is quite the contrary: we assume that the two courses were simply not a realistic setting for profiting from good coding practices. To become good software engineers in industry, students should learn how to write maintainable code, even if their code will be trashed after the semester. To establish good practices, code process metrics should play a larger role in practical software engineering courses, and they could even be part of the grading. In any case, in pure programming courses with very limited timeframes, code process metrics should not be used for the assessment of assignment solutions, since they are bad predictors of the score. Furthermore, when giving students guidance on how to work on programming assignments, we can offer suggestions such as starting early, preferring many small commits over few large ones, and clearly separating tasks, but following them does not necessarily result in a better score.

We see time as a critical factor for the significance of code process metrics. Future work could therefore analyze development efforts with varying time frames to investigate our argument. Our paper is a first attempt at utilizing code process metrics in programming education, shaped by the characteristics of the courses we considered. There is thus still potential in this topic, and more research including different contexts, especially larger student projects, is desirable.

REFERENCES

[1] M. D'Ambros, H. Gall, M. Lanza, and M. Pinzger, Analysing Software Repositories to Understand Software Evolution. Berlin, Heidelberg: Springer, 2008, pp. 37–67.
[2] L. W. Dietz, J. Manner, S. Harrer, and J. Lenhard, "Teaching clean code," in Proceedings of the 1st Workshop on Innovative Software Engineering Education, Ulm, Germany, Mar. 2018.
[3] A. Tornhill, Software Design X-Rays. Pragmatic Bookshelf, 2018.
[4] S. Harrer, J. Lenhard, and L. Dietz, Java by Comparison: Become a Java Craftsman in 70 Examples. Pragmatic Bookshelf, Mar. 2018.
[5] M. D'Ambros, M. Lanza, and H. Gall, "Fractal figures: Visualizing development effort for CVS entities," in 3rd IEEE International Workshop on Visualizing Software for Understanding and Analysis. IEEE, Sep. 2005, pp. 1–6.
[6] N. Nagappan, B. Murphy, and V. Basili, "The influence of organizational structure on software quality: An empirical case study," in Proceedings of the 30th International Conference on Software Engineering, ser. ICSE '08. New York, NY, USA: ACM, 2008, pp. 521–530.
[7] W. Greller and H. Drachsler, "Translating learning into numbers: A generic framework for learning analytics," Journal of Educational Technology & Society, vol. 15, no. 3, pp. 42–57, 2012.
[8] P. Ihantola, K. Rivers, M. Á. Rubio, J. Sheard, B. Skupas, J. Spacco, C. Szabo, D. Toll, A. Vihavainen, A. Ahadi, M. Butler, J. Börstler, S. H. Edwards, E. Isohanni, A. Korhonen, and A. Petersen, "Educational data mining and learning analytics in programming," in Proceedings of the 2015 ITiCSE on Working Group Reports. New York, NY, USA: ACM, 2015, pp. 41–63.
[9] M. D. Penta, "Empirical studies on software evolution: Should we (try to) claim causation?" in Proceedings of the Joint ERCIM Workshop on Software Evolution and International Workshop on Principles of Software Evolution. New York, NY, USA: ACM, 2010, pp. 2–2.
[10] F. Mulder and A. Zaidman, "Identifying cross-cutting concerns using software repository mining," in Proceedings of the Joint ERCIM Workshop on Software Evolution and International Workshop on Principles of Software Evolution. New York, NY, USA: ACM, 2010, pp. 23–32.
[11] V. Pieterse, "Automated assessment of programming assignments," in Proceedings of the 3rd Computer Science Education Research Conference on Computer Science Education Research, ser. CSERC '13. Heerlen, The Netherlands: Open Universiteit, 2013, pp. 45–56.
[12] M. Greiler, K. Herzig, and J. Czerwonka, "Code ownership and software quality: A replication study," in IEEE/ACM 12th Working Conference on Mining Software Repositories. IEEE, May 2015, pp. 2–12.
[13] A. Tornhill, Your Code as a Crime Scene. Pragmatic Bookshelf, 2016.
[14] M. Tufano, G. Bavota, D. Poshyvanyk, M. D. Penta, R. Oliveto, and A. D. Lucia, "An empirical study on developer-related factors characterizing fix-inducing commits," Journal of Software: Evolution and Process, vol. 29, no. 1, Jun. 2016.
[15] G. A. Oliva, I. Steinmacher, I. Wiese, and M. A. Gerosa, "What can commit metadata tell us about design degradation?" in Proceedings of the 2013 International Workshop on Principles of Software Evolution, ser. IWPSE 2013. New York, NY, USA: ACM, 2013, pp. 18–27.