=Paper=
{{Paper
|id=Vol-2066/isee2018paper07
|storemode=property
|title=Providing Better Feedback for Students using Projekt Tomo
|pdfUrl=https://ceur-ws.org/Vol-2066/isee2018paper07.pdf
|volume=Vol-2066
|authors=Gregor Jerše,Matija Lokar
|dblpUrl=https://dblp.org/rec/conf/se/JerseL18
}}
==Providing Better Feedback for Students using Projekt Tomo==
Providing Better Feedback for Students Solving Programming Tasks Using Project Tomo

Gregor Jerše, Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia, Gregor.Jerse@fri.uni-lj.si
Matija Lokar, Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia, Matija.Lokar@fmf.uni-lj.si

ISEE 2018: 1st Workshop on Innovative Software Engineering Education @ SE18, Ulm, Germany

Abstract—Systems for the automatic assessment of programming tasks have become a popular choice in programming courses, as several advantages of using automatic assessment in the teaching and learning process have been observed. One of the most important is the immediate feedback students receive. However, the quality of the feedback is essential for achieving good learning results. At the University of Ljubljana we use our own system, called Project Tomo, as a teaching tool. One of its most important aspects is the possibility to return detailed feedback and to analyze students' solutions, since every submission is stored on the server. So far we have collected more than 110,000 attempts along with their history, and we are currently analyzing them, concentrating on how to use this data to further improve the quality of the feedback given to the students. Some of the observations and preliminary results are presented in this paper.

I. INTRODUCTION

Teaching a beginner-level programming course can be quite challenging. Since programming is a skill, it is learnt best by solving as many programming tasks as possible. As Lee and Ko write in [1], for most beginners the experience of writing computer programs is characterized by a distinct sense of failure. The first code beginners write often leads to unexpected behavior, such as syntax errors, run-time errors or unintended program output. While all of these forms of feedback are essential in helping a beginner understand what programs are and how computers interpret them, the experience can be quite discouraging [2], [3] and emotional [4]. As several researchers have pointed out, for example in [5], feedback is an important factor in the learning process.

Teachers should provide students with feedback beyond what the usual tools (compilers, interpreters and run-time environments) provide. But the feedback must be immediate, otherwise the students can get stuck, which slows down their progress considerably. Keuning, Jeuring and Heeren [6] write: "Formative feedback, aimed at helping students to improve their work, is an important factor in learning." Campos et al. [7], following [8], likewise conclude that good feedback is essential for improving students' progress: students learn more effectively if they receive quick and appropriate feedback about their actions. In addition to its influence on students' achievement, feedback is also a significant factor in motivating learning. As Nicol states on the REAP webpage (https://www.reap.ac.uk/), "Assessment and feedback practices should be designed to enable students to become self-regulated learners, able to monitor and evaluate the quality and impact of their own work and that of others."

But providing timely and instant feedback in overcrowded classrooms is a tough task. This calls for a tool that can automatically provide immediate feedback. In their paper, Keuning et al. [6] studied what kind of feedback is provided by systems for the automated assessment of programming tasks (SAAP), which techniques are used to generate the feedback, how adaptable the feedback is, and how these tools are evaluated.

The paper is organized as follows. In Section II we describe our SAAP solution, called Project Tomo, and its properties. In Section III we present the results of the analysis of students' submissions; the conclusion follows in Section IV.

II. PROJECT TOMO

After evaluating several SAAP tools (a systematic review can be found in [9]–[11]), we came to a conclusion similar to that of Keuning et al. in [6]: "Most SAAP tools only grade student solutions" and "tools do not often give feedback on fixing problems and taking the next step, and that teachers cannot easily adapt tools to their own needs." Rubio-Sanchez et al. likewise mention in [13] that "despite acknowledging that using Mooshak (SAAP tool) was a good idea, students did not appreciate the experience as a whole, where the main reported drawback was related to its feedback." Most of the disappointment with the feedback stems from the fact that the majority of SAAP tools work as explained in [13]: given a set of predefined instances of some computational problem consisting of input-output pairs, the tool compiles and runs the source code in order to verify whether the program generates the desired outputs for the given inputs.

So we developed a new web service for automatic assessment called Project Tomo (https://tomo.fmf.uni-lj.si) [14]. One of the main design goals was the flexibility a tool should provide in giving feedback to the students. Contrary to many SAAPs, which are intended mostly to support assessment, our goal was to develop a small, flexible solution aimed at assisting lab exercises where students are required to solve numerous programming tasks. We aimed for methods and tools that help provide the feedback needed to support students while they learn programming; our target was thus formative feedback [5].
A. Basic Features

The main design objectives in developing our SAAP service were:
• Local execution,
• The possibility to use any of the existing programming environments,
• Being flexible enough to work with any programming language (currently Python, R and Octave are supported), and
• Providing as much flexibility as possible in administering tests, so that appropriate feedback can be given.

The details behind these decisions are explained in [14] and [15]. The service is designed to require little or no additional work from students and teachers, enabling them to focus on the content.

The service works as follows: the students first download the files containing the problem descriptions to their own computers. The files are opened in their preferred programming environment for the chosen programming language and the students start coding the solutions. Executing a file checks the solutions locally. If the server is available, the solutions are automatically stored on the server. The server also optionally checks the validity of the solutions by comparing a hashed output of the student's solution to the hashed output of the official solution. This approach has several benefits: the service provides instant insight into the obtained knowledge to both student and teacher, all without disturbing the teaching process. There is also no need for powerful servers, since all executions are done on the students' computers.

    # =======================================================
    # Computing distances
    # =========================================@000003=======
    # Write a function dist(x1, y1, x2, y2) that returns the
    # distance between points (x1, y1) and (x2, y2).
    #
    # >>> dist(1, 2, 3, 4)
    # 2.82842712475
    # =======================================================
    def dist(x1, y1, x2, y2):
        '''Returns distance between two points'''
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    Check.part()
    Check.equal('dist(1, 2, 3, 4)', 2 ** 1.5)
    Check.equal('dist(0, 0, 3, 4)', 5)

Fig. 1. Instructions, solution, and validation from the teacher's file.

B. Testing Possibilities

As much flexibility as possible in administering tests was another vital feature. For instance, one of the requirements was the possibility to administer tests that check whether a specific method was (or was not) used in a student's submission. For example, if the student's ability to write recursive programs is being tested, non-recursive solutions should not be accepted, even if they produce the expected results.

After providing the instructions for the task, the teacher enters the expected solution, followed by the tests it has to pass (see Fig. 1). The solution is separated from the tests with the Check.part() command. It should be noted that the officially provided solution has to pass the same tests as the students' solutions. At first this approach seems slightly more demanding for the teachers compared to the traditional approach, where the teachers only provide instructions in text form. However, it forces the teacher to test the quality of the instructions: poorly formulated problems often reveal themselves only during an attempt to solve them. This approach also ensures that the official solutions exist and work properly.

There are two commands that are used most often in testing. The simplest one tests the equality of the expected result with the result obtained by evaluating the given expression (see Fig. 1). However, Tomo's main strength is the possibility for the teacher to compose a test that goes beyond a direct comparison of outputs. The tests have access to the source code of the submitted solution under Check.current_part['solution'], so it is simple to write tests that ensure a solution did not use for or while loops (e.g. if the students are required to write solutions in a recursive style); see Fig. 2.

Fig. 2. While is required, but for is forbidden. (Only the caption of this figure survives in the extracted text.)
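As a rough illustration of the kind of test Fig. 2 shows, a structural check might look like the sketch below. Check.current_part['solution'] is taken from the paper; the Check.error call used to report the violation is our assumption, since the paper itself only shows Check.part, Check.equal and Check.feedback.

    # Sketch of a test for a task where a while loop is required
    # and a for loop is forbidden (cf. Fig. 2).
    source = Check.current_part['solution']
    # NOTE: plain substring matching would also hit identifiers such
    # as 'forward'; a production test would tokenize the source.
    if 'for' in source:
        Check.error('Your solution must not use a for loop.')   # assumed API
    if 'while' not in source:
        Check.error('Your solution must use a while loop.')     # assumed API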
The commands Check.out_file, Check.equal and Check.run return True or False, so they can be used to decide whether additional tests should be run. For instance, if the first test fails, the submission is clearly not valid and additional tests are unnecessary. However, if the goal is to give the students detailed information about which test data their programs fail on, as many tests as desired can be run.

There are other commands available for testing programs, for example commands that work with files. Since the validation is essentially a program written in the chosen programming language (Python, for example) using the capabilities of the Check class, it can be made more advanced using all the programming constructs that language offers. Thus, there are numerous possibilities for preparing appropriate feedback.
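Since these commands report success as a boolean, gating further tests on a cheap sanity check is straightforward. A minimal sketch, reusing the dist task from Fig. 1 (the extra test cases are ours, not from the paper):

    # Run the cheap sanity check first; Check.equal returns True/False,
    # so the more detailed tests run only when the basic case passes.
    if Check.equal('dist(0, 0, 3, 4)', 5):
        Check.equal('dist(1, 2, 3, 4)', 8 ** 0.5)
        Check.equal('dist(-1, -1, 2, 3)', 5)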
C. Feedback

The basic feedback is a report on the success of the student's attempt, issued as soon as the student runs their solution. Figure 3 shows that the solution of the first part of the task was accepted, the second part failed at (at least) one test, and there has been no attempt to solve the third part yet.

    1. part has a valid solution.
    2. part does not have a valid solution.
       - Expression vsote([3,5,1], 4) returns [1] instead of [].
    3. part has no solution.

Fig. 3. Basic feedback

We paid special attention to the wording of this basic feedback. One of the first versions of the system declared "solution is correct". But this is not in accordance with the premise that passing all the tests is not yet a proof of correctness. Therefore, we changed it to "solution is accepted". Several teachers reported that this change had a positive influence on students' awareness of what the 'right solution' is.

As explained before, the basic test is done with the Check.equal method, which directly compares the output of the official solution with the student's one (see [11], [12] for a discussion of the drawbacks when only this kind of test is provided). Tomo offers much more flexibility. For example, for tasks that require text output in particular, students often complain: "but just one space is missing, why is Tomo so picky!" Here all of the teacher's pedagogical knowledge can be exploited (as discussed in [5], [8]). Project Tomo is just a tool in the hands of a teacher, who should decide on the purpose of a certain task. In this example the teacher has (at least) four different options:

1) Leave the task as it is. The purpose of such a task is to get students to read the instructions, claims and requirements carefully, and to keep to them consistently. A SAAP helps the teacher here, because it is not necessary to explain to each student individually that their solution was rejected because of a single capital letter.
2) Change the test so that a student who writes "enter" instead of "Enter" receives feedback instructing them to look at the capitalization of the commands.
3) Change the test so that it does not matter which case is used (a sketch of such a comparison follows this list). This makes sense when the teacher's focus lies elsewhere and the output is of secondary significance.
4) Change the test so that it does not matter what wording the students use in their solutions.
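As an illustration of option 3, a tolerant test could normalize both outputs before comparing them. This is a minimal sketch under our own assumptions: the names expected and submitted are hypothetical placeholders for the two outputs, as the paper does not show how Tomo captures printed output.

    def outputs_match(expected, submitted):
        # Ignore letter case and surrounding whitespace, but still
        # insist on the exact wording of the message.
        return expected.strip().lower() == submitted.strip().lower()

With such a helper, "enter the number" and "Enter the number" are both accepted, while a changed wording is still rejected.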
And of course—most importantly—the teacher's task is to react to events that occur during exercises. That is precisely the basic reason why we are developing Project Tomo: to relieve the teachers of simple tasks and give them additional time to interact with the students during lab exercises.

In Project Tomo it is possible to give a student feedback not only when a given test fails, but at any point during the test program. This can be used to notify the students that the solution has passed some test cases and that they are on a good track. It is achieved using the Check.feedback construct, which prints out the given string; see the example in Fig. 4.

    if Check.equal("""start('miha', 'meta')""", 1):
        Check.feedback("Bravo! Strings 'miha' and" +
                       " 'meta' match in the first character")

Fig. 4. Positive feedback

A good way to learn is also to observe the official solution (see Fig. 5). In Project Tomo the teacher can decide, for every task, when the students get to see it. Currently the options for official-solution visibility are "always", "never" and "after they have submitted a valid solution". The first option is rarely used, since it gives the students an easy way to 'cheat': if their solution is not accepted, they can look at the official one and use the obtained information to solve the task. The second one is used during exams, where the official solutions are made visible only after the end of the exam. The third one is the default setting.

Fig. 5. Comparing with official solution. (Only the caption of this figure survives in the extracted text.)

III. ANALYZING THE DATA

We have by now developed quite an extensive library of programming tasks with high-quality feedback. Despite that, we are constantly adding new tests and feedback to the tasks. Ideas for additional tests arise from observing the mistakes students make while programming. Many of these mistakes are missed by the teacher, since it is impossible to observe each and every student all of the time. Since Project Tomo stores the history of every submission, we can do that retroactively: by checking the history of the students' submissions we can analyze them, extract typical mistakes and use that knowledge to improve the quality of the feedback even further.

Our workflow is as follows. First the task is created by the teacher, who tries to predict the typical mistakes the students will make and includes them in the test cases. The task is used in a course and a large number of submissions is acquired from the students. These submissions are then analyzed, and if a need for additional test cases is seen, they are added. The updated task is then used in the next course and the entire process is repeated, so the quality of the feedback (and of the test cases) is checked and improved continuously.

Currently we are just starting a thorough analysis of the submissions. The goal of the first step of the analysis is to detect problematic tasks. Our assumption is that exercises with a high average number of unsuccessful attempts before the accepted one are prime candidates for being labeled as problematic. If we manage to improve the quality of the feedback for these tasks, the students will benefit the most.

We concentrated on last year's programming course, where all the data is available. For each successful attempt we checked how many unsuccessful attempts had been made before the valid one. Using the above-mentioned criterion we detected several tasks where the average number of unsuccessful submissions was higher than 10, and we analyzed the source code of those unsuccessful attempts. Since the number of attempts is quite large, we have only managed to analyze some of the tasks so far.
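A sketch of how this criterion can be computed from the stored history. The record layout (task_id, user_id, accepted, timestamp) is our assumption; the paper does not describe Tomo's storage schema.

    from collections import defaultdict

    def average_failures(submissions):
        """submissions: iterable of (task_id, user_id, accepted, timestamp)
        records, a hypothetical stand-in for Tomo's stored history."""
        failed = defaultdict(int)    # (task, user) -> failed attempts so far
        solved = set()               # (task, user) pairs already accepted
        per_task = defaultdict(list)
        for task, user, accepted, _ in sorted(submissions, key=lambda s: s[3]):
            key = (task, user)
            if key in solved:
                continue             # count failures only up to first success
            if accepted:
                per_task[task].append(failed[key])
                solved.add(key)
            else:
                failed[key] += 1
        # Average number of unsuccessful attempts before the accepted one.
        return {t: sum(c) / len(c) for t, c in per_task.items()}

    # 'history' stands for the stored submission records; tasks whose
    # average exceeds 10 are flagged as problematic.
    problematic = {t for t, avg in average_failures(history).items() if avg > 10}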
One task that was particularly interesting was a simple one, where students had to print the amount of money in a bank account in a grammatically correct way. The average number of unsuccessful submissions for this task was higher than 15. When we analyzed the history of the 495 attempts made, we found that the students had made two typical mistakes: they were unaware of the grammar rules of their own mother tongue, and they were very careless with their output, which usually deviated only slightly from the official one. Combined, these caused the students to submit many attempts whose output deviated only slightly from that of the official solution but which were nevertheless rejected by Project Tomo. It appears the students did not manage to see the difference in the outputs, since they submitted many solutions with seemingly random changes to the source code that did not really fix the problem.

Using this data we added two additional pieces of feedback to the task. The first one informs the student of the necessary grammar rules in detail if the test detects that they are not respected. This should reduce the number of incorrect attempts, since a student can fix all grammatical mistakes in one step. The second one deals with the sloppy outputs. On the one hand, it is good that students learn to be accurate. On the other hand, it can be very frustrating to work on a task for hours without visible progress, even more so for beginners. So we decided to modify the comparison function between the expected and the given output so that it shows more clearly where the outputs differ. We hope this will reduce the number of unsuccessful attempts even further, while also teaching the students to be precise.
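The paper does not show the modified comparison function; the following is a minimal sketch of one way to point out where two outputs differ, built on Python's standard difflib module.

    import difflib

    def explain_difference(expected, received):
        """Return a line diff of the two outputs, or None if they match.
        Lines starting with '-' are expected but missing, '+' are extra,
        and '?' lines point at the exact differing characters."""
        if expected == received:
            return None
        return '\n'.join(difflib.ndiff(expected.splitlines(),
                                       received.splitlines()))

    # Example: a single missing capital letter becomes easy to spot.
    print(explain_difference('You have 2 euros.', 'you have 2 euros.'))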
We plan to analyze more tasks in a similar manner and use the updated tasks in future programming courses, where we hope to observe a reduction in the number of unsuccessful attempts.

IV. CONCLUSIONS AND FUTURE WORK

Our goal is to use the results of our analysis to improve the feedback for the most problematic tasks and to use the improved exercises in class next year.

Currently most of the analysis is done manually, which is a very slow process. In the future we plan to use machine learning algorithms to extract common patterns from the unsuccessful submissions for a given task. This would save us the time spent analyzing the history of all attempts and allow us to focus on the most common mistake patterns.

Some additional features in providing feedback are also planned. We are currently looking into the possibility of adding a further option for when the official solution can be seen: making it visible after a specified number of unsuccessful attempts. This would allow the students to see the official solution after they have made a real effort towards solving the task but failed to provide a valid solution. However, we have to find a way of verifying that those attempts are "real", and not merely faking some output in order to reach the required number of submissions. Here we will probably use some of the approaches suggested in the literature, for example in [12].

REFERENCES

[1] M. J. Lee and A. J. Ko, "Personifying programming tool feedback improves novice programmers' learning," in Proceedings of the Seventh International Workshop on Computing Education Research (ICER '11), ACM, New York, NY, USA, 2011, pp. 109–116. DOI: http://dx.doi.org/10.1145/2016911.2016934
[2] A. J. Ko, B. A. Myers, and H. Aung, "Six learning barriers in end-user programming systems," IEEE VL/HCC, 2004, pp. 199–206.
[3] A. J. Ko and B. A. Myers, "Attitudes and self-efficacy in young adults' computing autobiographies," IEEE VL/HCC, 2009, pp. 67–74.
[4] P. Kinnunen and B. Simon, "Experiencing programming assignments in CS1: the emotional toll," ICER, 2010, pp. 77–86.
[5] V. J. Shute, "Focus on formative feedback," Review of Educational Research, 78(1):153–189, 2008.
[6] H. Keuning, J. Jeuring, and B. Heeren, "Towards a systematic review of automated feedback generation for programming exercises," in Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '16), ACM, New York, NY, USA, 2016, pp. 41–46. DOI: https://doi.org/10.1145/2899415.2899422
[7] D. S. Campos, A. J. Mendes, M. J. Marcelino, D. J. Ferreira, and L. M. Alves, "A multinational case study on using diverse feedback types applied to introductory programming learning," 2012 Frontiers in Education Conference Proceedings, Seattle, WA, 2012, pp. 1–6. DOI: 10.1109/FIE.2012.6462412
[8] J. Hattie and H. Timperley, "The power of feedback," Review of Educational Research, 77(1):81–112, 2007.
[9] K. M. Ala-Mutka, "A survey of automated assessment approaches for programming assignments," Computer Science Education, 15(2):83–102, 2005.
[10] P. Ihantola, T. Ahoniemi, V. Karavirta, and O. Seppälä, "Review of recent systems for automatic assessment of programming assignments," in Koli Calling, 2010, pp. 86–93.
[11] D. M. Souza, K. R. Felizardo, and E. F. Barbosa, "A systematic literature review of assessment tools for programming assignments," 2016 IEEE 29th International Conference on Software Engineering Education and Training (CSEET), Dallas, TX, 2016, pp. 147–156.
[12] B. Cheang, A. Kurnia, A. Lim, and W. Oon, "On automated grading of programming assignments in an academic institution," Computers and Education, 41(2):121–131, 2003. DOI: https://doi.org/10.1016/S0360-1315(03)00030-7
[13] M. Rubio-Sanchez, P. Kinnunen, C. Pareja-Flores, and Á. Velázquez-Iturbide, "Student perception and usage of an automated programming assessment tool," Computers in Human Behavior, 31:453–460, 2014. DOI: http://dx.doi.org/10.1016/j.chb.2013.04.001
[14] M. Lokar and M. Pretnar, "A low overhead automated service for teaching programming," in Proceedings of the 15th Koli Calling Conference on Computing Education Research, Koli, Finland, 2015. DOI: http://doi.acm.org/10.1145/2828959.2828964
[15] G. Jerše and M. Lokar, "Learning and teaching numerical methods with a system for automatic assessment," The International Journal for Technology in Mathematics Education, 24(3):121–127, 2017. ISSN 1744-2710.