     Providing Better Feedback for Students Solving
        Programming Tasks Using Project Tomo
Gregor Jerše
Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Gregor.Jerse@fri.uni-lj.si

Matija Lokar
Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia
Matija.Lokar@fmf.uni-lj.si



Abstract—Systems for the automatic assessment of programming tasks have become a popular choice in programming courses, as several advantages of using automatic assessment in the teaching and learning process have been observed. One of the most important is the immediate feedback students get. However, the quality of that feedback is essential for achieving good learning results. At the University of Ljubljana we use our own system, called Project Tomo, as a teaching tool. One of the most important aspects of the system is the possibility to return detailed feedback and to analyze the students' solutions, since every submission is stored on the server. Until now we have collected more than 110,000 attempts along with their history, and we are currently analyzing them, concentrating on how to use this data to further improve the quality of the feedback given to the students. Some of the observations and preliminary results are presented in the paper.

I. INTRODUCTION

Teaching a beginner-level programming course can be quite challenging. Since programming is a skill, it is learnt best by solving as many programming tasks as possible. As Lee and Ko write in [1], for most beginners the experience of writing computer programs is characterized by a distinct sense of failure. The first code beginners write often leads to unexpected behavior, such as syntax errors, run-time errors, or unintended program output. While all of these forms of feedback are essential in helping a beginner understand what programs are and how computers interpret them, the experience can be quite discouraging [2], [3] and emotional [4]. As several researchers have pointed out, for example in [5], feedback is an important factor in the learning process.

Teachers should provide students with feedback beyond what the usual tools (compilers, interpreters and run-time environments) provide. But the feedback must be immediate, otherwise the students can get stuck, which slows down their progress considerably. Keuning, Jeuring and Heeren [6] write: "Formative feedback, aimed at helping students to improve their work, is an important factor in learning." Also, Campos et al. [7], following [8], conclude that good feedback is essential for improving the students' progress: they can learn more effectively if they receive quick and appropriate feedback about their actions in a short amount of time. In addition to its influence on the students' achievement, feedback is also a significant factor in providing motivation for learning. As Nicol states (on the webpage https://www.reap.ac.uk/): "Assessment and feedback practices should be designed to enable students to become self-regulated learners, able to monitor and evaluate the quality and impact of their own work and that of others."

But providing timely and immediate feedback in overcrowded classrooms is a tough task. This calls for a tool that can provide such feedback automatically. In their paper, Keuning et al. [6] studied what kind of feedback is provided by systems for the automated assessment of programming tasks (SAAP), which techniques are used to generate the feedback, how adaptable the feedback is, and how these tools are evaluated.

The paper is organized as follows. In Section II we describe our SAAP solution, called Project Tomo, and its properties. In Section III we present the results of the analysis of students' submissions, and the conclusions follow in Section IV.

II. PROJECT TOMO

After evaluating several SAAPs (a systematic review can be found in [9]–[11]), we came to a similar conclusion as Keuning et al. in [6], namely that "Most SAAP tools only grade student solutions" and that "tools do not often give feedback on fixing problems and taking the next step, and that teachers cannot easily adapt tools to their own needs." Also, Rubio-Sánchez et al. mention in [13] that "despite acknowledging that using Mooshak (a SAAP tool) was a good idea, students did not appreciate the experience as a whole, where the main reported drawback was related to its feedback." Most of the disappointment with the feedback is connected to the fact that the majority of SAAP tools work as explained in [13]: given a set of predefined instances of some computational problem, consisting of input-output pairs, the tool compiles and runs the source code in order to verify whether the program generates the desired outputs for the given inputs.
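To make this conventional approach concrete, the following minimal sketch (not taken from any of the cited tools; the test data and the command line are made up) grades a submission purely by running it on predefined input-output pairs and comparing the printed output character by character:

import subprocess

# Hypothetical input-output pairs for some task; a real tool would load
# these from the task definition.
TEST_CASES = [
    ("3 4\n", "7\n"),
    ("10 -2\n", "8\n"),
]

def grade(path_to_submission):
    """Return the fraction of test cases whose printed output matches exactly."""
    passed = 0
    for given_input, expected_output in TEST_CASES:
        result = subprocess.run(
            ["python", path_to_submission],
            input=given_input, capture_output=True, text=True, timeout=5)
        if result.stdout == expected_output:
            passed += 1
    return passed / len(TEST_CASES)

Such a grader can only report how many cases passed; it cannot tell the student why a case failed, which is precisely the limitation discussed above.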
So we developed a new web service for automatic assessment, called Project Tomo (https://tomo.fmf.uni-lj.si) [14]. One of the main design goals was the flexibility a tool should provide in giving feedback to the students. Contrary to many SAAPs, which are intended mostly to support assessment, our goal was to develop a small, flexible solution aimed at providing assistance for lab exercises where students are required to solve numerous programming tasks.




We aimed for methods and tools that help provide the necessary feedback and improve the support for students while they are learning to program. So our target was formative feedback [5].

A. Basic Features

The main design objectives in developing our SAAP service were:
• local execution,
• the possibility to use any of the existing programming environments,
• being flexible enough to work with any programming language (we currently support Python, R and Octave), and
• providing as much flexibility as possible in administering tests, so that the appropriate feedback can be given.
The details behind these decisions are explained in [14] and [15]. The service is designed to require little or no additional work from students and teachers, enabling them to focus on the content.

The service works as follows: the students first download the files containing the problem descriptions to their own computers. The files are opened in their preferred programming environment for the chosen programming language and the students start coding the solutions. Executing a file checks the solutions locally. If the server is available, the solutions are automatically stored on the server. The server also optionally checks the validity of the solutions by comparing the hashed output of the student's solution to the hashed output of the official solution.

This approach has several benefits: the service provides instant insight into the obtained knowledge to both student and teacher, all without disturbing the teaching process. There is also no need for powerful servers, since all executions are done on the students' computers.
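The validity check based on hashed outputs can be illustrated with a short sketch. This is not the actual Tomo implementation (the function names and the sample output are made up); it only shows the idea that a submission can be confirmed by comparing digests, without the expected output itself having to be revealed:

import hashlib

def output_hash(output_text):
    # Hash the textual output produced by running one part of a solution.
    return hashlib.sha256(output_text.encode("utf-8")).hexdigest()

# Hypothetical digest of the official solution's output for one part.
OFFICIAL_HASH = output_hash("2.82842712475\n")

def is_valid(student_output):
    # The submission is marked valid when the digests agree.
    return output_hash(student_output) == OFFICIAL_HASH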
B. Testing Possibilities

As much flexibility as possible in administering tests was another vital feature. For instance, one of the requirements was that it should be possible to administer tests that check whether a specific method was (or was not) used in a student's submission. For example, if the students' ability to write recursive programs is to be tested, non-recursive solutions should not be accepted, even if they produce the expected results.

After providing the instructions for the task, the teachers enter the expected solution, followed by the tests that it has to pass (see Fig. 1). The solution is separated from the tests with the Check.part() command. It should be noted that the officially provided solution needs to pass the same tests as the students' solutions. At first this approach seems slightly more demanding for the teachers compared to the traditional approach, where the teachers only provide instructions in text form. However, it forces the teacher to test the quality of the instructions: it often happens that poorly formulated problems prove to be such only during an attempt to solve them. This approach also ensures that the official solutions exist and work properly.

# =======================================================
# Computing distances
# =========================================@000003=======
#
# Write a function dist(x1, y1, x2, y2) that returns the
# distance between points (x1, y1) and (x2, y2).
#
# >>> dist(1, 2, 3, 4)
# 2.82842712475
# =======================================================

def dist(x1, y1, x2, y2):
    '''Returns distance between two points'''
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

Check.part()

Check.equal('dist(1, 2, 3, 4)', 2 ** 1.5)
Check.equal('dist(0, 0, 3, 4)', 5)

Fig. 1. Instructions, solution, and validation from the teacher's file.

There are two commands that are used most often in testing. The simplest one tests the equality of the expected result with the result acquired by evaluating the given expression (see Fig. 1).

However, Tomo's main strength is the possibility for the teacher to compose a test that goes beyond a direct comparison of the outputs. The tests have access to the source code of the submitted solution under Check.current_part['solution']. It is therefore simple to write tests that ensure that a solution did not use for or while loops (e.g. if the students are to write solutions in a recursive style); see Fig. 2 (the figure is not reproduced here; a sketch of such a test follows).

Fig. 2. While is required, but for is forbidden
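Since Fig. 2 itself is not reproduced in this version, the following sketch only illustrates what such a test in the teacher's file could look like, using the access to the submitted source described above; it is a guess, not the exact content of the figure, and the way a failed check is actually reported in Tomo may differ:

import re

source = Check.current_part['solution']

# Crude keyword checks on the submitted source; a real test might tokenize
# the code instead of using regular expressions.
if re.search(r'\bfor\b', source):
    Check.feedback("Do not use a for loop in this task.")
if not re.search(r'\bwhile\b', source):
    Check.feedback("Your solution should use a while loop.")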
The commands Check.out_file, Check.equal, and Check.run return True or False. Therefore, they can be used to determine whether additional tests should be run. For instance, if the first test fails, the submission is clearly not valid and additional tests are not necessary. However, if the goal is to provide the students with detailed information on which test data their programs fail, as many tests as desired can be run.
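A sketch of such gating in a teacher's file could look as follows; the tested function total() and its cases are hypothetical:

# Run the detailed cases only when a cheap sanity check already passes.
if Check.equal('total([])', 0):
    Check.equal('total([1, 2, 3])', 6)
    Check.equal('total([-1, 1])', 0)
else:
    Check.feedback("The sum of an empty list should be 0; fix that first.")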
There are other commands available for testing programs, for example commands that work with files. Since the validation is essentially a program written in the chosen programming language (Python, for example) using the capabilities of the Check class, it can be made more advanced using all the programming constructs that the language offers. Thus, numerous possibilities are available for preparing appropriate feedback.




C. Feedback

The basic feedback is a report on the success of the student's attempt. This feedback is issued as soon as the student runs their solution. Figure 3 shows that the solution of the first part of the task is accepted, the second one failed at (at least) one test, and there has been no attempt to solve the third part of the task yet.

1. part has a valid solution.
2. part does not have a valid solution.
  - Expression vsote([3,5,1], 4) returns [1] instead of [].
3. part has no solution.

Fig. 3. Basic feedback

We paid special attention to the wording of this basic feedback. One of the first versions of the system declared "solution is correct". But this is not in accordance with the premise that passing all the tests is not yet a proof of correctness. Therefore, we changed the wording to "solution is accepted". Several teachers reported that this change had a positive influence on the students' awareness of what the 'right solution' is.

As explained before, the basic test is done with the Check.equal method, which directly compares the output of the official solution with the one produced by the student's solution (see [11], [12] for a discussion of the drawbacks when only this kind of test is provided). Tomo offers much more flexibility. For example, especially for tasks that require text output, students often complain: "but just one space is missing. Why is Tomo so picky!" Here all of the teacher's pedagogical knowledge is to be exploited (as discussed in [5], [8]). Project Tomo is just a tool in the hands of the teacher, who should decide on the purpose of a certain task. In this example the teacher has (at least) four different possibilities:
1) Leave the task as it is. The purpose of such a task is to get students to read the instructions, claims, and requirements carefully, and to keep to them consistently. The SAAP here helps the teacher, because it is not necessary to explain to each student that their solution is not accepted because of a single capital letter.
2) Change the test so that if the student writes "enter" instead of "Enter", they receive feedback instructing them to check the capitalization of their output.
3) Change the test so that it does not matter which case is used. This makes sense when the teacher's focus lies elsewhere and the output is of secondary significance (a sketch of such a test follows the list).
4) Change the test so that it does not matter what wording the students use in their solutions.
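As an illustration of possibility 3, the strict and the relaxed variant of a test could look roughly as follows; greeting() and the expected string are hypothetical, and the relaxed version simply normalises the student's result inside the tested expression:

# Strict test (possibility 1): the output must match exactly.
Check.equal("greeting('Ana')", "Hello, Ana!")

# Relaxed test (possibility 3): case differences are ignored.
Check.equal("greeting('Ana').lower()", "hello, ana!")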
And, most importantly, the teacher's task is to react to events that occur during exercises. That is precisely the basic reason why we are developing Project Tomo: to relieve the teachers of simple tasks and give them additional time to interact with the students during lab exercises.

In Project Tomo it is possible to give students feedback not only when a given test fails but at any point of the test program. This can be used to notify the students that the solution has passed some test cases and that they are on the right track. This is achieved using the Check.feedback construct, which prints out the given string. See the example in Fig. 4.

if Check.equal("""start('miha', 'meta')""", 1):
    Check.feedback("Bravo! Strings 'miha' and" +
                   " 'meta' match in the first character")

Fig. 4. Positive feedback

A good way to learn is also to observe the official solution (see Fig. 5). In Project Tomo the teacher can decide, for every task, when the students get to see it. Currently the possibilities for the visibility of the official solution are "always", "never" and "after they have submitted a valid submission".

The first option is rarely used, since it gives the students an easy way to 'cheat': if their solution is not accepted, they look at the official one and use the obtained information to solve the task. The second one is used during exams, where the official solutions are made visible after the end of the exam, and the third one is the default setting.

Fig. 5. Comparing with the official solution

III. ANALYZING THE DATA

Until now we have developed quite an extensive library of programming tasks with high-quality feedback. Nevertheless, we are constantly adding new tests and feedback to the tasks. Ideas for additional tests arise from observing the mistakes that students make while programming. Many of these mistakes are missed by the teacher, since it is impossible to observe each and every student all of the time. Since Project Tomo stores the history of every submission, we can do that retroactively: by checking the history of the students' submissions we can analyze them, extract typical mistakes and use that knowledge to improve the quality of the feedback even further. Our workflow is as follows.

First, the task is created by the teacher. The teacher tries to predict the typical mistakes the students will make and includes them in the test cases. The task is used in a teaching course and a large number of submissions is acquired from the students. These submissions are then analyzed and, if a need for additional test cases is seen, they are added. The updated task is then used in a teaching course and the entire process is repeated. In this way the quality of the feedback (and of the test cases) is checked and improved continuously.

Currently we are just starting a thorough analysis of the submissions. The goal of the first step of the analysis is to detect problematic tasks.




Our assumption is that the exercises where the average number of unsuccessful attempts students made before the accepted one is high are prime candidates for being labeled as problematic. If we manage to improve the quality of the feedback for these tasks, the students will benefit the most.

We concentrated on last year's programming course, where all the data is available. For each successful attempt we checked how many unsuccessful attempts had been made before the valid one. Using the above-mentioned criterion, we detected several tasks where the average number of unsuccessful submissions was higher than 10. We then analyzed the source code of those unsuccessful attempts.
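The criterion can be computed with a few lines of code. The sketch below assumes a hypothetical flat list of submission records in chronological order; the actual data model in Project Tomo and our analysis scripts differ:

from collections import defaultdict

def average_failures_before_success(submissions):
    """submissions: iterable of (task_id, user_id, accepted) tuples in chronological order."""
    failures = defaultdict(int)    # (task, user) -> failed attempts so far
    collected = defaultdict(list)  # task -> failures before the first accepted attempt
    solved = set()
    for task, user, accepted in submissions:
        if (task, user) in solved:
            continue
        if accepted:
            collected[task].append(failures[(task, user)])
            solved.add((task, user))
        else:
            failures[(task, user)] += 1
    return {task: sum(vals) / len(vals) for task, vals in collected.items()}

Tasks whose average exceeds a chosen threshold (we used 10) are flagged as problematic.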
Since the number of attempts is quite large, we have only managed to analyze some of the tasks so far. One task that was particularly interesting was a simple one, where students had to print the amount of money in a bank account in a grammatically correct way. The average number of unsuccessful submissions for this task was higher than 15. When we analyzed the history of the 495 attempts made, we found that the students had made two typical mistakes. The first was that they were unaware of the grammar rules of their own mother tongue, and the second was that they were very careless with their output, which usually deviated only slightly from the official one. Combined, these caused the students to submit many attempts whose output deviated only slightly from that of the official solution but which were rejected by Project Tomo. It appears the students did not manage to see the difference in the outputs, since they submitted many solutions with seemingly random changes to the source code that did not really fix the problem.

Using this data we added two additional feedback messages to the task. The first informs the student of the necessary grammar rules in detail if the test detects that they are not respected. This will hopefully reduce the number of incorrect attempts, since a student can fix all grammatical mistakes in one step. The second deals with the sloppy outputs. On the one hand, it is good that students learn how to be accurate. On the other hand, it can be very frustrating to work on a task for hours without visible progress, even more so for beginners. So we decided to modify the comparison function between the expected and the given output so that it shows more clearly where the outputs differ. We hope this will reduce the number of unsuccessful attempts even further and teach the students how to be precise at the same time.
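A minimal sketch of such a comparison, based on the standard difflib module, is shown below; the actual function used in Tomo may be implemented differently, and the sample strings are made up:

import difflib

def explain_difference(expected, actual):
    # Return None when the outputs match, otherwise a line-by-line diff in
    # which '-', '+' and '?' lines point at the exact differences.
    if expected == actual:
        return None
    diff = difflib.ndiff(expected.splitlines(), actual.splitlines())
    return "\n".join(diff)

print(explain_difference("The balance is 12 euros.", "the balance is 12 euro."))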
We plan to analyze more tasks in a similar manner and to use the updated tasks during future programming courses. We hope to observe a reduction in the number of unsuccessful attempts.

IV. CONCLUSIONS AND FUTURE WORK

Our goal is to use the results of our analysis to improve the feedback for the most problematic tasks and to use the improved exercises in class next year.

Currently most of the analysis is done manually, which is a very slow process. In the future we plan to use machine learning algorithms to extract common patterns from the unsuccessful submissions for a given task. This would save the time spent going through the history of all attempts and allow us to focus on the most common mistake patterns.
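One possible direction, purely as an illustration and not a decision we have already made, would be to cluster the unsuccessful submissions of a task by the similarity of their source code, for example using character n-grams and k-means from scikit-learn:

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_submissions(sources, n_clusters=5):
    # sources: list of source-code strings of unsuccessful submissions.
    vectors = TfidfVectorizer(analyzer="char_wb",
                              ngram_range=(3, 5)).fit_transform(sources)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    # Submissions sharing a label are candidates for a common mistake pattern.
    return labels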
Some additional features for providing feedback are also planned. We are currently looking into the possibility of adding a further option for when the official solution can be seen: making it visible after a specified number of unsuccessful attempts. This would allow the students to see the official solution after they have made some real effort towards solving the task but failed to provide a valid solution. However, we have to find a way of verifying that those attempts are "real", and not merely faking some output in order to reach the required number of submissions. Here we will probably use some of the approaches suggested in the literature, for example in [12].

REFERENCES

[1] M. J. Lee and A. J. Ko, "Personifying programming tool feedback improves novice programmers' learning," in Proceedings of the Seventh International Workshop on Computing Education Research (ICER '11), ACM, New York, NY, USA, 2011, pp. 109-116. DOI: http://dx.doi.org/10.1145/2016911.2016934
[2] A. J. Ko, B. A. Myers, and H. Aung, "Six learning barriers in end-user programming systems," IEEE VL/HCC, 2004, pp. 199-206.
[3] A. J. Ko and B. A. Myers, "Attitudes and self-efficacy in young adults' computing autobiographies," IEEE VL/HCC, 2009, pp. 67-74.
[4] P. Kinnunen and B. Simon, "Experiencing programming assignments in CS1: the emotional toll," ICER, 2010, pp. 77-86.
[5] V. J. Shute, "Focus on formative feedback," Review of Educational Research, 78(1):153-189, 2008.
[6] H. Keuning, J. Jeuring, and B. Heeren, "Towards a systematic review of automated feedback generation for programming exercises," in Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '16), ACM, New York, NY, USA, 2016, pp. 41-46. DOI: https://doi.org/10.1145/2899415.2899422
[7] D. S. Campos, A. J. Mendes, M. J. Marcelino, D. J. Ferreira and L. M. Alves, "A multinational case study on using diverse feedback types applied to introductory programming learning," in 2012 Frontiers in Education Conference Proceedings, Seattle, WA, 2012, pp. 1-6. DOI: 10.1109/FIE.2012.6462412
[8] J. Hattie and H. Timperley, "The power of feedback," Review of Educational Research, 77(1):81-112, 2007.
[9] K. M. Ala-Mutka, "A survey of automated assessment approaches for programming assignments," Computer Science Education, 15(2):83-102, 2005.
[10] P. Ihantola, T. Ahoniemi, V. Karavirta, and O. Seppälä, "Review of recent systems for automatic assessment of programming assignments," in Koli Calling, 2010, pp. 86-93.
[11] D. M. Souza, K. R. Felizardo and E. F. Barbosa, "A systematic literature review of assessment tools for programming assignments," in 2016 IEEE 29th International Conference on Software Engineering Education and Training (CSEET), Dallas, TX, 2016, pp. 147-156.
[12] B. Cheang, A. Kurnia, A. Lim, and W. Oon, "On automated grading of programming assignments in an academic institution," Computers & Education, 41(2):121-131, 2003. DOI: https://doi.org/10.1016/S0360-1315(03)00030-7
[13] M. Rubio-Sánchez, P. Kinnunen, C. Pareja-Flores, and Á. Velázquez-Iturbide, "Student perception and usage of an automated programming assessment tool," Computers in Human Behavior, 31:453-460, 2014. DOI: http://dx.doi.org/10.1016/j.chb.2013.04.001
[14] M. Lokar and M. Pretnar, "A low overhead automated service for teaching programming," in Proceedings of the 15th Koli Calling Conference on Computing Education Research, Koli, Finland, 2015. DOI: http://doi.acm.org/10.1145/2828959.2828964
[15] G. Jerše and M. Lokar, "Learning and teaching numerical methods with a system for automatic assessment," The International Journal for Technology in Mathematics Education, 24(3):121-127, 2017.



