Mining an Online Judge System to Support Introductory Computer Programming Teaching

Rodrigo Elias Francisco
Instituto Federal Goiano – Campus Morrinhos
Morrinhos – GO – Brazil
+55 (64) 34137900
rodrigo.francisco@ifgoiano.edu.br

Ana Paula Ambrosio
Instituto de Informática
Universidade Federal de Goiás
Goiânia – GO – Brazil
+55 (62) 35211181
apaula@inf.ufg.br
ABSTRACT
Computer programming is an activity that requires a set of cognitive processes which develop naturally through practice, by writing algorithmic solutions. Students learn a great deal from their mistakes, but for that they need feedback on their exercises. Marking students' work is very time consuming, which often limits a teacher's capacity to offer close individual guidance. The PROBOCA project aims to build a tool, based on the BOCA online judge, suited to learning computer programming through practice. In addition to a problem database organized by theme and difficulty, the system provides functionalities to support the teacher in the classroom. One of the main endeavors is to develop a procedure for estimating the degree of difficulty of a given problem. This "nominal" parameter may then be compared to the difficulty level perceived by each student. The result is a valuable indicator of those students who are experiencing difficulties. This paper presents the preliminary specification of PROBOCA's architecture and functional requirements along with its current state of development.

Keywords
Online Judge; Computer Programming Education.

1. INTRODUCTION
The learning process of Computer Programming (CP) usually involves practicing the solution of many problems. Students write code to implement algorithms that meet exercise requirements, and teachers should review this code and give feedback to the students. As this is a time consuming task, some teachers are installing and using online judge systems as automatic reviewing and scoring tools.
Online judge refers to software tools originally designed for programming competitions. They usually run on a server that contestants access online. Basically, their role is to present the contestant teams with a list of problems, to which they respond by uploading program code that satisfies the criteria of each problem. The tool then evaluates the submitted code by running it on a set of predefined inputs and comparing the program's results to predefined outputs. If the output of the submitted code exactly matches the expected output for the corresponding input, the code is considered correct; otherwise it is considered incorrect. No indication of where the program went wrong is given. Although helpful as teaching assistants, these tools were not designed for use in a classroom and therefore lack some features that are important for academic management.
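The verdict mechanism just described reduces to an exact comparison between the produced and the expected output. The sketch below illustrates the idea in PHP, the language PROBOCA reuses from BOCA; it is a minimal illustration under assumed file names, not BOCA's actual judging code.

```php
<?php
// Minimal sketch of exact-match judging (illustrative only, not BOCA's real code).
// $producedFile: output generated by running the student's program on a test input.
// $expectedFile: expected output registered by the teacher for that same input.
function judgeTestCase(string $producedFile, string $expectedFile): string
{
    $produced = file_get_contents($producedFile);
    $expected = file_get_contents($expectedFile);
    if ($produced === false || $expected === false) {
        return 'ERROR (missing file)';
    }

    // Exact character-by-character comparison: any difference in spacing,
    // line breaks or formatting yields a "wrong answer" verdict.
    return ($produced === $expected) ? 'YES (correct)' : 'NO (wrong answer)';
}

// Example: "Result: 10\n" versus "Result:10\n" differ by a single space,
// so the submission is rejected even though the program logic may be right.
echo judgeTestCase('student_output.txt', 'expected_output.txt') . PHP_EOL;
```

This exact-match behavior is the source of the formatting-related rejections discussed in Section 3.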
The BOCA software [1] is an online judge system used in programming marathons in Brazil. It is freely available for institutions to download and install. This particular system allows teachers to register problems and to track their students' work. However, the system is neither easy to handle nor does it have an exercise database (DB), which is needed to facilitate the generation of problem lists. This project proposes to extend the BOCA online judge to make it more suitable for use in introductory programming teaching. The resulting system, called PROBOCA, complements the online judge functionality with features that improve its ease of use, enhancing teacher productivity and course effectiveness.

One of the introduced features provides automatic classification of problem difficulty, based on the submitted solutions and on the students' evaluation of the degree of difficulty encountered in solving a given problem. This will aid teachers in composing exercise lists, allowing them to better gauge list complexity. It also lets students organize the order in which they solve problems, tackling the easier ones before turning to more complex ones.

Another additional feature is the production of student reports based on the submitted solutions and the students' behavior while accessing the tool. The student evaluation of problem difficulty, the number of submissions for the same exercise, the time between submissions and the order of submissions, among other information gathered by the tool, reveal the student's behavior and his ease in solving the proposed problems.

2. EXERCISES ON PROGRAMMING EDUCATION
Aiming at automatic correction of program code and at monitoring and evaluating students during the programming process, several initiatives have been developed.

Within this line of research, the study of Chaves et al. [2] aims to explore the resources of online judges in order to integrate them into the Moodle system. The goal is to provide a Virtual Learning Environment (VLE) with an automatic evaluation feature, allowing the teacher to monitor the students' problem solving. The authors defined an architecture containing a module for integration with online judges (MOJO). The integration module connected the VLE structure to the URI Online Judge [3] and SPOJ Brazil [4].

In [5], the authors developed a prototype intended to be an educational online judge. They argue that online judge tools are suited to competition and have few educational features. They also criticize the way online judges provide feedback to students, only indicating whether the answer is right or wrong or whether there were errors at compilation or runtime. Their project, called JOnline, aims to add the following didactic features: presenting tips in Portuguese to fix compilation errors found in the submitted source code; presenting the test cases that generate erroneous results; organizing the problems by topic; deducing a difficulty level from a poll conducted by the system; and providing a resource for collaborative programming that allows two students to co-write one common code.

Automatic generation of exercise lists based on user-defined criteria requires the classification of problems according to these criteria. In [6], the authors conclude that there is a strong relationship between the difficulty of a problem and the total number of lines of code and the amount of flow control in the program (IF, WHILE, FOR, ...). New students have difficulty in reading and interpreting problem statements. This problem has been related to the difficulty in dealing with abstraction [9].
3. BOCA
Designed for use in programming marathons, the BOCA online judge system has a vocabulary and requirements contextualized by competitions and teams. The main interest in this system is that it was designed to enable its users, the competitors, to interact with a set of problems. Figure 1 shows a use case diagram with the different functions provided by BOCA. The software allows the registration of problems for a given competition. This is done by submitting a PDF file stating the problem to be tackled and the files containing the input and output data sets for the corresponding test cases. The registered problems are associated with a single competition; if the problems are to be reused in another competition, the files must be inserted again. BOCA does not include a database to store the problems.

Figure 1. BOCA's use case diagram

In the system, colored balloons represent the points gained by the competitors. This feature is part of the programming marathon context, where correctly answering a problem yields a balloon for the team. Besides allowing teams to monitor their submissions and the related results, BOCA also provides a screen displaying the team ranking, including each team's overall score and a list of the solved exercises, using balloons to identify those successfully done. It also produces information on the students' interaction with the problems, containing the time spent and the number of resolution attempts.

BOCA has been used in introductory CP classes at our institute for some years, with good results. Using this system enhances the practice of problem solving by providing automatic feedback to students. We have observed that the number of problems solved by students using BOCA is significantly higher when compared to traditional exercise lists. The burden on the teacher is significantly reduced and the students get feedback. Teachers are then able to focus on helping the students who fail to solve the problems even after several attempts. This strategy allows students to tackle a larger number of exercises, thus increasing their potential to master the cognitive skills required for programming. However, in spite of these advantages, some drawbacks were revealed.

Looking at the process of reviewing students' answers, two issues were identified. First, the system assesses a submitted program by comparing its output, as generated by the student's code in response to a given input data set, to the registered expected output. For the two outcomes to be identical, exact formatting of the program's output is required. At the beginning, students incurred many errors just for using a slightly different format. This situation marks the problem's result as wrong even though the program's logic may be right, causing frustration among students, as the system does not point out where the error is. Students, however, managed to adapt and began paying more attention to output formatting. Second, there is no cross-answer plagiarism identification, in other words, no control over cheating. It was observed that some students simply copied their colleagues' programs and submitted them as their own, and were thus considered to have solved the problem without any real comprehension of the solution.

Figure 2 shows the BOCA code submission screen. It was modified to include the "level of difficulty" selection, where students evaluate how hard it was to solve the problem they are submitting.

Figure 2. BOCA's submission functionality

Regarding the class management functionality, it was noted that each installed instance of BOCA supports one active competition at a time, meaning a single class per instance. Thus, if a teacher wants to have several competitions running at the same time, i.e. different groups doing the same or different sets of exercises, he must install multiple instances of BOCA. Each instance is independent, so even if two competitions are composed of the same problems (e.g. for different classes), these problems must be separately registered in each instance. As each competition is independent, the system does not group the results of the different competitions; this feature would be interesting in the case of multiple exercise lists delivered to the same class. It should also be noted that the system is not trivial to install and manage, which ends up discouraging teachers from adopting it. On top of that, the teacher needs to add each student to the system, which proves to be quite tedious, especially for large classes. To overcome this drawback, teachers have implemented programs that automatically generate registration ids and corresponding passwords based on the students' university registration numbers. As BOCA does not allow password modification, students often know other students' passwords, since they are generated following certain patterns. This facilitates copying other students' solutions.
This shows that, although BOCA has several advantages, there are problems that hinder its use in the classroom. To solve this, beyond what has already been presented, we aim to tackle the following requirements: automatic generation of problem lists based on subject and level of difficulty, measurement of the student's problem-solving experience by subject, and ranking of problems by level of difficulty. We consider the last item particularly important for didactic purposes and, having found no existing technique or algorithm to use for it, it is the focus of this work.
4. PROBOCA
From an architectural viewpoint, the proposed system builds upon BOCA, which is treated as an internal component and offers its structure, PHP source code and PostgreSQL DB for reuse.

To avoid further complicating the development process, the "competitions and teams" context is kept, with some adaptations. The terms "competition" and "team" are used loosely to indicate an exercise list and a student, respectively. BOCA's user interface allows for this extrapolation, which is already common practice among the teachers who use the system in their classrooms.

Adapting BOCA to support CP teaching brought the need to introduce new functional requirements, as presented in the use case diagram in Figure 3.

Figure 3. Use Case Diagram with New Requirements

PROBOCA required some changes to the BOCA DB structure. It was necessary to modify the internal tables in order to link the stored problems to a given course syllabus and to certain difficulty levels, as well as to include more detailed data about the students. Furthermore, the original competition-based structure will be altered to adapt it to the concept of multiple exercise lists.

Following requirement R1, student registration is now done by the student himself, using a token handed out by the teacher. Besides saving a password, other information is collected during registration that permits class-wide statistical analysis.

R2 was intended to simplify the task of competition generation, which is performed by the teacher. To achieve this goal, a database of problems was implemented. Currently, to generate a competition, as shown in Figure 4, the user indicates three parameters: a difficulty level (1-easy; 2-medium; 3-difficult); the desired syllabus elements for the given problem set; and, finally, the quantity of problems to compose the list. The system then analyzes which problems best fit the user-defined parameters and generates a competition list from among the problems available in its DB. The difficulty level of each problem is estimated automatically by the system based on data obtained from the solutions submitted by students and other parameters, as described in Section 6.

Figure 4. Generate problem lists/competitions
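The list-generation step of R2 can be pictured as a query over the problem database constrained by the three parameters above. The sketch below is a hedged illustration; the table and column names (problem, difficulty, topic) are hypothetical, and PROBOCA's actual PostgreSQL schema, inherited from BOCA, differs.

```php
<?php
// Illustrative sketch of R2 (automatic list generation); not PROBOCA's actual code.
// Assumes a hypothetical table "problem(id, title, topic, difficulty)".
function generateList(PDO $db, int $difficulty, array $topics, int $quantity): array
{
    $placeholders = implode(',', array_fill(0, count($topics), '?'));
    $sql = "SELECT id, title
              FROM problem
             WHERE difficulty = ?
               AND topic IN ($placeholders)
             ORDER BY random()      -- PostgreSQL: pick a varied subset
             LIMIT ?";
    $stmt = $db->prepare($sql);
    $stmt->execute(array_merge([$difficulty], $topics, [$quantity]));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Example: ten medium problems (level 2) on selection or repetition.
// $pdo  = new PDO('pgsql:host=localhost;dbname=proboca', 'user', 'pass');
// $list = generateList($pdo, 2, ['selection', 'repetition'], 10);
```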
R3's purpose is to provide teachers with a report of every student's experience level based on his interaction with the system. Using the collected data, it should be possible to present information about a student's experience regarding the syllabus elements addressed in each problem, thus allowing the teacher to detect where students show more difficulty. This requirement depends on R4, which measures the difficulty level of a given problem. A simple mechanism to measure the degree of difficulty of a given problem was developed and tested.

For R3 to be reliable, it is necessary that a plagiarism identification strategy (R9) also be implemented in the system.

R5 is aimed at associating a student with multiple competitions without the need to register each time. Thus, the data from different competitions can be associated with the student, allowing the integration of the results obtained for the different lists presented to a given class. This requirement depends on R6, which aims to adapt BOCA to judge problems from several competitions. R7 is also related to R5, with the goal of allowing a student to participate in several competitions, which is analogous to answering several problem lists.

R8 aims to display the problems' balloons color coded to indicate the problem's difficulty level. Currently, balloon colors are registered by the system administrator along with the problems and have no relation to difficulty or content, unless the classification and corresponding color attribution is done by the teacher when uploading the problems.
5. IMPLEMENTATION
The functionalities of detailing the student information and of automatically generating exercise lists are already implemented and tested. Requirement R4, estimating the difficulty level of a given problem, is currently in progress.

Within this requirement, the first task was to create 74 exercises using BOCA's problem-registration functionality. These exercises cover the syllabus of the "Introduction to Programming" course (CS1) and range from beginner-level programs (Hello World style) up to code involving matrices. The inserted problems populated an initial database to be used for testing, but additional problems are expected to be easily included. An estimated level of difficulty for each problem was supplied by the teacher who inserted the problem into the database.

To submit an answer, the student uploads one code file per problem solved. Along with the file upload, the student is asked to evaluate the problem's difficulty by indicating one of three choices: Easy, Medium or Difficult, based on his experience when solving the problem (Figure 2).
Success was obtained in computing several parameters needed to calculate problem difficulty. They include different measures, such as counting the number of repetition and selection structures used in the solution and the number of topics involved in the problem. Problem topics were defined using the introductory programming course syllabus. They include input/output, assignment, selection, repetition, vectors, strings, matrices and functions. In this list, topics appear in the order in which they are taught and therefore in increasing level of complexity for novice students. Since programming is incremental, more complex topics usually build upon less complex ones. Several tests have been conducted to define and validate the mathematical function that will calculate the difficulty of a problem, but this is still an open issue.

Based on the calculated difficulty and the topics involved in the problem, colored balloons are associated with the problem. Different colors represent different topics and, within each topic, the level of difficulty is represented by the intensity of the color. For example, blue represents problems whose most complex topic is selection: light blue balloons are associated with easy problems using selection, medium blue balloons with medium difficulty problems, and dark blue balloons with problems that demand more from students.
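As a hedged illustration of the color scheme just described, the sketch below maps a problem's most complex topic to a hue and its difficulty to a shade. The topic names and hexadecimal values are hypothetical choices, not PROBOCA's actual palette.

```php
<?php
// Illustrative balloon color mapping (hypothetical palette, not PROBOCA's actual code).
function balloonColor(string $mostComplexTopic, string $difficulty): string
{
    // Hue encodes the topic; intensity (light/medium/dark) encodes difficulty.
    $palette = [
        'selection'  => ['easy' => '#ADD8E6', 'medium' => '#4682B4', 'difficult' => '#00008B'], // blues
        'repetition' => ['easy' => '#90EE90', 'medium' => '#3CB371', 'difficult' => '#006400'], // greens
    ];
    // Fall back to a neutral gray when the topic or level is not in the palette.
    return $palette[$mostComplexTopic][$difficulty] ?? '#808080';
}

// Example: an easy problem whose most complex topic is selection gets a light blue balloon.
echo balloonColor('selection', 'easy'); // #ADD8E6
```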
The task of providing teachers with a report on student achievements (R3) has not been tackled yet. Ideas in this direction include: (1) comparing the student's perceived problem difficulty with the mean problem difficulty; if students perceive problems as harder or easier than the mean, this could indicate that they are at a lower or higher level of programming competency; (2) comparing successive submissions of the same problem, which may show whether students adopt a trial-and-error approach; (3) mean time taken to solve problems; (4) mean number of submissions per problem; (5) score compared to the other students; among others.

6. ESTIMATING PROBLEM DIFFICULTY
In a sense, "difficulty" expresses the lack of ability, or the amount of skill and effort needed to accomplish something. Obviously, the perception of difficulty is an individual matter that is related to many aspects, including the moment at which the question is posed. Nonetheless, this work investigates the possibility of characterizing a programming problem in such a way that calculated parameters correlate with the "difficulty level" expressed by students.

In their work [6], Alvarez and Scott present program control flow and the number of lines of code as variables that correlate with the difficulty of a problem. The experiments undertaken in the current study corroborate this finding.

Other works [7, 8] show the need to deal with abstraction in order to solve problems. A highly abstract problem is one that implies a greater level of generalization. Thus, a problem that involves many different topics can become more abstract and hence more difficult. It is fairly safe to assume that, in order to measure the difficulty of a problem, it is necessary to make a detailed analysis of the topics addressed within the problem. This proposition helps in formulating the hypothesis below.

6.1 Problem Difficulty Mechanism
On one hand, the survey of the students' opinion of the submitted problems' difficulty provided a first estimate. For each of the 74 registered exercises, this information was stored in the DB along with the respective answer. Although information was collected for all students, statistics were calculated considering only those who completed at least 95% of the exercises (70 out of 74). This excluded students who dropped out of the course, in order to prevent their partial answers from skewing the results. After filtering, the mean difficulty "Mean_Dif" was calculated for each exercise based on the answers of the resulting 62 students. "Median_Dif" and "Mode_Dif" were also calculated.

In addition, C source code was written to correctly solve each of the registered problems. A PHP program was also developed in order to analyze the programs' internal structures and to extract some measures from each program. The measures include counting:

• Lines of code, represented by the variable N_LC;
• Repetition structures used in the solution, N_RP;
• Selection structures, N_SL;
• Edges in a graph that represents the algorithm, N_EDG;
• Edges in the graph that represent repetition structures in the algorithm, N_EDG_RP;
• Height of a tree in which each sub-block of internally nested code is a node, MX_HT; and
• Number of topics involved in the problem, N_TPC.

This last count was obtained manually.
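As a rough illustration of how such measures might be extracted from a C solution, the sketch below approximates N_LC, N_RP, N_SL and MX_HT using keyword and brace counting. It assumes reasonably formatted code and is deliberately simplified; the actual PHP analyzer builds graph and tree representations of the algorithm, which this sketch does not attempt.

```php
<?php
// Simplified sketch of measure extraction from a C source file (illustrative only).
// Approximates N_LC, N_RP, N_SL and MX_HT with keyword and brace counting;
// the real analyzer works on graph/tree representations of the algorithm.
function extractMeasures(string $cSourcePath): array
{
    $code = file_get_contents($cSourcePath);

    // N_LC: non-empty lines of code.
    $lines = array_filter(array_map('trim', explode("\n", $code)));

    // N_RP: repetition structures; N_SL: selection structures.
    $nRp = preg_match_all('/\b(for|while|do)\b/', $code);
    $nSl = preg_match_all('/\b(if|switch)\b/', $code);

    // MX_HT: maximum nesting height, approximated by brace depth.
    $depth = 0;
    $maxDepth = 0;
    foreach (str_split($code) as $ch) {
        if ($ch === '{') {
            $depth++;
            $maxDepth = max($maxDepth, $depth);
        } elseif ($ch === '}') {
            $depth--;
        }
    }

    return ['N_LC' => count($lines), 'N_RP' => $nRp, 'N_SL' => $nSl, 'MX_HT' => $maxDepth];
}

// Example: print_r(extractMeasures('solution.c'));
```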
Table 1. Correlation between measured variables and student evaluation (Pearson correlation; significance, 2-tailed, in parentheses; N = 74)

Variable                  Median_Dif     Mean_Dif      Mode_Dif
N_RP (Repetitions)        .49 (.000)     .52 (.000)    .44 (.000)
N_SL (Selections)         .16 (.171)     .34 (.003)    .19 (.108)
N_LC (Lines of Code)      .44 (.000)     .62 (.000)    .45 (.000)
N_TPC (Topics)            .57 (.000)     .69 (.000)    .59 (.000)
N_EDG (Edges)             .42 (.000)     .58 (.000)    .41 (.000)
MX_HT (Tree Height)       .52 (.000)     .67 (.000)    .56 (.000)
N_EDG_RP (Rep. Edges)     .53 (.000)     .60 (.000)    .51 (.000)
To verify whether a combination of variables would obtain better results, the following formulae were also tested against the Mean_Dif variable to verify their correlation:

f1 = (N_RP + N_SL + N_LC + N_TPC + N_EDG + MX_HT + N_EDG_RP) / 7
f2 = N_EDG · 0.3 + MX_HT · 1.4 + N_EDG_RP · 1.4 + N_LC · 0.2
f3 = (N_TPC + MX_HT + N_LC) / 3 + (N_EDG_RP + N_EDG + N_RP + N_SL) / 8
f4 = (N_TPC + MX_HT) / 2
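For concreteness, under the formulae above, a problem involving three topics whose solution has a maximum nesting height of two would score f4 = (3 + 2) / 2 = 2.5. A minimal code sketch of the same computation, using hypothetical counts, follows.

```php
<?php
// Sketch: computing f2 and f4 from measured counts (hypothetical example values).
$m = ['N_EDG' => 12, 'MX_HT' => 2, 'N_EDG_RP' => 3, 'N_LC' => 25, 'N_TPC' => 3];

$f2 = $m['N_EDG'] * 0.3 + $m['MX_HT'] * 1.4 + $m['N_EDG_RP'] * 1.4 + $m['N_LC'] * 0.2;
$f4 = ($m['N_TPC'] + $m['MX_HT']) / 2;

echo "f2 = $f2, f4 = $f4\n"; // f2 = 15.6, f4 = 2.5
```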
Table 2 shows the results. Mean_Dif correlated positively with all measured variables and formulae, with the best correlation associated with f4 (r = .76, p = .000), which improves on the individual correlations with N_TPC (number of topics) and MX_HT (maximum tree height).

Table 2. Correlation of student perceived difficulty and developed formulae (Pearson correlation; significance, 2-tailed, in parentheses; N = 74)

Formula    Mean_Dif
f1         .66 (.000)
f2         .68 (.000)
f3         .67 (.000)
f4         .76 (.000)

We also verified correlations with teacher evaluations of the problems' difficulty. For this, we asked five teachers to read each problem and attribute a level of difficulty in the range 1-5. Since we had only five evaluations, we chose to work with the median difficulty obtained (Median_Prof_Dif).

Correlating this variable to the measured variables, we obtained the results presented in Table 3. The best correlations were obtained with N_RP and N_EDG_RP (r = .62, p = .000), both related to the number of repetition structures found in the solution codes.

Table 3. Correlation between measured variables and teacher evaluation (Pearson correlation; significance, 2-tailed, in parentheses; N = 74)

Variable                  Median_Prof_Dif
N_RP (Repetitions)        .62 (.000)
N_SL (Selections)         .20 (.093)
N_LC (Lines of Code)      .56 (.000)
N_TPC (Topics)            .50 (.000)
N_EDG (Edges)             .53 (.000)
MX_HT (Tree Height)       .50 (.000)
N_EDG_RP (Rep. Edges)     .62 (.000)

Table 4 correlates the teacher-defined difficulty with the formulae proposed above. A positive correlation was found with all formulae, the best being associated with f2 (r = .63, p = .000). Furthermore, a positive correlation was found between the teacher-defined difficulty and the mean student perceived difficulty (r = .69, p = .000).

Table 4. Correlation of teacher defined difficulty and developed formulae (Pearson correlation; significance, 2-tailed, in parentheses; N = 74)

Formula    Median_Prof_Dif
f1         .59 (.000)
f2         .63 (.000)
f3         .59 (.000)
f4         .55 (.000)
Although student perceived difficulty and teacher defined difficulty are correlated, there are differences, which can be verified by the correlations found with the measured variables and the proposed formulae. This could be explained by the fact that students evaluate difficulty based on the knowledge they have when developing the solution. As they do not know what comes ahead, they cannot base their evaluation on an overall knowledge of programming. Teachers, on the other hand, have this overall view of the domain and evaluate difficulty accordingly. It must also be observed that the teachers who did the evaluation are new to teaching and have not yet acquired a more critical understanding of the difficulties students encounter when learning to program. After developing algorithmic reasoning, people tend to forget how they thought before, and find many concepts obvious when, for beginners, they are not.
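The correlations reported in Tables 1-4 are plain Pearson coefficients between per-problem measures and difficulty ratings. As a reference, a minimal sketch of how such a coefficient can be computed is given below; it is illustrative only and is not the statistical tooling actually used for the analysis.

```php
<?php
// Sketch: Pearson correlation between two equally sized series (illustrative only).
function pearson(array $x, array $y): float
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $cov = $varX = $varY = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $meanX;
        $dy = $y[$i] - $meanY;
        $cov  += $dx * $dy;
        $varX += $dx * $dx;
        $varY += $dy * $dy;
    }
    return $cov / sqrt($varX * $varY);
}

// Example: correlate a measured variable (e.g. N_TPC per problem) with Mean_Dif.
// echo pearson([1, 2, 2, 3, 4], [1.0, 1.4, 1.8, 2.1, 2.9]);
```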
7. CONCLUSION
PROBOCA is a system under development whose goal is to adapt the BOCA software for use in teaching programming. While BOCA itself was developed for programming marathons, it is already in use as a support tool in introductory programming courses. As a learning aid, BOCA has several limitations, especially regarding class administration and student monitoring. In addition, as a system specifically developed for competitions, it lacks mechanisms that facilitate the creation of exercise lists, such as a reusable problem database.

PROBOCA supports the persistence of problems registered by teachers in the database and provides greater access to student-related information. Unlike other online judge systems that are available exclusively online and are managed by their creators, PROBOCA can be downloaded and installed by the teacher, giving the teacher control over the problems stored in the database. Since teachers are responsible for introducing the problems, this solution has the additional advantage of being language independent, i.e., it is not limited to teachers and students who speak the language in which the problem specifications were written, as is the case with online systems administered by third parties.

In addition to implementing an environment that facilitates the use of BOCA in teaching programming, PROBOCA also aims to provide teachers and students with information that will help students in their learning process. One of the important features of PROBOCA is the automatic evaluation of problem difficulty. This gives students direction when choosing which exercises to solve first, allowing them to solve easier exercises before more complex ones and diminishing student frustration at not solving problems. It also allows teachers to better gauge the level of difficulty of exercise lists. As shown by the collected data, teacher and student evaluations of problem difficulty do not match, and this may lead to distortions when teaching programming. Future work could include developing a system that automatically suggests problems to students based on their performance and the calculated problem difficulty.

This work has shown the approach being taken to calculate the difficulty of the problems. This approach differs from others by treating the algorithms submitted by students as graphs and trees in order to identify properties that can be correlated with the difficulty of the problems. For this, data mining using correlations was undertaken.

Part of the system has already been implemented and has been successfully used in the classroom. The success of these tasks shows the feasibility of the project and encourages further work.

REFERENCES
[1] BOCA. BOCA Online Contest Administrator. Available online; accessed March 20th, 2015.
[2] Chaves, J. O. et al. Uma Ferramenta de Auxílio ao Processo de Ensino Aprendizagem para Disciplinas de Programação de Computadores. TISE 2013.
[3] URI. URI Online Judge. Available online; accessed May 20th, 2014.
[4] SPOJ. SPOJ Brasil. Available online; accessed May 20th, 2014.
[5] Santos, J. C. S. and Ribeiro, A. R. L. (2007). JOnline - proposta preliminar de um juiz online didático para o ensino de programação. SBIE 2011.
[6] Alvarez, A. and Scott, T. A. (2010). Using student surveys in determining the difficulty of programming assignments. Journal of Computing Sciences in Colleges, 26(2):157–163.
[7] Gomes, A. et al. (2008). Aprendizagem de programação de computadores: dificuldades e ferramentas de suporte. Revista Portuguesa de Pedagogia, 42(2).
[8] Mendonça, A. et al. Difficulties in solving ill-defined problems: A case study with introductory computer programming students. In Frontiers in Education Conference (FIE '09), 39th IEEE, pages 1–6. IEEE.
[9] Mendonça, A., de Oliveira, C., Guerrero, D., and Costa, E. (2009). Difficulties in solving ill-defined problems: A case study with introductory computer programming students. In Frontiers in Education Conference (FIE '09), 39th IEEE, pages 1–6. IEEE.