Model for Evaluating Student Performance Through Their Interaction With Version Control Systems

Ángel Manuel Guerrero-Higueras¹, Vicente Matellán-Olivera³, Gonzalo Esteban Costales¹, Camino Fernández-Llamas², Francisco Jesús Rodríguez-Sedano², and Miguel Ángel Conde²

¹ Research Institute on Applied Sciences in Cybersecurity (RIASC), Universidad de León, Av. de los Jesuitas s/n, ES-24008 León (Spain). am.guerrero@unileon.es
² Robotics Group, Universidad de León, Av. de los Jesuitas s/n, ES-24008 León (Spain). {camino.fernandez,fjrods,mcong}@unileon.es
³ Supercomputación de Castilla y León (SCAYLE), Campus de Vegazana s/n, ES-24071 León (Spain). vicente.matellan@fcsc.es



      Abstract. Version Control Systems are commonly used by Information and Communication Technology professionals. They also allow following the activity of a single programmer working on a project. For these reasons, Version Control Systems are also used by educational institutions. The aim of this work is to demonstrate that student performance may be evaluated, and even predicted, by monitoring students' interaction with a Version Control System. In order to do so, we have built a Machine Learning model to predict student results in a specific task of the Ampliación de Sistemas Operativos subject, from the second year of the degree in Computer Science at the University of León, through their interaction with a Git repository.

      Keywords: Version Control System, Machine Learning, Learning analytics.


1     Introduction
The emergence of the Information and Communication Technologies has changed the landscape of teaching and learning processes. Teachers can employ many tools in their classes with the aim of improving students' learning. In addition, students can use different applications to learn in their education centers and beyond them. However, is it possible to say whether a tool is improving student performance? If we could assert this, it would be possible to use the tool that best fits specific lessons or students. There are several studies regarding this issue, and it is especially linked to trends such as Learning Analytics and Educational Data Mining.




    Copyright © 2018 for this paper by its authors. Copying permitted for private and academic purposes


    The most accepted definition of learning analytics considers that it comprises “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs” [1]. Learning analytics facilitates the discovery of “hidden” knowledge about teaching and learning processes (see [2,3]). Therefore, the use of learning analytics allows learners and instructors to obtain and visualize information about different issues, among them the suitability of contents and/or tools and their impact on students' performance [4]. Educational institutions and instructors could use the information obtained by applying these techniques to make changes in their courses in order to improve the whole learning process and experience [5].
    In this case, the idea is to explore how students' performance is affected by the use of Version Control Systems (VCSs). VCSs facilitate the management of changes in the components of a software product or its configuration [6]. The version, release or edition is the state of this product at a specific moment. But why use such tools? Because they are in high demand among future computer science engineers, and they are introduced as a tool in several Computer Science subjects.
    The aim of this work is to build a model that allows predicting student results on a practical assignment by monitoring their use of a VCS. We start from the premise that the students' activity with this type of system is an indicator of the evolution of their progress.
    The rest of the paper is organized as follows: Section 2 describes the empirical evaluation of the classification algorithms, presenting the experimental environment, materials, and methods used. Section 3 summarizes the results of the evaluation. The discussion of the results is developed in Section 4. Section 5 presents the conclusions and future lines of research.


2     Materials and Methods

This section describes all the elements and the methodology used to build and evaluate the model for predicting student results. Among the elements used are a specific practical assignment to provide student results, and a VCS. Regarding the methodology, a set of classification algorithms has been evaluated by analysing some well-known Key Performance Indicators (KPIs).


2.1   Practical assignment: ASSOOFS

The Ampliación de Sistemas Operativos (ASSOO) subject, from the second year of the degree in Computer Science at the University of León, broadens knowledge about operating systems. In particular, it addresses the internal functioning of storage management, both volatile (memory management) and non-volatile (file management). Issues related to security in operating systems are also addressed.

    The main practical assignment consists of implementing an inode-based file system called Ampliación de Sistemas Operativos File System (ASSOOFS). According to the proposed specification, this file system must work on computers that run the Linux operating system. Therefore, students have to implement a module for the Linux kernel [7] that supports, at least, the following operations: mounting of devices formatted with this system; creation, reading and writing of regular files; and creation of new directories and visualization of the content of existing directories.
    This is an individual assignment and each student is encouraged to use a
VCS during the completion of the task.

2.2   GitHub Classroom
In software engineering, version control refers to the management of the changes made to the elements of a product [6]. A version, revision or edition is the state of the product at a given time.
    Version management can be done manually, although it is advisable to use a tool to facilitate this task. These tools are known as VCSs [8]. Among the most popular are CVS, Subversion [9] and Git [10].
    A VCS must provide, at least, the following features:

 – Storage for the different elements to be managed (source code, images, documentation).
 – Editing of the stored elements (creation, deletion, modification, renaming, etc.).
 – Registration and labelling of all actions carried out, so that an element can be returned to a previous state.

    For the development of ASSOOFS, students are encouraged to use a Git repository. Git follows a distributed scheme: contrary to other systems that follow the client-server model, each copy of the repository includes the complete history of all the changes made [11].
    In order to provide some organizing capabilities and private repositories for students, the GitHub Classroom platform was used [12]. GitHub is a web-based hosting service for software development projects that use the Git revision control system. In addition, GitHub Classroom allows assigning tasks to students, or groups of students, framed in the same centralized organization: ASSOO students in our case.
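    One possible way to collect per-student activity data from such repositories is the GitHub REST API's contributor statistics endpoint. The following Python sketch is illustrative only: the function name, the token handling and the use of the requests library are assumptions, not the actual extraction procedure.

    # Illustrative sketch: pull commit, addition and deletion counts
    # for each contributor of a repository via the GitHub REST API.
    # (The endpoint may return 202 while statistics are being computed.)
    import requests

    def contributor_stats(owner, repo, token=None):  # hypothetical helper
        headers = {"Authorization": f"token {token}"} if token else {}
        url = f"https://api.github.com/repos/{owner}/{repo}/stats/contributors"
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        stats = []
        for contributor in resp.json():
            # each week entry holds additions ("a"), deletions ("d"),
            # and commits ("c")
            adds = sum(week["a"] for week in contributor["weeks"])
            dels = sum(week["d"] for week in contributor["weeks"])
            stats.append({
                "login": contributor["author"]["login"],
                "commits": contributor["total"],
                "additions": adds,
                "deletions": dels,
            })
        return stats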

Features Regarding the input data used to predict results, usually called features in a Machine Learning (ML) context, we have considered the following information coming from the students' activity on their repositories:

 – Commits: total number of commit operations carried out by the student.
 – #Days with commit operations: total number of days with at least one commit operation.
 – Commits/date: average number of commit operations per day.

 – Additions: number of lines of code added during the assignment completion.
 – Deletions: number of lines deleted during the assignment completion.
    In addition to the above data, all obtained from the GitHub Classroom platform, we have also considered the students' grade on a proof carried out to control the authorship of the code in the student repositories. This authorship proof allows verifying that the students really worked on the content of their repository. It has two possible results: “1”, if the student passed the proof; “0” otherwise.
    The input data explained above will be used by the model to predict a class: AP, for those students who will finish the practical assignment successfully; and SS, for those who will not.
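    As an illustration of the resulting input data, the following Python sketch assembles per-student feature vectors and class labels with the pandas library. The column names and the two sample rows are hypothetical placeholders, not real student data.

    # Hypothetical per-student feature table; column names and the two
    # sample rows are illustrative, not taken from the real dataset.
    import pandas as pd

    students = pd.DataFrame(
        [
            # commits, days, commits/day, additions, deletions, proof, class
            [42, 12, 3.5, 1830, 640, 1, "AP"],
            [7, 3, 2.3, 210, 95, 0, "SS"],
        ],
        columns=["commits", "days_with_commits", "commits_per_day",
                 "additions", "deletions", "authorship_proof", "result"],
    )

    X = students.drop(columns=["result"])  # quantitative input features
    y = students["result"]                 # class to predict: AP or SS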

2.3   Model
We want to generate a model whose inputs are quantitative, while its output is a discrete value: AP or SS. Two types of ML algorithms may be used, classifiers and predictors; given the discrete output, classifiers are the better fit. We have evaluated the following well-known methods, which we consider the most promising ones: Adaptive Boosting (AB), Classification And Regression Tree (CART), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Multi-Layer Perceptron (MLP), Naive Bayes (NB), and Random Forest (RF).
AB Ensemble methods are techniques that combine different basic classifiers, turning a weak learner into a more accurate method. Boosting is one of the most successful types of ensemble methods, and AB is one of the most popular boosting algorithms.
CART A decision tree is a method which predicts the label associated with an instance by travelling from the root node of a tree to a leaf [13]. It is a non-parametric method in which the trees are grown in an iterative, top-down process.
KNN Although nearest neighbours is the foundation of many other learning methods, notably unsupervised ones, supervised neighbour-based learning is also available to classify data with discrete labels. It is a non-parametric technique which classifies new observations based on their distance to the observations in the training set. A good presentation of the analysis is given in [14] and [15].
LDA A parametric method that assumes that the distributions of the data are multivariate Gaussian [15]. LDA also assumes knowledge of the population parameters; otherwise, the maximum likelihood estimator can be used. LDA uses Bayesian approaches to select the category which maximizes the conditional probability (see [16], [17] or [18]).
LR Linear methods are intended for regression, in which the target value is expected to be a linear combination of the input variables. LR, despite its name, is a linear model for classification rather than regression. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
MLP An artificial neural network is a model inspired by the structure of the brain. Neural networks are used when the type of relationship between inputs and outputs is not known. The network is organized in layers (an input layer, an output layer and hidden layers). An MLP consists of multiple layers of nodes in a directed graph, so that each layer is fully connected to the next one. An MLP is a modification of the standard linear perceptron; its key characteristic is that it is able to distinguish data which are not linearly separable. An MLP uses back-propagation for training the network; see [19] and [20].
NB This method is based on applying Bayes' theorem with the “naive” assumption of independence between every pair of features; see [15] and [21].
RF A classifier consisting of a collection of decision trees, in which each tree is constructed by applying an algorithm to the training set and an additional random vector that is sampled via bootstrap re-sampling [22].

    To evaluate the previous methods, the implementations provided by the Scikit-learn library have been used [23].
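    As a minimal sketch, the eight methods can be instantiated with Scikit-learn as follows; hyper-parameter settings are left at the library defaults here, which is an assumption of this sketch.

    # The eight evaluated classifier families, instantiated with
    # Scikit-learn defaults (hyper-parameter values are an assumption).
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    models = {
        "AB": AdaBoostClassifier(),
        "CART": DecisionTreeClassifier(),
        "KNN": KNeighborsClassifier(),
        "LDA": LinearDiscriminantAnalysis(),
        "LR": LogisticRegression(),
        "MLP": MLPClassifier(),
        "NB": GaussianNB(),
        "RF": RandomForestClassifier(),
    }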

2.4   Methodology
In order to train the models, we have used the data obtained from the ASSOO students of the 2016–2017 academic year, presented in [24]. These data include the features mentioned in Section 2.2 for the 46 students who attempted the ASSOOFS assignment. We carried out two kinds of analyses: in the first one we do not include the authorship proof as an input feature; in the second one, we do.
    To evaluate the above algorithms with their input data, we have followed the method proposed in [25] to select the model which best fits our problem. The method proposes a 10-iteration cross-validation analysis for selecting the most suitable learning algorithm. Moreover, the accuracy classification score has been used to evaluate the performance of the models. The accuracy classification score is computed as shown in Equation 1, where $\sum T_p$ is the number of true positives and $\sum T_n$ is the number of true negatives.

\[ \mathrm{accuracy} = \frac{\sum T_p + \sum T_n}{\text{total data}} \tag{1} \]
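    A sketch of this model-selection step follows, assuming the 10-iteration analysis corresponds to a standard 10-fold cross-validation; X, y and models are the objects from the sketches above.

    # 10-fold cross-validation scored with the accuracy classification
    # score of Equation 1; assumes the full 46-student dataset, since
    # 10 folds require at least 10 examples.
    from sklearn.model_selection import KFold, cross_val_score

    cv = KFold(n_splits=10, shuffle=True, random_state=7)
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        print(f"{name}: mean accuracy = {scores.mean():.2f}")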
    The three models with the highest accuracy classification score have been pre-selected for in-depth evaluation by considering the following KPIs: Precision (P), Recall (R), and F1-score, all of which were obtained through the confusion matrix.
    The Precision (P) is computed as shown in Equation 2, where $\sum F_p$ is the number of false positives.

\[ P = \frac{\sum T_p}{\sum T_p + \sum F_p} \tag{2} \]
                                                       P
    The Recall (R) is computed at equation 3, where Fn is the number of false
negatives.
Learning Analytics Summer Institute Spain - LASI Spain 2018


                                           P
                                             Tp
                                 R= P         P                                    (3)
                                         Tp + Fn
    These quantities are also related to the F1-score, which is defined as the harmonic mean of precision and recall, as shown in Equation 4.

\[ F_1 = 2 \cdot \frac{P \times R}{P + R} \tag{4} \]
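    These KPIs can be computed directly from the confusion matrix; a minimal sketch with Scikit-learn follows, where y_true and y_pred are hypothetical validation labels and model predictions.

    # Per-class Precision, Recall and F1-score (Equations 2-4) derived
    # from a confusion matrix; y_true/y_pred are placeholders.
    from sklearn.metrics import confusion_matrix, classification_report

    y_true = ["AP", "AP", "AP", "SS", "SS", "SS"]
    y_pred = ["AP", "AP", "SS", "SS", "SS", "AP"]

    print(confusion_matrix(y_true, y_pred, labels=["AP", "SS"]))
    print(classification_report(y_true, y_pred, labels=["AP", "SS"]))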

3    Results
Table 1–left shows the accuracy classification score for all evaluated models with-
out consider the authorship proof as an input feature. The highest scores for the
validation dataset are highlighted in bold. Table 1–right show Precision (P ),
Recall (R), and F1 -score for the highlighted models: RF, CART, and LR.


Table 1. Accuracy classification score without considering the authorship proof (left) and Precision (P), Recall (R), and F1-score for the highlighted models (right).

Table 1–left:

    Model   Score
    RF      0.8
    CART    0.7
    LR      0.6
    LDA     0.6
    KNN     0.6
    AB      0.6
    MLP     0.6
    NB      0.5

Table 1–right:

    Classifier   Class   P      R      F1-score   #examples
    RF           AP      0.67   1.00   0.80        4
    RF           SS      1.00   0.67   0.80        6
    RF           avg.    0.87   0.80   0.80       10
    CART         AP      0.60   0.75   0.67        4
    CART         SS      0.80   0.67   0.73        6
    CART         avg.    0.72   0.70   0.70       10
    LR           AP      0.50   1.00   0.67        4
    LR           SS      1.00   0.33   0.50        6
    LR           avg.    0.80   0.60   0.57       10



    Fig. 1 shows the confusion matrices computed for the highlighted models: RF, CART, and LR.
    Table 2–left shows the accuracy classification score for all evaluated models considering the authorship proof as an input feature. The highest scores for the validation dataset are highlighted in bold. Table 2–right shows Precision (P), Recall (R), and F1-score for the highlighted models: RF, LR, and NB.
    Fig. 2 shows the confusion matrices computed for the highlighted models: RF, LR, and NB.


4    Discussion
According to the above results, as shown in Table 1–left, the RF classifier works better (accuracy score = 0.8) than any other for the selected features, in this case:




Fig. 1. Confusion matrices for the RF (left), CART (center), and LR (right) classifiers.

Table 2. Accuracy classification score considering the authorship proof (left) and Precision (P), Recall (R), and F1-score for the highlighted models (right).

Table 2–left:

    Model   Score
    RF      0.9
    LR      0.8
    NB      0.8
    LDA     0.7
    KNN     0.6
    MLP     0.6
    CART    0.5
    AB      0.5

Table 2–right:

    Classifier   Class   P      R      F1-score   #examples
    RF           AP      0.80   1.00   0.89        4
    RF           SS      1.00   0.83   0.91        6
    RF           avg.    0.92   0.90   0.90       10
    LR           AP      0.67   1.00   0.80        4
    LR           SS      1.00   0.67   0.80        6
    LR           avg.    0.87   0.80   0.80       10
    NB           AP      0.67   1.00   0.80        4
    NB           SS      1.00   0.67   0.80        6
    NB           avg.    0.87   0.80   0.80       10



Commits, #days with commit operations, commits/date, additions, and deletions. The CART classifier works slightly worse (accuracy score = 0.7) than RF, while all the other classifiers offer very poor results.
    Once the best models are pre-selected, a deeper analysis with the confusion matrix of each one is given. Another important item that should be analysed is the sensitivity of the model for detecting a passed assignment (AP), i.e., the rate of APs that the model classifies correctly. Table 1–right and Fig. 1 show that the RF classifier gets better average values for Precision (P), Recall (R) and F1-score than CART and LR.




Fig. 2. Confusion matrices for the RF (left), LR (center), and NB (right) classifiers.

    Table 2–left shows the results obtained by considering an additional feature: the result of an authorship proof taken by the students. Accuracy is clearly better when this feature is considered. Again, the RF classifier is the one with the best results (accuracy score = 0.9). The LR and NB classifiers work slightly worse, both with a 0.8 accuracy. All the other classifiers perform worse still. Regarding the sensitivity of the three best models for detecting a passed assignment (AP), Table 2–right and Fig. 2 show that the RF classifier gets better average values for Precision (P), Recall (R) and F1-score than LR and NB.
    It is important to note that the models have been trained with a small dataset: we have data for just 46 students. Results might change considerably with a bigger dataset. ASSOO students of the 2017–2018 academic year have done the same practical assignment, so we plan to repeat the analysis with a bigger dataset when the 2017–2018 students finish their work.


5    Conclusions

This work aims to build a model to predict students' results by monitoring their activity on VCSs. We start from the premise that analysing the students' activity on VCSs allows predicting their results.
    To build the model, several classifiers have been evaluated. In addition to selecting the best classifier, we have shown that our premise holds, since we can predict the students' results with a high success rate. However, the models were evaluated using a small dataset; it would be desirable to get a larger volume of data to perform the analysis.
    Regarding the chosen features, we observe that, in addition to considering the repository activity, adding an authorship proof helps to increase the accuracy.
    Future work will be related to tuning the hyper-parameters of the models in order to obtain better results. In addition, we need to increase the training dataset.
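    As a pointer to that future work, the tuning could be carried out, for instance, with Scikit-learn's grid search; the parameter grid below is a minimal sketch with illustrative values for the RF classifier, not a tuned configuration.

    # Hypothetical hyper-parameter grid for the RF classifier; the
    # searched values are illustrative assumptions only.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    grid = GridSearchCV(
        RandomForestClassifier(),
        param_grid={"n_estimators": [10, 50, 100], "max_depth": [None, 3, 5]},
        cv=10,
        scoring="accuracy",
    )
    # grid.fit(X, y) would select the best combination by cross-validated
    # accuracy; the winner is then available in grid.best_params_.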


References

 1. Siemens, G., Long, P.: Penetrating the fog: Analytics in learning and education.
    EDUCAUSE review 46(5), 30 (2011)
 2. Hernández-García, Á., González-González, I., Jiménez-Zarco, A.I., Chaparro-Peláez, J.: Applying social learning analytics to message boards in online distance learning: A case study. Computers in Human Behavior 47, 68–80 (2015)
 3. Agudo-Peregrina, Á.F., Iglesias-Pradas, S., Conde-González, M.Á., Hernández-García, Á.: Can we predict success from log data in VLEs? Classification of interactions for learning analytics and their relation with performance in VLE-supported F2F and online learning. Computers in Human Behavior 31, 542–550 (2014)
 4. Conde, M.Á., Hernández-García, Á., Oliveira, A.: Endless horizons? Addressing current concerns about learning analytics. In: Proceedings of the 3rd International Conference on Technological Ecosystems for Enhancing Multiculturality, pp. 259–262. ACM (2015)

 5. Conde, M.Á., Hernández-García, Á.: Learning analytics for educational decision making. Computers in Human Behavior (47), 1–3 (2015)
 6. Fischer, M., Pinzger, M., Gall, H.: Populating a release history database from
    version control and bug tracking systems. In: Software Maintenance, 2003. ICSM
    2003. Proceedings. International Conference on. pp. 23–32. IEEE (2003)
 7. Corbet, J., Rubini, A., Kroah-Hartman, G.: Linux Device Drivers: Where the Kernel Meets the Hardware. O'Reilly Media (2005)
 8. Spinellis, D.: Version control systems. IEEE Software 22(5), 108–109 (2005)
 9. Pilato, C.M., Collins-Sussman, B., Fitzpatrick, B.W.: Version Control with Subversion: Next Generation Open Source Version Control. O'Reilly Media (2008)
10. Torvalds, L., Hamano, J.: Git: Fast version control system. http://git-scm.com
    (2010)
11. De Alwis, B., Sillito, J.: Why are software projects moving from centralized to
    decentralized version control systems? In: Proceedings of the 2009 ICSE Work-
    shop on cooperative and human aspects on software engineering. pp. 36–39. IEEE
    Computer Society (2009)
12. Griffin, T., Seals, S.: Github in the classroom: Not just for group projects. Journal
    of Computing Sciences in Colleges 28(4), 74–74 (2013)
13. Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning, 2nd ed., vol. 1. Springer Series in Statistics. Springer, Berlin (2009)
14. Devroye, L., Györfi, L., Lugosi, G.: A probabilistic theory of pattern recognition,
    vol. 31. Springer Science & Business Media (2013)
15. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. John Wiley & Sons
    (2012)
16. Bishop, C.M.: Pattern recognition. Machine Learning 128, 1–58 (2006)
17. Koller, D., Friedman, N.: Probabilistic graphical models: principles and techniques.
    MIT press (2009)
18. Murphy, K.P.: Machine learning: a probabilistic perspective. MIT press (2012)
19. Rumelhart, D.E.: Learning internal representations by error propagation. Parallel distributed processing (1986)
20. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Mathe-
    matics of Control, Signals, and Systems (MCSS) 2(4), 303–314 (1989)
21. Zhang, H.: The optimality of naive bayes. AA 1(2), 3 (2004)
22. Breiman, L.: Random forests. Machine learning 45(1), 5–32 (2001)
23. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
    Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
    Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine
    learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
24. Guerrero-Higueras, Á.M., Conde, M.Á., Matellán, V.: Using version control sys-
    tems to apply peer review techniques in engineering education. In: IV Congreso
    Internacional sobre Aprendizaje, Innovación y Competitividad (CINAIC) (2017)
25. Guerrero-Higueras, Á.M., DeCastro-García, N., Matellán, V.: Detection of cyber-attacks to indoor real time localization systems for autonomous robots. Robotics and Autonomous Systems 99, 75–83 (2018)