
Model for Evaluating Student Performance Through Their Interaction With Version Control Systems

Angel Manuel Guerrero-Higueras

Vicente Matellan-Olivera

vicente.matellan@fcsc.es 2

Gonzalo Esteban Costales

Camino Fernandez-Llamas

Francisco Jesus Rodriguez-Sedano

Miguel Angel Conde

0 Research Institute on Applied Sciences in Cybersecurity (RIASC), Universidad de Leon, Av. de los Jesuitas s/n, ES-24008 Leon, Spain

1 Robotics Group, Universidad de Leon, Av. de los Jesuitas s/n, ES-24008 Leon, Spain

2 Supercomputacion de Castilla y Leon (SCAYLE), Campus de Vegazana s/n, ES-24071 Leon, Spain

2018

Version Control Systems are commonly used by Information and Communication Technology professionals. They also make it possible to follow the activity of a single programmer working on a project. For these reasons, Version Control Systems are also used by educational institutions. The aim of this work is to demonstrate that student performance may be evaluated, and even predicted, by monitoring students' interaction with a Version Control System. To do so, we have built a Machine Learning model that predicts student results in a specific task of the Ampliacion de Sistemas Operativos subject, from the second course of the degree in Computer Science of the University of Leon, through their interaction with a Git repository.

Keywords: Version Control System · Machine Learning · Learning analytics

1 Introduction

The emergence of Information and Communication Technologies has changed the landscape of teaching and learning processes. Teachers can employ many tools in their classes with the aim of improving student learning. In addition, students can use different applications to learn in their education center and beyond it. However, is it possible to say whether a tool is improving student performance? If we could assert this, it would be possible to choose the tool that best fits specific lessons or students. Several studies address this question, and the issue is especially linked to trends such as Learning Analytics and Educational Data Mining.

The most widely accepted definition of learning analytics considers that it comprises "the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs" [1]. Learning analytics facilitates the discovery of "hidden" knowledge about teaching and learning processes (see [2,3]). Therefore, learning analytics allows learners and instructors to obtain and visualize information about different issues, among them the suitability of contents and/or tools and their impact on students' performance [4]. Educational institutions and instructors could use the information obtained by applying these techniques to make changes in their courses in order to improve the whole learning process and experience [5].

In this case, the idea is to explore how students' performance is affected by the use of Version Control Systems (VCSs). VCSs facilitate the management of changes in the components of a software product or its configuration [6]. A version, release or edition is the state of this product at a specific moment. But why use such tools? Because VCSs are in high demand among future computer science engineers, and they are introduced as a tool in several Computer Science subjects.

The aim of this work is to build a model that predicts student results in a practical assignment by monitoring their use of a VCS. We start from the premise that students' activity with this type of system is an indicator of their progress.

The rest of the paper is organized as follows: Section 2 describes the empirical evaluation of the classification algorithms, presenting the experimental environment, materials, and methods used. Section 3 summarizes the results of the evaluation. The discussion of the results is developed in Section 4. Section 5 presents the conclusions and future lines of research.

2 Materials and Methods

This section describes all the elements and the methodology used to build and evaluate the model for predicting student results. Among the elements used are a specific practical assignment, which provides the student results, and a VCS. Regarding the methodology, a set of classification algorithms has been evaluated by analysing some well-known Key Performance Indicators (KPIs).

2.1 Practical assignment: ASSOOFS

The Ampliacion de Sistemas Operativos (ASSOO) subject, from the second course of the degree in Computer Science of the University of Leon, broadens knowledge about operating systems. In particular, it addresses the internal functioning of storage management, both volatile (memory management) and non-volatile (file management). Issues related to security in operating systems are also addressed.

The main practical assignment consists of implementing an inode-based file system called Ampliacion de Sistemas Operativos File System (ASSOOFS). According to the proposed specification, this file system must work on computers that run the Linux operating system. Therefore, students have to implement a module for the Linux kernel [7] that supports, at least, the following operations: mounting devices formatted with this file system; creating, reading and writing regular files; creating new directories; and listing the content of existing directories.

This is an individual assignment, and each student is encouraged to use a VCS while completing the task.

2.2 GitHub Classroom

In software engineering, version control is the management of the changes made to the elements of a product [6]. A version, revision or edition is the state of the product at a given time.

Version management can be done manually, although it is advisable to use a tool to facilitate this task. These tools are known as VCSs [8]. Among the most popular are CVS, Subversion [9] and Git [10].

A VCS must provide, at least, the following features:

- Storage for the different elements to be managed (source code, images, documentation).
- Editing of the stored elements (creation, deletion, modification, renaming, etc.).
- Registration and labelling of all actions carried out, so that an element can be returned to a previous state.

For the development of ASSOOFS, students are encouraged to use a Git repository. Git follows a distributed scheme: contrary to systems that follow the client-server model, each copy of the repository includes the complete history of all the changes made [11].

To provide some organizing capabilities and private repositories for students, the GitHub Classroom platform was used [12]. GitHub is a web-based hosting service for software development projects that use the Git revision control system. In addition, GitHub Classroom makes it possible to assign tasks to students, or groups of students, within the same centralized organization: ASSOO students in our case.

Features. Regarding the input data used to predict results, usually called features in a Machine Learning (ML) context, we have considered the following information from students' activity on their repositories:

- Commits: total number of commit operations carried out by the student.
- #Days with commit operations: total number of days on which there is at least one commit operation.
- Commits/date: average number of commit operations per date.
- Additions: number of lines of code added during the completion of the assignment.
- Deletions: number of lines deleted during the completion of the assignment.
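The five repository features listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Commit` record and its field names are hypothetical (the paper does not describe how the data were exported from GitHub Classroom), and Commits/date is assumed to be averaged over the days that have at least one commit.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    """One commit record. Hypothetical layout, not taken from the paper."""
    day: date
    additions: int
    deletions: int


def extract_features(commits):
    """Return the five repository features for one student's history."""
    commits_per_day = Counter(c.day for c in commits)
    n_commits = len(commits)
    n_days = len(commits_per_day)
    return {
        "commits": n_commits,
        "days_with_commits": n_days,
        # Assumption: average over days with at least one commit.
        "commits_per_date": n_commits / n_days if n_days else 0.0,
        "additions": sum(c.additions for c in commits),
        "deletions": sum(c.deletions for c in commits),
    }


# Made-up history of one student, for illustration only.
history = [
    Commit(date(2017, 3, 1), additions=120, deletions=0),
    Commit(date(2017, 3, 1), additions=35, deletions=10),
    Commit(date(2017, 3, 8), additions=60, deletions=25),
]
features = extract_features(history)
print(features)
```

With this made-up history, the student has three commits spread over two days, so Commits/date is 1.5.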

In addition to the above data, all obtained from the GitHub Classroom platform, we have also considered the student's grade on an authorship proof, a test carried out to verify the authorship of the code in student repositories. This authorship proof verifies that students really worked on the content of their repository. It has two possible results: "1" if the student passed the proof, and "0" otherwise.

The input data explained above will be used by the model to predict a class: AP, for those students who will finish the practical assignment successfully; and SS, for those who will not.

2.3 Model

We want to generate a model whose inputs are quantitative, while its output is a discrete value: AP or SS. Two types of ML algorithms may be used, classifiers and predictors; since the output is discrete, classifiers are the better choice. We have evaluated the following well-known methods, which we consider the most promising: Adaptive Boosting (AB), Classification And Regression Tree (CART), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Multi-Layer Perceptron (MLP), Naive Bayes (NB), and Random Forest (RF).

AB. Ensemble methods are techniques that combine different basic classifiers, turning a weak learner into a more accurate method. Boosting is one of the most successful types of ensemble methods, and AB is one of the most popular boosting algorithms.

CART. A decision tree is a method that predicts the label associated with an instance by travelling from the root node of a tree to a leaf [13]. It is a non-parametric method in which the trees are grown in an iterative, top-down process.

KNN. Although nearest neighbours is the foundation of many other learning methods, notably unsupervised ones, supervised neighbour-based learning is also available to classify data with discrete labels. It is a non-parametric technique that classifies new observations based on their distance to the observations in the training set. A good presentation of the analysis is given in [14] and [15].

LDA. A parametric method that assumes that the distributions of the data are multivariate Gaussian [15]. LDA also assumes knowledge of the population parameters; otherwise, the maximum likelihood estimator can be used. LDA uses Bayesian approaches to select the category that maximizes the conditional probability (see [16], [17] or [18]).

LR. Linear methods are intended for regressions in which the target value is expected to be a linear combination of the input variables. LR, despite its name, is a linear model for classification rather than regression. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.

MLP. An artificial neural network is a model inspired by the structure of the brain. Neural networks are used when the type of relationship between inputs and outputs is not known. The network is organized in layers (an input layer, an output layer and hidden layers). An MLP consists of multiple layers of nodes in a directed graph, so that each layer is fully connected to the next one. An MLP is a modification of the standard linear perceptron, and its best characteristic is that it is able to distinguish data that are not linearly separable. An MLP uses back-propagation to train the network; see [19] and [20].

NB. This method is based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features; see [15] and [21].

RF. A classifier consisting of a collection of decision trees, in which each tree is constructed by applying an algorithm to the training set and an additional random vector that is sampled via bootstrap re-sampling [22].
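As an illustration, the eight methods described above could be instantiated with scikit-learn estimators as in the following sketch. The choice of estimator classes and all hyper-parameter values are assumptions made here for the example; the paper does not report them. In particular, `GaussianNB` is assumed for NB because the features are numeric, and the `max_iter` values are only there to give the iterative solvers room to converge.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Map each acronym used in the text to an untrained estimator.

    Settings are illustrative assumptions, not the paper's configuration.
    """
    return {
        "AB": AdaBoostClassifier(random_state=seed),
        "CART": DecisionTreeClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(),
        "LDA": LinearDiscriminantAnalysis(),
        "LR": LogisticRegression(max_iter=1000),
        "MLP": MLPClassifier(max_iter=2000, random_state=seed),
        "NB": GaussianNB(),
        "RF": RandomForestClassifier(random_state=seed),
    }


classifiers = build_classifiers()
for name, model in classifiers.items():
    print(name, type(model).__name__)
```

Keeping the estimators in a dictionary keyed by acronym makes it straightforward to loop over them with the same training and scoring procedure.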

To evaluate the previous methods, their implementation in the Scikit-learn library has been used [23].

2.4 Methodology

To train the models, we have used the data obtained by the ASSOO students from the 2016–2017 course, presented in [24]. These data include the features mentioned in Section 2.2 for the 46 students who attempted the ASSOOFS assignment. We carried out two kinds of analysis: in the first one we do not include the authorship proof as an input feature; in the second one, we do.
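The difference between the two analyses amounts to whether the authorship proof is appended to the feature vector. A minimal sketch follows; the dictionary keys and the sample values are hypothetical and chosen only for illustration.

```python
# Hypothetical feature names for the five repository features.
BASE_FEATURES = [
    "commits",
    "days_with_commits",
    "commits_per_date",
    "additions",
    "deletions",
]


def to_vector(student, with_authorship):
    """Build the model input for one student.

    with_authorship=False corresponds to the first analysis,
    with_authorship=True to the second one.
    """
    names = BASE_FEATURES + (["authorship"] if with_authorship else [])
    return [float(student[name]) for name in names]


# Made-up student record, for illustration only.
student = {
    "commits": 12,
    "days_with_commits": 5,
    "commits_per_date": 2.4,
    "additions": 340,
    "deletions": 95,
    "authorship": 1,  # "1" if the authorship proof was passed, "0" otherwise
}
first_analysis = to_vector(student, with_authorship=False)
second_analysis = to_vector(student, with_authorship=True)
print(len(first_analysis), len(second_analysis))
```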

To evaluate the above algorithms with their input data, we have followed the method proposed in [25] to select the model that best fits our problem. The method proposes a 10-iteration cross-validation analysis for selecting the most suitable learning algorithm. Moreover, the accuracy classification score has been used to evaluate the performance of the models. The accuracy classification score is computed as shown in equation 1, where $\sum T_p$ is the number of true positives, and $\sum T_n$ is the number of true negatives.

$$\text{accuracy} = \frac{\sum T_p + \sum T_n}{\sum \text{total data}} \tag{1}$$
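The selection step can be sketched with scikit-learn's cross-validation helpers. The sketch below rests on several assumptions: the data are synthetic placeholders generated on the spot (not the 46 student records of the paper), the 10-iteration cross-validation is interpreted as stratified 10-fold cross-validation, and only three of the eight classifiers are shown to keep the example short.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for 46 students; NOT the data used in the paper.
y = np.array(["AP"] * 23 + ["SS"] * 23)
# Columns: commits, days with commits, commits/date, additions, deletions.
X = rng.poisson(lam=[20, 8, 3, 400, 150], size=(46, 5)).astype(float)
X[y == "AP", 0] += 10  # toy assumption: AP students commit more often

models = {
    "RF": RandomForestClassifier(random_state=0),
    "CART": DecisionTreeClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=5000),
}

# Assumed reading of "10-iteration cross-validation": stratified 10-fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    for name, model in models.items()
}

# Rank the candidates by mean accuracy over the ten folds.
for name, fold_scores in sorted(scores.items(), key=lambda kv: -kv[1].mean()):
    print(f"{name}: mean accuracy = {fold_scores.mean():.2f}")
```

Each entry of `scores` holds ten accuracy values, one per fold; their mean plays the role of the accuracy classification score used to rank the candidate models.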

The three models with the highest accuracy classification score have been pre-selected for in-depth evaluation by considering the following KPIs: Precision ($P$), Recall ($R$), and F1-score; all of which were obtained through the confusion matrix.

The Precision ($P$) is computed as shown in equation 2, where $\sum F_p$ is the number of false positives.

$$P = \frac{\sum T_p}{\sum T_p + \sum F_p} \tag{2}$$

The Recall ($R$) is computed as shown in equation 3, where $\sum F_n$ is the number of false negatives.

$$R = \frac{\sum T_p}{\sum T_p + \sum F_n} \tag{3}$$

According to the above results, as shown in Table 1 (left), the RF classifier works better (accuracy score = 0.8) than any other for the selected features, in this case: commits, #days with commit operations, commits/date, additions, and deletions. The CART classifier works slightly worse (accuracy score = 0.7) than RF, while all the other classifiers offer very poor results.
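The KPIs of equations 1 to 3 can be checked by hand on a small example. The following sketch uses made-up labels (not results from the paper) and treats AP as the positive class, which is an assumption. The F1-score is not defined explicitly in the text; the usual definition, the harmonic mean of precision and recall, is assumed here.

```python
def confusion_counts(y_true, y_pred, positive="AP"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def kpis(tp, fp, fn, tn):
    """Accuracy, precision and recall as in equations 1 to 3, plus F1."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Assumed standard definition: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up ground truth and predictions for ten students.
y_true = ["AP", "AP", "AP", "AP", "SS", "SS", "SS", "SS", "SS", "SS"]
y_pred = ["AP", "AP", "AP", "SS", "AP", "SS", "SS", "SS", "SS", "SS"]

counts = confusion_counts(y_true, y_pred)
result = kpis(*counts)
print(counts, result)
```

In this toy case there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, which gives an accuracy of 0.8 and a precision, recall and F1-score of 0.75 each.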

Once the best models are pre-selected, a deeper analysis with the confusion matrix of each one is given. Another important item that should be analysed is the sensitivity of the model for detecting a passed assignment (AP), which falls as the rate of APs that the model classifies incorrectly rises. Table 1 (right) and Fig. 1 show that the RF classifier gets better average values for Precision ($P$), Recall ($R$) and F1-score than CART and LR.

5 Conclusions

This work aims to build a model to predict students' results by monitoring their activity in VCSs. We start from the premise that analysing students' activity in VCSs makes it possible to predict their results.

To build the model, several classifiers have been evaluated. In addition to selecting the best classifier, we have demonstrated that our premise holds, since we can predict students' results with a high success rate. However, the models were evaluated using a small dataset. It would be desirable to obtain a larger volume of data to perform the analysis.

Regarding the chosen features, we observe that, in addition to considering the repository activity, adding an authorship proof helps to increase the accuracy.

Future work will focus on tuning the hyper-parameters of the models in order to obtain better results. In addition, we need to enlarge the training dataset.

References

1. Siemens, G., Long, P.: Penetrating the fog: Analytics in learning and education. EDUCAUSE review 46(5), 30 (2011)

2. Hernandez-Garcia, A., Gonzalez-Gonzalez, I., Jimenez-Zarco, A.I., Chaparro-Pelaez, J.: Applying social learning analytics to message boards in online distance learning: A case study. Computers in Human Behavior 47, 68–80 (2015)

3. Agudo-Peregrina, A.F., Iglesias-Pradas, S., Conde-Gonzalez, M.A., Hernandez-Garcia, A.: Can we predict success from log data in vles? classification of interactions for learning analytics and their relation with performance in vle-supported f2f and online learning. Computers in Human Behavior 31, 542–550 (2014)

4. Conde, M.A., Hernandez-Garcia, A., Oliveira, A.: Endless horizons?: addressing current concerns about learning analytics. In: Proceedings of the 3rd International Conference on Technological Ecosystems for Enhancing Multiculturality. pp. 259–262. ACM (2015)

5. Conde, M.A., Hernandez-Garcia, A.: Learning analytics for educational decision making. Computers in Human Behavior (47), 1–3 (2015)

6. Fischer, M., Pinzger, M., Gall, H.: Populating a release history database from version control and bug tracking systems. In: Software Maintenance, 2003. ICSM 2003. Proceedings. International Conference on. pp. 23–32. IEEE (2003)

7. Corbet, J., Rubini, A., Kroah-Hartman, G.: Linux Device Drivers: Where the Kernel Meets the Hardware. O'Reilly Media, Inc. (2005)

8. Spinellis, D.: Version control systems. IEEE Software 22(5), 108–109 (2005)

9. Pilato, C.M., Collins-Sussman, B., Fitzpatrick, B.W.: Version Control with Subversion: Next Generation Open Source Version Control. O'Reilly Media, Inc. (2008)

10. Torvalds, L., Hamano, J.: Git: Fast version control system. http://git-scm.com (2010)

11. De Alwis, B., Sillito, J.: Why are software projects moving from centralized to decentralized version control systems? In: Proceedings of the 2009 ICSE Workshop on cooperative and human aspects on software engineering. pp. 36–39. IEEE Computer Society (2009)

12. Griffin, T., Seals, S.: Github in the classroom: Not just for group projects. Journal of Computing Sciences in Colleges 28(4), 74–74 (2013)

13. Friedman, J., Hastie, T., Tibshirani, R.: The elements of statistical learning, Ed. 2, vol. 1. Springer series in statistics. Springer, Berlin (2009)

14. Devroye, L., Gyorfi, L., Lugosi, G.: A probabilistic theory of pattern recognition, vol. 31. Springer Science & Business Media (2013)

15. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. John Wiley & Sons (2012)

16. Bishop, C.M.: Pattern recognition. Machine Learning 128, 1–58 (2006)

17. Koller, D., Friedman, N.: Probabilistic graphical models: principles and techniques. MIT press (2009)

18. Murphy, K.P.: Machine learning: a probabilistic perspective. MIT press (2012)

19. Rumelhart, D.E.: Learning internal representations by error propagation. Parallel distributed processing (1986)

20. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS) 2(4), 303–314 (1989)

21. Zhang, H.: The optimality of naive bayes. AA 1(2), 3 (2004)

22. Breiman, L.: Random forests. Machine learning 45(1), 5–32 (2001)

23. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)

24. Guerrero-Higueras, A.M., Conde, M.A., Matellan, V.: Using version control systems to apply peer review techniques in engineering education. In: IV Congreso Internacional sobre Aprendizaje, Innovacion y Competitividad (CINAIC) (2017)

25. Guerrero-Higueras, A.M., DeCastro-Garcia, N., Matellan, V.: Detection of cyberattacks to indoor real time localization systems for autonomous robots. Robotics and Autonomous Systems 99, 75–83 (2018)