<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>E. Gaudioso and L. Talavera, “Data mining to support tutoring in virtual learning communities:
experiences and challenges.”
W. Hämäläinen, T. H. Laine, and E. Sutinen, “Data Mining in Personalizing Distance Education
Courses.”
Bhise RB, Thorat SS, and Supekar AK, “'Importance of Data Mining in Higher Education
System,'” IOSR J. Humanit. Soc. Sci. (IOSR-JHSS</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/3-540</article-id>
      <title-group>
        <article-title>Using descriptive and predictive learning analytics to understand student behavior at LMS Moodle</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dijana Oreški</string-name>
          <email>dijana.oreski@foi.hr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Zagreb, Faculty of Organization and Informatics</institution>
          ,
          <addr-line>Pavlinska 2, 42000 Varaždin</addr-line>
          ,
          <country country="HR">Croatia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>6</volume>
      <issue>6</issue>
      <fpage>18</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>Learning analytics is a data-centric field that applies machine learning algorithms in the educational domain to analyze data from e-learning environments. This study employs descriptive and predictive learning analytics to develop models of student behavior and success. Cluster analysis, an unsupervised machine learning algorithm, and a decision tree, a supervised machine learning algorithm, are applied to data from the Moodle Learning Management System (LMS). The results identify (i) groups of students with similar patterns of behavior in the LMS and (ii) student activities in the LMS that lead to successful course completion. Such results serve as guidelines for teachers when developing courses and for students when enrolling in a course. Descriptive and predictive learning analytics is an innovative approach in education that can support teachers and students and improve learning outcomes.</p>
        <p>Keywords: learning analytics, educational data mining, LMS data, machine learning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>2022 Copyright for this paper by its authors.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>The prediction of student success has been the subject of numerous research papers. Recent work in the
educational research community strives to exploit the potential of learning management system data
to develop accurate and reliable prediction models. Below, we provide an overview of existing
research close to our approach.</p>
      <p>In a recently published study, Feldman-Maggor et al. [5] characterized students by their learning
patterns and sought indicators for predicting student success in an online environment. Logistic
regression and a decision tree algorithm were applied to data from undergraduate online chemistry
courses. Assignment submission and video viewing were the most significant predictors. The authors
emphasized the importance of the choices students make regarding their learning process.</p>
      <p>Gašević et al. [6] presume that instructional conditions influence the prediction of academic success.
They performed research in nine undergraduate courses offered in a blended learning model. The study
illustrates the differences in predictive power and significant predictors between course-specific models
and generalized predictive models. The results suggest that it is imperative for learning analytics
research to account for the diverse ways technology is adopted and applied in different courses and in
different domains. The authors conclude that differences in how students use the learning management
system require further research examinations.</p>
      <p>Costa et al. [7] evaluated several educational data mining techniques for predicting student
failure in programming courses. The authors investigated how effectively these techniques identify
students who are likely to fail early enough to take actions that reduce the failure rate. They evaluated
four techniques (Naive Bayes, decision tree, neural network, and support vector machine) on two data sets
from programming courses at a Brazilian public university: one from distance education and one from
on-campus teaching.</p>
      <p>Cerezo et al. [8] examined patterns in students' learning processes using Moodle log data. The authors
grouped students with similar behavior in terms of effort and time spent working. The different
patterns of student behavior in the LMS were clustered, and the relationships between these patterns and
students' grades were analyzed. The study included 140 undergraduate students enrolled in a Moodle 2.0
course. The results identified variables that predict student results and could serve as a basis for
improving student achievement in the LMS.</p>
      <p>Conijn et al. [9] analyzed blended courses at one institution using the Moodle LMS. The authors
predicted student achievement from LMS variables using multi-level and standard regressions. Their
results showed that predictors vary significantly across courses, so the generalizability of such
prediction models is low.</p>
      <p>Macfadyen and Dawson [10] used LMS tracking data to explore which online student activities could
predict academic success. Their analysis of data from Blackboard Vista identified variables correlated
with student grades. Regression analysis produced a predictive model for the course in which variables
such as the number of messages posted and the number of completed assessments explained the most
variation in student grades. A logistic regression approach achieved an accuracy of 81%. These results
are useful for the extraction and visualization of LMS data on student engagement and the likelihood of
success.</p>
      <p>Matcha et al. [11] investigated students' learning strategies; among other methods, clustering was
used to detect and interpret learning tactics and strategies. More recently, Saqr and López-Pernas [12]
applied clustering to reveal students’ learning strategies and engagement patterns in courses.</p>
      <p>Building on this previous research, we combine descriptive and predictive analysis of Moodle LMS
log data from one course taught to two generations of information technology students.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Research methodology</title>
      <sec id="sec-3-1">
        <title>The goals of the research are:</title>
        <p>(i) to identify the most important predictors of students’ success among the LMS Moodle log data,</p>
        <p>(ii) to group students with similar LMS behavior patterns,</p>
        <p>(iii) to identify the relationship between student clusters and student success.</p>
        <p>To achieve these goals, we address the following research questions:</p>
        <p>(i) Which of the variables extracted from the LMS Moodle logs have the highest impact on student performance?</p>
        <p>(ii) Can we create good student clusters based on their usage of the LMS?</p>
        <p>(iii) Is there any correlation between student clusters and student success?</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3.1. Data description</title>
      <p>Data were collected from the course Knowledge Discovery in Data, taught at the University of Zagreb,
Faculty of Organization and Informatics. The course was taught as an elective at the undergraduate level
and implemented as a blended course: lectures and lab exercises were held in the classroom and combined
with use of the Moodle LMS. Data were extracted from Moodle for two generations of students, yielding a
sample of 83 students. The extracted variables are the numbers of log entries for specific resources and
activities: File, Forum, Student Report, Folder, Choice, File submission, Overview report, Page, System,
Test, and Assignment. The overall course grade was included in the analysis as the dependent variable.</p>
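      <p>As an illustration of how such variables can be derived, the following minimal sketch counts log entries per student and per resource or activity type. The row layout, the student identifiers, and the sample values are hypothetical and are not taken from the study's dataset; a real Moodle log export contains more columns.</p>
      <preformat><![CDATA[
```python
from collections import Counter, defaultdict

# Hypothetical log rows: (student_id, component). These values are invented
# for illustration and do not come from the study's data.
LOG_ROWS = [
    ("s1", "File"), ("s1", "File"), ("s1", "Forum"),
    ("s2", "Test"), ("s2", "Assignment"), ("s2", "File"),
]

# Subset of the resource and activity types named in the text.
COMPONENTS = ["File", "Forum", "Test", "Assignment"]


def count_logs(rows, components):
    """Return {student_id: [log count per component]} in the order of `components`."""
    counts = defaultdict(Counter)
    for student_id, component in rows:
        counts[student_id][component] += 1
    # A Counter returns 0 for components a student never accessed.
    return {
        student: [per_student[c] for c in components]
        for student, per_student in counts.items()
    }


features = count_logs(LOG_ROWS, COMPONENTS)
```
]]></preformat>
      <p>Each student is then represented by one row of counts, which is the shape of input assumed by the clustering and decision tree steps.</p>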
    </sec>
    <sec id="sec-5">
      <title>3.2. Machine learning algorithms</title>
      <p>Two machine learning algorithms were applied in the data analysis: cluster analysis, an unsupervised
algorithm, and a decision tree, a supervised algorithm. Clustering groups objects into clusters of similar
objects [13]. In the e-learning context, clustering has been used to find clusters of students with similar
learning characteristics [14], to discover patterns reflecting user behavior [15], and to group students
according to their characteristics in order to offer them personalized learning approaches [16]. The
k-means clustering algorithm is applied here because it is the most widely used clustering algorithm [17].</p>
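      <p>The k-means procedure can be sketched as follows. This is a minimal illustration of the standard Lloyd iteration, not the software used in the study; the function names, the random initialization, and the sample points are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean_point(members)
    return centroids, labels


# Invented activity counts for six students (two variables each).
POINTS = [[1, 2], [2, 1], [1, 1], [20, 30], [21, 29], [22, 31]]
centroids, labels = k_means(POINTS, k=2)
```
]]></preformat>
      <p>In practice, the variables are often standardized before clustering so that a frequently logged resource does not dominate the distance; whether this was done in the study is not stated.</p>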
      <p>A decision tree is a supervised machine learning algorithm belonging to the information-based group
of algorithms. These algorithms develop predictive models by determining the most informative
attributes for a given prediction task. In the e-learning context, classification and prediction have
been used to predict students’ success in a course [18], to predict students’ performance and
their final grade [19], and to predict students’ achievement while discovering the
relevance of the attributes [20].</p>
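      <p>The idea of selecting the most informative attribute can be sketched for a numeric target such as a grade. The example below searches for the single binary split that most reduces the sum of squared errors. This criterion is an assumption made for illustration, since the text does not state which splitting criterion the applied algorithm uses, and the data values are invented.</p>
      <preformat><![CDATA[
```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature_index, threshold, sse_reduction) of the best binary split."""
    parent_sse = sum_squared_error(targets)
    best = (None, None, 0.0)
    for f in range(len(rows[0])):
        values = sorted(set(row[f] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(rows, targets) if row[f] < threshold]
            right = [t for row, t in zip(rows, targets) if row[f] >= threshold]
            reduction = parent_sse - sum_squared_error(left) - sum_squared_error(right)
            if reduction > best[2]:
                best = (f, threshold, reduction)
    return best


# Invented data: columns are two hypothetical log counts, targets are grades.
ROWS = [[1, 5], [2, 9], [8, 6], [9, 8]]
GRADES = [2, 2, 5, 5]
feature, threshold, reduction = best_split(ROWS, GRADES)
```
]]></preformat>
      <p>A full tree repeats this search recursively on the left and right subsets until a stopping rule is met.</p>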
      <p>To predict student success from LMS activity, this research uses the decision tree, a supervised
machine learning algorithm. Decision tree results are simple to understand and interpret. Furthermore,
previous research has shown that decision trees outperform other approaches [21]. We build a model to
analyze the influence of the LMS variables on the student’s final grade in the course. Student clusters
are built on the same dataset, and finally we investigate the clusters and their correlation with
students’ final grades.</p>
    </sec>
    <sec id="sec-6">
      <title>4. Research results</title>
      <p>Data analysis consists of two parts: descriptive models were developed first, followed by predictive
models. Table 1 reports the Cubic Clustering Criterion (CCC) value for three candidate numbers of
clusters: 2, 3, and 4. Three clusters are optimal for the given dataset since the CCC value is the lowest.</p>
      <p>Three groups of students are identified. Cluster 1 consists of 42 students, cluster 2 of 4 students, and
cluster 3 of 37 students. Characteristics of each group are given in tables 2 and 3 as mean values for the
groups.</p>
      <p>Cluster 1 consists of students with the lowest overall activity at Moodle. Average view values for
all resources and activities except files are the lowest. Cluster 2 consists of students with the highest
overall activity at Moodle. Average view values for all resources and activities except student reports
and view reports are the highest. Student report and view report are the highest for students in cluster
3, whereas other values are between clusters 1 and 2.</p>
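      <p>Cluster profiles of this kind are obtained by averaging each variable within each cluster. A minimal sketch follows; the rows and cluster labels are invented and do not reproduce the values in Tables 2 and 3.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def cluster_means(rows, labels):
    """Return {cluster_label: [mean of each column]} for the given rows."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in grouped.items()
    }


# Invented log counts for three students and their assumed cluster labels.
ROWS = [[2, 0], [4, 2], [10, 6]]
LABELS = [1, 1, 2]
profiles = cluster_means(ROWS, LABELS)
```
]]></preformat>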
      <p>Following the descriptive models, predictive models were developed for clusters 1 and 3. A decision
tree algorithm was applied to develop the predictive models. Table 4 evaluates the two models in terms
of RSquare (the proportion of variance in the grade explained by the model) and RASE (root average
squared error, a measure of model error).</p>
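      <p>Both metrics can be computed directly from actual and predicted grades. The sketch below uses the usual definitions, RSquare as one minus the ratio of residual to total sum of squares and RASE as the square root of the mean squared error; the grade values are invented for illustration.</p>
      <preformat><![CDATA[
```python
import math


def r_square(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rase(actual, predicted):
    """Root average squared error: sqrt(mean((actual - predicted)^2))."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Invented grades and predictions, not results from the study.
ACTUAL = [2, 3, 4, 5]
PREDICTED = [2, 3, 4, 3]
r2 = r_square(ACTUAL, PREDICTED)
error = rase(ACTUAL, PREDICTED)
```
]]></preformat>
      <p>A higher RSquare and a lower RASE indicate a better fit; RASE is expressed in the same units as the grade.</p>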
      <p>Both models show good quality. Interpreting the results requires a sensitivity analysis.
Figures 1 and 2 show the sensitivity analysis results for the models of cluster 1 (Figure 1) and cluster
3 (Figure 2).</p>
      <p>Assignments and student report views are the most important predictors of grade for cluster 1.
Choice and test logs are the most important predictors of grade for cluster 3.</p>
      <p>[Figure: sensitivity analysis plot; the variable axis includes Choice, Test, Page, and Folder.]</p>
      <p>Predictors of student success differ among groups of students with different levels of LMS
activity. This justifies combining descriptive and predictive learning analytics in a hybrid approach
that provides complete information.</p>
      <p>One of the advantages of decision tree application is that decision tree results can be presented in
the form of rules. Prediction rules for cluster 1 are given in table 5 and prediction rules for cluster 3 in
table 6.</p>
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion</title>
      <p>Earlier research has demonstrated that higher education institutions can use the predictive
power of LMS data in combination with machine learning algorithms to develop models and tools that
identify successful and at-risk students and enable interventions. In this paper, we have shown
that combining unsupervised and supervised machine learning algorithms on LMS data yields
useful models for explaining and predicting student behavior in the LMS.</p>
      <p>To achieve the goals of this research, we have answered the following research questions:</p>
      <p>(i) Which of the variables extracted from the LMS Moodle logs have the highest impact on
the student's performance? Assignments and student report views have the highest impact on grades for
students with lower LMS activity. Choice and test logs have the highest impact on grades for students
with higher LMS activity.</p>
      <sec id="sec-7-1">
        <title>(ii) Can we create good student clusters based on their usage of the LMS?</title>
        <p>Based on the CCC value, we can conclude that good student clusters are created.</p>
      </sec>
      <sec id="sec-7-2">
        <title>(iii) Is there any correlation between students’ clusters and students’ success?</title>
        <p>There is no correlation between students’ clusters and students’ success.</p>
        <p>The research results contribute to the personalization of learning and teaching approaches,
especially in online environments. Future research will extend the study in two ways. First, more courses
and students from different study programs will be included in the analysis. Second, different machine
learning algorithms will be applied to the LMS data and compared to see whether their performance
differs.</p>
        <p>6. References</p>
        <p>[1] C. Romero and S. Ventura, “Data mining in education,” Wiley Interdiscip. Rev. Data Min.
Knowl. Discov., vol. 3, no. 1, pp. 12–27, Jan. 2013, doi: 10.1002/WIDM.1075.</p>
        <p>[2] O. Maimon and L. Rokach, Data Mining and Knowledge Discovery Handbook.</p>
        <p>[3] B. Alexander, “Web 2.0: A New Wave of Innovation for Teaching and
Learning?,” Educ. Rev., vol. 41, no. 2, pp. 33–34, 2006.</p>
        <p>[4] N. Cavus and A. M. Momani, “Computer aided evaluation of learning management systems,”
Procedia - Soc. Behav. Sci., vol. 1, no. 1, pp. 426–430, Jan. 2009, doi:
10.1016/J.SBSPRO.2009.01.076.</p>
        <p>[5] Y. Feldman-Maggor, R. Blonder, and I. Tuvi-Arad, “Let them choose: Optional assignments
and online learning patterns as predictors of success in online general chemistry courses,”
Internet High. Educ., vol. 55, p. 100867, Oct. 2022, doi: 10.1016/J.IHEDUC.2022.100867.</p>
        <p>[6] D. Gašević, S. Dawson, T. Rogers, and D. Gasevic, “Learning analytics should not promote one
size fits all: The effects of instructional conditions in predicting academic success,” Internet
High. Educ., vol. 28, pp. 68–84, Jan. 2016, doi: 10.1016/J.IHEDUC.2015.10.002.</p>
        <p>[7] E. B. Costa, B. Fonseca, M. A. Santana, F. F. de Araújo, and J. Rego, “Evaluating the
effectiveness of educational data mining techniques for early prediction of students’ academic
failure in introductory programming courses,” Comput. Human Behav., vol. 73, pp. 247–256,
Aug. 2017, doi: 10.1016/J.CHB.2017.01.047.</p>
        <p>[8] R. Cerezo, M. Sánchez-Santillán, M. P. Paule-Ruiz, and J. C. Núñez, “Students’ LMS
interaction patterns and their relationship with achievement: A case study in higher education,”
Comput. Educ., vol. 96, pp. 42–54, May 2016, doi: 10.1016/J.COMPEDU.2016.02.006.</p>
        <p>[9] R. Conijn, C. Snijders, A. Kleingeld, and U. Matzat, “Predicting student performance from LMS
data: A comparison of 17 blended courses using moodle LMS,” IEEE Trans. Learn. Technol.,
vol. 10, no. 1, pp. 17–29, Jan. 2017, doi: 10.1109/TLT.2016.2616312.</p>
        <p>[10] L. P. Macfadyen and S. Dawson, “Mining LMS data to develop an ‘early warning system’ for
educators: A proof of concept,” Comput. Educ., vol. 54, no. 2, pp. 588–599, 2010, doi:
10.1016/j.compedu.2009.09.008.</p>
        <p>[11] W. Matcha et al., “Analytics of learning strategies: Associations with academic performance
and feedback,” ACM Int. Conf. Proceeding Ser., pp. 461–470, Mar. 2019, doi:
10.1145/3303772.3303787.</p>
        <p>[12] M. Saqr and S. López-Pernas, “The longitudinal trajectories of online engagement over a full
program,” Comput. Educ., vol. 175, p. 104325, Dec. 2021, doi:
10.1016/J.COMPEDU.2021.104325.</p>
        <p>[13] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data Clustering: A Review,” 2000.</p>
        <p>[14] T. Tang, T. Tang, and G. McCalla, “Smart Recommendation for an Evolving E-Learning
System: Architecture and...,” Int. J. E-Learning, vol. 4, no. 1, pp. 105–129, 2005.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>