<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving generalizability of predictive models through course-related variables</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedro Manuel Moreno-Marcos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pedro J. Muñoz-Merino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Delgado Kloos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Telematic Engineering, Universidad Carlos III de Madrid</institution>
          ,
          <addr-line>28911 Leganés (Madrid), Spain, ROR: 03ths8210</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Students' dropout and academic failure are two of the main challenges in educational contexts. Researchers have made significant efforts to develop predictive models to detect students at risk. However, one of the main limitations is that these models are trained with data from one course and do not work well when they are used in a different course (sometimes even in another edition of the same course) due to the impact of the course context. In this direction, this work aims to analyze how the generalizability of the models could be improved by using global models that contain data about many courses, and whether or not it is possible to enhance the models by using course-related variables that could capture information about the context. To do that, data from 16 Small Private Online Courses (SPOCs) are used to develop models to predict dropout and students' success. Results show that, while it is possible to achieve accurate predictions at the global level when training with several courses, these models do not properly fit all individual courses. In particular, there is a drop in Area Under the Curve (AUC) higher than 0.1 in 17-40% of the courses, depending on the variable to predict. Moreover, it is possible to enhance the predictive models (by up to 0.08 in AUC) by adding course-related variables that capture the main features of the course context. Among these variables, the most relevant ones are the average length of videos, and the number of videos and exercises in the course. These results add new insights about the variables that should be used in the models to improve generalizability, which is crucial for real implementations.</p>
      </abstract>
      <kwd-group>
        <kwd>Analytics</kwd>
        <kwd>Generalizability</kwd>
        <kwd>Retention</kwd>
        <kwd>Blended learning</kwd>
        <kwd>Higher Education</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Prediction is one of the main research areas in learning analytics because there is high interest in
detecting students who will drop out of the course or fail [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this direction, many works have been
carried out in different contexts at both the degree level (i.e., predicting who will drop out of the degree [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ])
and the course level (i.e., predicting who will drop out of the course [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]). However, one important limitation
of these works is that the models are usually trained using data from one course, and they may lose
accuracy when the models are used in other contexts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        To analyze this problem, several researchers have tried to train models using data from
different populations. For example, Ocumpaugh et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] experienced difficulties when generalizing
affect detectors across different populations. In addition, other works have analyzed how models can be
transferred to subsequent editions of a course. In this line, Boyer and Veeramachaneni [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] reported a
drop of 0.1 or more in AUC when using dropout predictive models trained with data from a previous
edition of the course. Similarly, Moreno-Marcos et al. [8] analyzed generalizability using data from two
SPOCs to prepare for the university entrance exams in two cohorts. They found that models worked well
when transferred to the other SPOC with the same students, and they were still acceptable when
transferred to the same course in a subsequent edition. However, they faced difficulties when
modifying both the students and the course.
      </p>
      <p>Given the relevance of the context, other authors, such as Bote-Lorenzo and Gómez-Sánchez [9],
proposed in-situ models, which are trained with previous data from the same course (e.g., using data up to
chapter i-1 to analyze what happens in chapter i). In addition, some have tried to quantify the impact of
generalization, and there have even been special issues focused on this problem [10]. Among
the works in that line, some have shown that specific predictors [11] or regularization techniques [12]
could improve generalizability, and that the sample could also have an impact, so that generalization
to future editions may be better when training with a small number of students at risk [13].</p>
      <p>Nevertheless, further work is needed in this line, as there is no consensus about how to
improve generalization and the best ways to achieve it [14], and this problem is still considered a
key limitation of current learning analytics models [15]. In this direction, one possibility to improve
generalization is to create global models that capture the information of many courses and include
course-related variables that add information about the course context beyond the students’ information.
With this idea, the objective of this paper is to analyze (1) the impact of global models that include data
from many courses to capture the variability of contexts and (2) the impact of course-related variables,
to discover whether or not they could serve to mitigate the generalizability issue.</p>
      <p>The remainder of the paper is organized as follows. Section 2 details the methodology, including a
description of the data, the list of variables, and the analytical methods. Section 3 presents the results
obtained in relation to the objectives. Finally, Section 4 provides the conclusions of this
paper, as well as the main limitations and future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>The study was carried out using data from 16 SPOCs offered by a Spanish university and hosted in a
local Open edX instance. These courses serve to support face-to-face courses (offered in a synchronous
way), although there are three ways the SPOCs could be used: (1) SPOCs needed to pass the course,
with a weight in the final grade; (2) SPOCs that are part of the course and could be combined with a
flipped classroom, although they are not part of the summative evaluation; and (3) SPOCs that are only
used as support materials. However, which scenario applies to each course is unknown.</p>
      <p>Considering these scenarios, the prediction goal is to forecast student success and dropout. Regarding
success, a student is considered successful when the average grade in the SPOC activities is higher than or
equal to 5 out of 10 (non-attempted activities count as 0). Regarding dropout, two possible definitions
are considered: (1) dropout related to activity, which means that students drop out when they do
not interact for two consecutive weeks (excluding weeks where less than 10% of students interact), and
(2) dropout related to completion, which means that students drop out when they do not complete at
least 75% of the activities.</p>
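<p>The two dropout definitions and the success label above can be sketched as follows; this is a minimal illustration on a hypothetical log schema (the column names and toy values are assumptions, not the authors' actual data model), and the 10%-of-students week filter is omitted for brevity.</p>

```python
import pandas as pd

# Hypothetical weekly interaction log and per-student grade summary
# (illustrative schema, not the study's real data model).
logs = pd.DataFrame({
    "student": ["s1"] * 4 + ["s2"] * 4,
    "week":    [1, 2, 3, 4] * 2,
    "events":  [10, 0, 0, 5, 8, 7, 6, 9],  # interactions per week
})
grades = pd.DataFrame({
    "student":   ["s1", "s2"],
    "avg_grade": [3.2, 7.5],    # non-attempted activities count as 0
    "completed": [0.50, 0.90],  # fraction of activities completed
})

# Success: average grade in the SPOC activities >= 5 out of 10.
grades["success"] = grades["avg_grade"] >= 5

# Dropout (completion): less than 75% of the activities completed.
grades["dropout_completion"] = grades["completed"] < 0.75

# Dropout (activity): two consecutive weeks with no interactions.
def two_inactive_weeks(g):
    inactive = g.sort_values("week")["events"].eq(0)
    return bool((inactive & inactive.shift(fill_value=False)).any())

flags = logs.groupby("student")[["week", "events"]].apply(two_inactive_weeks)
grades["dropout_activity"] = grades["student"].map(flags)
print(grades)
```

<p>In this toy example, student s1 is inactive in weeks 2-3 and is therefore flagged as an activity-based dropout, while s2 satisfies all three positive criteria.</p>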
      <p>To build these models, several variables are used. The full list of variables is presented in
Table 1. For the implementation of the models, Random Forest is used (as previous studies, e.g.,
[16, 17], show it is one of the most successful methods for prediction in similar contexts), and models are
evaluated using the Area Under the Curve (AUC), since this is a well-known metric and is generally
appropriate for classification problems involving student behaviors [18].</p>
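<p>This pipeline (Random Forest evaluated by AUC) can be sketched with scikit-learn. The feature matrix below is synthetic, standing in for the Table 1 variables, so the printed AUC is illustrative only.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the week-8 feature matrix; in the study the
# columns would be the Table 1 variables and y a dropout/success label.
rng = np.random.default_rng(0)
X = rng.random((400, 10))
y = (X[:, 0] + 0.3 * rng.standard_normal(400) > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=300, random_state=0)
# Out-of-sample class probabilities via 5-fold cross-validation, then AUC.
proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)
print(f"cross-validated AUC = {auc:.3f}")
```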
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>A first analysis was conducted to evaluate the potential of models trained with many courses and how
these models perform in specific courses. In particular, models were trained with 15 courses and evaluated
on the remaining one using data from week 8 out of 16. Results of the 16 trained models, each with a
set of 15 courses, are presented in Table 2. When conducting this analysis, it was observed that the
dropout or failure rate was 100% in some courses, probably because these SPOCs were designed as
support material and the expected engagement was different. This occurs in one course considering the
dropout definition related to activity, in five courses with the dropout definition related to completion,
and in four courses considering success. Given that it is not possible to compute AUC for those cases, as
there is a single category, they appear with a dash.</p>
      <p>[Table 1. List of variables. Variables about interactions with videos: percentage of videos the student has opened over the total; percentage of visualized content considering the total duration of all videos, without counting repetitions of the same segments; percentage of videos fully watched over the total; ratio between the total time invested watching videos and the total time of different content watched; average number of pauses per opened video. Variables about interactions with exercises: percentage of exercises attempted; average number of attempts per attempted exercise; average grade over all attempts in attempted exercises; average grade at first attempt in attempted exercises; percentage of exercises solved correctly (100%) over those attempted; percentage of exercises solved correctly at first attempt over those attempted; longest streak of consecutive exercises with a score of 100%. Variables about platform use: maximum number of consecutive days the student has accessed the platform to do course activities; percentage of days the student has accessed the platform; average number of consecutive access days (e.g., if the student accesses 3 consecutive days, and later 5 consecutive days, the value is (3+5)/2=4). Variables related to the course characteristics: thematic area of the course, including (1) Humanities, (2) Social Sciences, (3) Natural Sciences, (4) Formal Sciences, and (5) Professions and Applied Sciences; whether the course is instructor-paced or self-paced (synchronous or asynchronous); total number of exercises available in the course; total number of videos available in the course; average length of the videos in seconds; total number of staff (mainly instructors) involved in the course; whether the course is delivered in English (1) or Spanish (0).]</p>
      <p>Results show that, despite generally obtaining an accurate AUC at the global level, the AUC for each
individual course may vary. When analyzing the training set of 15 courses, results are very similar in
all cases, but when testing on specific courses, larger differences are observed. For the first case of dropout
related to inactivity, the global AUC obtained with the 15 courses with cross-validation was between
0.76-0.79, and there was a drop higher than 0.1 in AUC in 40% of the individual courses (6 courses).</p>
      <sec id="sec-3-27">
        <p>Moreover, there are 3 courses (20%) where there is an increase in AUC higher than 0.1.</p>
        <p>When analyzing dropout with the alternative definition, the global AUC was between 0.92-0.94,
and a drop in AUC higher than 0.1 was observed in only 18% of the courses (two courses). Among
those two courses, the AUC was below 0.8 in one course, which suggests that global models offer
high performance in most cases. Similarly, a global AUC between 0.93-0.95 was observed when
predicting students’ success, and only 17% of the courses (two courses) experienced a drop in AUC
higher than 0.1. Thus, global models trained with several courses may be useful in most cases,
and this could be a valid approach when a large set of courses is available. Nevertheless, there might be
courses where this approach does not work and more specific models should be developed, which is
consistent with the literature suggesting that one-size-fits-all solutions are not possible in learning
analytics models [19].</p>
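<p>The train-on-15, test-on-the-held-out-course evaluation described above can be sketched with scikit-learn's LeaveOneGroupOut; the data, course assignments, and the reference AUC used for the 0.1-drop check below are synthetic assumptions (the paper compares against the cross-validated global AUC of the 15 training courses).</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

# Synthetic data: 800 students spread over 16 courses (illustrative only).
rng = np.random.default_rng(1)
X = rng.random((800, 8))
y = (X[:, 0] + 0.5 * rng.standard_normal(800) > 0.5).astype(int)
course = rng.integers(0, 16, 800)  # course id per student

# Train on 15 courses, evaluate on the held-out one, for all 16 splits.
aucs = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=course):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))

# Share of held-out courses falling more than 0.1 below the reference level.
reference = float(np.mean(aucs))
share = float(np.mean([a < reference - 0.1 for a in aucs]))
print(f"{share:.0%} of courses drop more than 0.1 below the reference AUC")
```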
        <p>A second analysis was conducted to determine whether or not course-related variables could improve
global models. For this analysis, global models were computed with and without course-related variables
throughout the course. Thus, models without course-related variables include the other three categories
in Table 1 (videos, exercises, and platform use) related to the students’ interactions. In addition, a model
with just course-related variables was implemented to analyze the predictive power of these variables by
themselves (course model). Results of these models are presented in Figure 1.</p>
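<p>The three model variants (student variables only, course variables only, and all variables) can be sketched as follows; the feature blocks and the dependence of the label on them are synthetic assumptions standing in for the Table 1 categories.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic student-interaction and course-context features (illustrative).
rng = np.random.default_rng(2)
n = 600
student_feats = rng.random((n, 6))   # video / exercise / platform variables
course_feats = rng.random((n, 3))    # e.g., n_videos, n_exercises, avg length
y = ((student_feats[:, 0] + 0.5 * course_feats[:, 0]
      + 0.3 * rng.standard_normal(n)) > 0.75).astype(int)

def auc_for(X):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

auc_student = auc_for(student_feats)                         # student model
auc_course = auc_for(course_feats)                           # course model
auc_all = auc_for(np.hstack([student_feats, course_feats]))  # all variables
print(f"student={auc_student:.3f}  course={auc_course:.3f}  all={auc_all:.3f}")
```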
        <p>From this figure, it can be seen that there was an average improvement with course-related variables
of between 0.03-0.04 in AUC for all three dependent variables, and this value went up to 0.07-0.08 at the
beginning of the course, when the difference was larger. This may mean that course-related variables
contain meaningful information that could improve predictive models. In addition, it is observed that
the AUC obtained with only course-related variables is fair even when there is no information about the
students, which reinforces the importance of the course context.</p>
        <p>To delve into the variables, variable importance was computed using the Mean
Decrease in Gini [20]. For the case of dropout related to activity, the variable that stood out was the
percentage of attempted exercises, followed by the average number of repetitions per video and the
percentage of days the student accesses the platform. The course-related variables were not the most
important ones, although several of them had a similar relative importance; the most important among
them were the average duration of videos and the number of exercises.</p>
        <p>[Figure 1. AUC per week (weeks 4-16) for the models with student variables, course variables, and all variables.]</p>
        <p>For the case of dropout related to completion, the most relevant variables were the percentage of
attempted exercises, followed by the percentage of correct exercises at first attempt and the percentage
of visualized videos. It is noteworthy that the fourth most significant variable was the average length
of videos, which is a course-related variable. Regarding the other course-related variables, their relative
importance was smaller, although the number of exercises had higher importance values. As for
success prediction, the percentage of attempted exercises was the most relevant variable, which is
reasonable, as engaging with exercises is crucial to complete the course and be successful. After this
variable, the most relevant ones are the percentage of correct questions at first attempt and the average
duration of videos. The latter variable reflects how relevant the course design might be for student
success and the fact that an inappropriate duration may lead to dropout or failure. In this case, the
number of exercises and videos appear in positions four and six, which also highlights how crucial a
good course design is. In summary, variables related to students’ interactions, such as the percentage
of attempted exercises, are the most relevant ones, although the course-related variables also have a
significant impact on the models.</p>
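<p>The importance ranking described above can be sketched via scikit-learn's impurity-based importances, which implement the Mean Decrease in Gini for classification forests; the feature names and the synthetic label below are illustrative assumptions echoing the paper's variables.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative feature names echoing the paper's variables.
names = ["pct_attempted_exercises", "avg_video_repetitions",
         "pct_access_days", "avg_video_length", "n_exercises"]
rng = np.random.default_rng(3)
X = rng.random((500, len(names)))
y = (X[:, 0] + 0.4 * X[:, 1] + 0.2 * rng.standard_normal(500) > 0.7).astype(int)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
# feature_importances_ is the normalized Mean Decrease in Impurity (Gini).
for name, imp in sorted(zip(names, clf.feature_importances_),
                        key=lambda t: t[1], reverse=True):
    print(f"{name:24s} {imp:.3f}")
```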
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>This study has analyzed the impact of global models to improve generalization and the use of
course-related variables. Results have shown the potential of models trained with many courses, as they
generalized in most cases. In particular, for dropout related to activity, there was a drop in
AUC higher than 0.1 in 40% of the courses; when analyzing dropout related to completion, that drop
occurred in 18% of the courses; and when analyzing success, the drop occurred in 17% of the courses. For
the latter two cases, the global AUC was above 0.9, which means that global models work very well in
more than 80% of the courses. For the case of dropout related to inactivity, results were worse, as
global models were not accurate enough. Further research should be done in that case, as inactivity
might be highly dependent on the context (the expected activity in the SPOC may vary depending on
the methodology).</p>
      <p>When analyzing the impact of course-related variables, results showed that they can improve the
predictive models, with an average improvement of 0.03-0.04 in AUC and up to 0.07-0.08 at the beginning
of the course. Moreover, although the predictive power of these variables by themselves was not very strong,
they managed to achieve fair predictive results. Finally, the analysis of variable importance showed that the
percentage of attempted exercises was the variable that stood out, and that the average duration of videos
and the number of exercises and videos were the most relevant course-related variables.</p>
      <p>Despite the aforementioned findings, there are some limitations worth mentioning. One key
limitation was the lack of information about the course methodology. This had a significant effect on
dropout related to inactivity, as the expected activity was unknown. In addition, these analyses were
based only on the SPOCs, so the summative assessments of the course and final grades are not
covered. Thus, students who are considered dropouts may not engage with the SPOC but still pass
the course. Furthermore, the way dropout is defined may also have an effect on the results, and the
sample could also be influential. While several courses are analyzed, more courses would be needed to
improve the generalizability of the findings.</p>
      <p>As future work, it would be interesting to gather more information about the course methodology to
create a new category of variables related to the methodology and analyze whether or not they could
further improve the performance of global models. Moreover, it would be relevant to further analyze
the generalizability of the predictive models in more contexts and analyze the specific reasons why
models do not fit well in some cases. Finally, it would be relevant to put these models into practice in
active courses and analyze possible interventions and their impact to improve academic success
and reduce dropout.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by Universidad Carlos III de Madrid (UC3M) through the Grants for the
Research Activity of Young Doctors of the UC3M’s Own Research and Transfer Program
(ASESORIA project). Moreover, it was supported by FEDER / Ministerio de Ciencia, Innovación y
Universidades - Agencia Estatal de Investigación through the grant PID2023-146692OB-C31 (GENIE Learn
project) funded by MICIU/AEI/10.13039/501100011033 and by ERDF/UE, by the UNESCO Chair of
“Scalable Digital Education for All” at UC3M and by the grant RED2022-134284-T funded by
MICIU/AEI/10.13039/501100011033.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Moreno-Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Alario-Hoyos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Muñoz-Merino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delgado Kloos</surname>
          </string-name>
          ,
          <article-title>Prediction in MOOCs: A review and future research directions</article-title>
          ,
          <source>IEEE Transactions on Learning Technologies</source>
          <volume>12</volume>
          (
          <year>2018</year>
          )
          <fpage>384</fpage>
          -
          <lpage>401</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Olivares Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Moreno Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Scheihing</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J. Muñoz</given-names>
            <surname>Merino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delgado Kloos</surname>
          </string-name>
          ,
          <article-title>An actionable learning path-based model to predict and describe academic dropout</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Jiménez Macías</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Moreno Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J. Muñoz</given-names>
            <surname>Merino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Ortiz</given-names>
            <surname>Rojas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delgado Kloos</surname>
          </string-name>
          ,
          <article-title>Analyzing feature importance for a predictive undergraduate student dropout model</article-title>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <article-title>A systematic review for MOOC dropout prediction from the perspective of machine learning</article-title>
          ,
          <source>Interactive Learning Environments</source>
          <volume>32</volume>
          (
          <year>2024</year>
          )
          <fpage>1642</fpage>
          -
          <lpage>1655</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>Hybrid neural network model for MOOC dropout prediction</article-title>
          ,
          <source>Complex System Modeling and Simulation</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ocumpaugh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gowda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Heffernan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Heffernan</surname>
          </string-name>
          ,
          <article-title>Population validity for educational data mining models: A case study in affect detection</article-title>
          ,
          <source>British Journal of Educational Technology</source>
          <volume>45</volume>
          (
          <year>2014</year>
          )
          <fpage>487</fpage>
          -
          <lpage>501</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Boyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Veeramachaneni</surname>
          </string-name>
          ,
          <article-title>Transfer learning for predictive models in massive open online courses</article-title>
          ,
          <source>in: International conference on artificial intelligence in education</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] P. M. Moreno-Marcos, T. De Laet, P. J. Muñoz-Merino, C. Van Soom, T. Broos, K. Verbert, C. Delgado Kloos, Generalizing predictive models of admission test success based on online interactions, Sustainability 11 (2019) 4940.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. L. Bote Lorenzo, E. Gómez Sánchez, et al., An approach to build in situ models for the prediction of the decrease of academic engagement indicators in massive open online courses (2018).</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] C. Romero, S. Ventura, Guest editorial: Special issue on early prediction and supporting of learning performance, IEEE Transactions on Learning Technologies 12 (2019) 145-147.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] J.-L. Hung, B. E. Shelton, J. Yang, X. Du, Improving predictive modeling for at-risk student identification: A multistage approach, IEEE Transactions on Learning Technologies 12 (2019) 148-157.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] D. M. Olive, D. Q. Huynh, M. Reynolds, M. Dougiamas, D. Wiese, A quest for a one-size-fits-all neural network: early prediction of students at risk in online courses, IEEE Transactions on Learning Technologies 12 (2019) 171-183.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] N. Gitinabard, Y. Xu, S. Heckman, T. Barnes, C. F. Lynch, How widely can prediction models be generalized? Performance prediction in blended courses, IEEE Transactions on Learning Technologies 12 (2019) 184-197.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] D. Zhidkikh, V. Heilala, C. Van Petegem, P. Dawyndt, M. Jarvinen, S. Viitanen, B. De Wever, B. Mesuere, V. Lappalainen, L. Kettunen, et al., Reproducing predictive learning analytics in CS1: Toward generalizable and explainable models for enhancing student retention, Journal of Learning Analytics 11 (2024) 132-150.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] N. Sghir, A. Adadi, M. Lahmer, Recent advances in predictive learning analytics: A decade systematic review (2012-2022), Education and Information Technologies 28 (2023) 8299-8333.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] P. M. Moreno-Marcos, P. J. Muñoz-Merino, C. Alario-Hoyos, C. Delgado Kloos, Re-defining, analyzing and predicting persistence using student events in online learning, Applied Sciences 10 (2020) 1722.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] A. Jiménez-Macías, P. J. Muñoz-Merino, P. M. Moreno-Marcos, C. Delgado Kloos, Evaluation of traditional machine learning algorithms for featuring educational exercises, Applied Intelligence 55 (2025) 1-25.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] R. Pelánek, Metrics for evaluation of student models, Journal of Educational Data Mining 7 (2015) 1-19.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] D. Gašević, S. Dawson, T. Rogers, D. Gasevic, Learning analytics should not promote one size fits all: The effects of instructional conditions in predicting academic success, The Internet and Higher Education 28 (2016) 68-84.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] G. Louppe, L. Wehenkel, A. Sutera, P. Geurts, Understanding variable importances in forests of randomized trees, Advances in Neural Information Processing Systems 26 (2013).</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>