<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting Academic Performance in a Subject Using Classifier Algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edwar Abril Saire Peralta</string-name>
          <email>esaire@unsa.edu.pe</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Rolando Cabrera Málaga</string-name>
          <email>gcabrerama@unsa.edu.pe</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sonia Benilda Calloapaza Pari</string-name>
          <email>scalloapaza@unsa.edu.pe</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Nacional de San Agustín de Arequipa</institution>
          ,
          <addr-line>Santa Catalina 117, Arequipa</addr-line>
          <country country="PE">Perú</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The objective of this research is to predict the academic performance of students entering the Systems Engineering program in the Discrete Structures I course, which is considered difficult to pass. The population comprises 827 students; the research follows a quantitative approach with a non-experimental, correlational design. The methodology is CRISP-DM (Cross Industry Standard Process for Data Mining), applied through supervised learning with binary classification models based on the random forest, XGBoost and support vector machine algorithms. The resulting models predict whether a student will pass or fail the course. The models with the best results are those based on random forest and XGBoost, with an accuracy of 82.5%.</p>
      </abstract>
      <kwd-group>
        <kwd>Academic performance</kwd>
        <kwd>Data Mining</kwd>
        <kwd>Classification Algorithms</kwd>
        <kwd>Supervised Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Academic performance has been a concern in university education for many years. The biggest
challenge has always been to provide quality education, which means improving academic
performance [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. One consequence of low academic performance is course failure, and one way to address
the problem is to apply data mining to the most influential background data of students
entering the university.
      </p>
      <p>The transition from high school to university can be difficult for entering students. In the
Systems Engineering program of the Universidad Nacional de San Agustín de Arequipa, Peru,
statistics provided by the program show that on average 50% of entrants fail the Discrete
Structures I course in their first enrollment, so the course is considered difficult to pass. It is
therefore necessary to explore and analyze the academic profile with which students enter the
university.</p>
      <p>
        Data mining makes it possible to explore, analyze and find patterns in data, yielding useful
information for understanding student performance and the environment in which the
student works [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Finding behavioral patterns through data mining supports decision making to solve
problems in educational settings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Both [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] point out that academic performance is influenced by a set of factors internal and
external to the student, and that the final outcome is expressed as a quantitative value,
reflected in course status labels of passed or failed. Furthermore [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
indicate that the numerical average obtained in a course is the most accurate signal of a student's
academic achievement.
      </p>
      <p>
        Most research on academic performance has used supervised classification algorithms,
which build models that learn from historically recorded experience; the more experience
available, the better the model's predictions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] point out that supervised learning makes it possible to find trends based
on behavioral patterns obtained from large amounts of data.
      </p>
      <p>The present research aims to predict the status (pass or fail) of the Discrete Structures I
course by applying the CRISP-DM methodology with supervised classification algorithms. The
result makes it possible to identify before the beginning of the semester whether a student is
likely to pass or fail the course. Identifying in advance who is likely to fail allows teachers and
educational authorities to be alerted so that they can take tutoring actions to improve academic
performance.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] predicted students' academic performance from academic, demographic and
sociodemographic data. The algorithms used were decision tree, K-Nearest-Neighbor (KNN),
support vector machines and naive Bayes, implemented in Python. The research approach was
quantitative and covered 4738 students of Industrial Engineering and Electronic Engineering.
The data were collected from 2008 to 2018, with 324 variables per student; given so many
variables, those with the most influence on academic performance were selected. The algorithm
with the best results was KNN, with an accuracy ranging from 78.5% to 80%. Reducing the
variables to the most influential ones was decisive for the prediction. A strength of this work is
the number of attributes and records available.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] developed a model to predict dropout at a public university during the COVID-19 period.
Their goal was to determine the most efficient machine learning algorithm for classifying
students based on historical data from 2018 to 2021. In a descriptive study, they applied the
algorithms to a population of 652 students with 106 variables. The K-Nearest-Neighbor
algorithm gave the best results, with an accuracy of 91%, using academic and socioeconomic
data as inputs. The authors concluded that the model is useful for predicting early, in the first
semesters, which students might drop out. This diagnosis gives the university early warnings, so
that it can support these students with tutoring or other academic programs. Reducing the 106
variables to the most significant ones means that the model works with the most influential
variables, which is a decisive contribution in the context studied.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] investigated the main predictor variables that influence students' academic performance
six semesters after entering university. They worked with 622 students and applied twelve
classification algorithms, then built an ensemble from the algorithms with the best metric
values: logistic regression, naive Bayes and support vector machines. Applying the ensemble
with an optimal cut-off point yielded a specificity of 0.695 and a sensitivity of 0.947. The grade
obtained in mathematics was a determining factor, while many of the sociodemographic
variables had no strong influence on the result. The score obtained in the university entrance
exam was not taken into account, which is striking, since in other research it is a decisive
variable in academic performance.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] developed a model based on supervised machine learning to predict whether a student
passes the leveling course. They used the Gradient Boosting and Logistic Regression algorithms,
with predictor variables grouped into demographic, socioeconomic, family, institutional and
academic performance in the application. The population consisted of 7139 students. Gradient
Boosting obtained an accuracy of 96% in cross-validation and 89% on new data. The logistic
regression results indicate that the average grade of the first bimester, the average grade with
which the student entered the university and the geographical location of origin, among others,
affect the probability that the student will pass the course. The variables associated with failing
the course are the grade obtained when entering the university, the province of origin and the
lack of academic support or tutoring. Other machine learning algorithms remain to be tested, to
compare their accuracy and verify which attributes are most influential in determining whether
a student passes or fails the course.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] developed models to predict student academic risk, using educational data mining for
early detection. The study used sociodemographic data and university entrance exam results of
415 students of computer science majors enrolled between 2016 and 2019. The best
classification model was based on the LMT algorithm, with an accuracy of 75.42% and an area
under the ROC curve of 0.805. The most influential data, such as the college entrance exam
score, were identified. The research began with 65 attributes, of which 9 remained after
selecting the most significant ones; this selection depends on the quality of the data and their
predictive power with respect to the target variable.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Application of the methodology</title>
      <p>The research followed a quantitative approach and worked with 778 students, applying the
CRISP-DM data mining methodology. Figure 1 shows the outline of the model applied in the
research, based on this methodology.</p>
      <p>Figure 1: CRISP-DM phases applied to the students database: (1) knowledge of the business, (2) knowledge of the data, (3) preparation of data, (4) modeling (Random forest, XGBoost and Logistic regression), (5) assessment.</p>
      <p>The data used in the research are shown in Table 1. The proposed predictive model has as
input the admission data, academic data and other data that have been calculated such as age at
high school graduation, time elapsed before entering college and age at college entrance.</p>
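      <p>The three calculated attributes can be illustrated with a minimal sketch. The function name, the field names and the use of an admission year are assumptions made for illustration (the source only lists date of birth and year of school leaving explicitly), and ages are approximated as year differences.</p>
      <preformat>
```python
from datetime import date


def derived_attributes(birth, school_leaving_year, admission_year):
    """Return the three calculated attributes named in the text.

    Assumptions: `birth` is a datetime.date, the other two arguments are
    four-digit years, and ages are approximated as differences of years.
    """
    return {
        "age_at_graduation": school_leaving_year - birth.year,
        "years_elapsed": admission_year - school_leaving_year,
        "age_at_entrance": admission_year - birth.year,
    }


# Hypothetical student: born in 1995, left school in 2011, admitted in 2013.
example = derived_attributes(date(1995, 6, 14), 2011, 2013)
print(example)
```
      </preformat>
      <p>By construction, the age at entrance equals the age at graduation plus the time elapsed, so the three attributes carry partly redundant information.</p>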
      <p>Below, in Figure 2 we can see the scheme proposed in the research.</p>
      <p>Figure 2: Proposed scheme. Task 1: determine predictors of academic performance (admission data, academic data, calculated data). Task 2: determine the most influential predictors (most decisive factors in prediction). Task 3: build machine learning models (Random Forest, XGBoost, Support Vector Machines). Task 4: select the most accurate predictive model (student classifier, pass/fail).</p>
      <sec id="sec-3-1">
        <title>3.1. Understanding the business</title>
        <p>Universities have three objectives: teaching, research and social responsibility. Both
licensing and accreditation contribute to achieving quality education. Entering students face a
difficult path from school to university, and the transition must be gradual for them to adapt.
Supporting this adaptation should be a priority for the university, which must know what
students face in the first semesters.</p>
        <p>The academic performance of incoming students cannot be known with certainty in advance.
In this research, academic performance is evaluated by labeling the student as pass or fail. The
problem addressed is the lack of knowledge about students' likely performance in the Discrete
Structures I course of the Systems Engineering program of the Universidad Nacional de San
Agustín de Arequipa. According to data provided by the School of Systems, from 2011 to 2020
approximately 50% of students on average failed the course in their first enrollment. Table 1
shows the percentage of students who passed and failed from 2011 to 2020.</p>
        <p>According to data provided by the School of Systems, 137 students dropped out because
they were never able to pass the Discrete Structures I course, even after taking it up to three
times. The population for the present research consists of the students of the 2011 to 2020
cohorts, a total of 778 students.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Understanding the data</title>
        <p>The data requested for the project come from two sources: the admission data of students
entering the Systems Engineering program, and the academic data of students enrolled in the
School of Systems. The data provided are shown in Table 2.</p>
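        <table-wrap>
          <label>Table 2</label>
          <caption><p>Admission and academic data</p></caption>
          <table>
            <thead>
              <tr><th>N°</th><th>Attribute</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><th colspan="3">ADMISSION DATA</th></tr>
              <tr><td>1</td><td>Last Name and First Name</td><td>Last name and first name of student</td></tr>
              <tr><td>2</td><td>Gender</td><td>Student's gender</td></tr>
              <tr><td>3</td><td>Date of birth</td><td>Date of birth of student</td></tr>
              <tr><td>4</td><td>Place of birth (Department)</td><td>Department where the student was born</td></tr>
              <tr><td>5</td><td>Place of birth (Province)</td><td>Province where the student was born</td></tr>
              <tr><td>6</td><td>Place of birth (District)</td><td>District where the student was born</td></tr>
              <tr><td>7</td><td>Place of birth (code)</td><td>Place of birth (code)</td></tr>
              <tr><td>8</td><td>School</td><td>High school of origin</td></tr>
              <tr><td>9</td><td>School code</td><td>Code of school</td></tr>
              <tr><td>10</td><td>Location of school (Department)</td><td>Department where the school is located</td></tr>
              <tr><td>11</td><td>Location of school (Province)</td><td>Province where the school is located</td></tr>
              <tr><td>12</td><td>Location of school (District)</td><td>District where the school is located</td></tr>
              <tr><td>13</td><td>Type of school</td><td>Type of school</td></tr>
              <tr><td>14</td><td>Year of school leaving</td><td>Year of graduation from school</td></tr>
              <tr><td>15</td><td>Admission mode</td><td>University entrance mode</td></tr>
              <tr><td>16</td><td>Score</td><td>University entrance score</td></tr>
              <tr><td>17</td><td>Extraordinary admission</td><td>Extraordinary mode of admission</td></tr>
              <tr><th colspan="3">ACADEMIC DATA</th></tr>
              <tr><td>1</td><td>CUI</td><td>Student code</td></tr>
              <tr><td>2</td><td>Entrance code</td><td>University entrance code</td></tr>
              <tr><td>3</td><td>Last name and first name</td><td>Last name and first name of student</td></tr>
              <tr><td>4</td><td>Course</td><td>Course in which the student is enrolled</td></tr>
              <tr><td>5</td><td>Grade</td><td>Grade achieved in the course</td></tr>
              <tr><td>6</td><td>Condition</td><td>Student's condition</td></tr>
              <tr><td>7</td><td>#Enrollment</td><td>Number of enrollment</td></tr>
            </tbody>
          </table>
        </table-wrap>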
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Data preparation</title>
        <p>From all the attributes provided by the university, those to be used in the prediction models
have been selected. We have excluded data that do not have any contribution, such as the
student's entrance code, last names and first names, among others. New attributes have also been
generated. Table 3 shows the final data that will be used to train the models.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Modeling</title>
        <p>The data flow for building the predictive classifier is shown in Figure 4. The model
predicts whether a student passes or fails the course from nine inputs and one output.</p>
        <p>Figure 4, inputs: gender, place of birth, place of school, type of school, age of graduation, time elapsed, age of entrance, modality and score.</p>
        <p>The classification models were implemented in Google Colaboratory with Python, with the
dataset uploaded in CSV format. If a proposed model does not reach satisfactory values in its
metrics, the alternative is to return iteratively to the earlier phases until the most appropriate
metric values for the model are achieved. The following tasks were executed:</p>
        <list list-type="bullet">
          <list-item><p>Separate the columns into predictor variables and target variable. The X variable represents the predictor variables, while the Y variable represents the target variable, as shown in Figure 5.</p></list-item>
          <list-item><p>Split the data into training (80%) and validation (20%) sets. The split function used in Python takes a test_size parameter; the value 0.2 reserves 20% of the data for testing the model, and the remaining 80% is used for training, as shown in Figure 7.</p></list-item>
          <list-item><p>Scale the data, that is, transform the feature values so that they fall within a common range. Scaling was necessary because some machine learning algorithms perform poorly in the presence of outliers or skewed values. Figure 8 shows the results of the data scaling process.</p></list-item>
          <list-item><p>Apply the same transformation to the test data, represented by X_test, which the model has never seen and which is used to validate the model.</p></list-item>
        </list>
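        <p>The separation, split and scaling tasks described above can be sketched without external dependencies as follows. This is an illustration under stated assumptions, not the authors' code: the test_size parameter mentioned in the text suggests a library helper was used, the toy records are invented, and standardization (zero mean, unit variance) is assumed as the scaling method because the source does not name one.</p>
        <preformat>
```python
import random
import statistics


def split_rows(rows, test_size=0.2, seed=42):
    """Shuffle row indices and hold out a `test_size` fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


def fit_scaler(X_train):
    """Compute per-column mean and standard deviation on training rows only."""
    columns = list(zip(*X_train))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(X, means, stds):
    """Standardize rows using statistics learned from the training set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Hypothetical toy records: (entrance score, age at entrance, passed flag).
records = [(50.0 + i, 17 + i % 4, i % 2) for i in range(20)]
train, test = split_rows(records, test_size=0.2)

# Separate predictors (X) from the target (y).
X_train = [list(r[:-1]) for r in train]
y_train = [r[-1] for r in train]
X_test = [list(r[:-1]) for r in test]
y_test = [r[-1] for r in test]

# Fit the scaler on the training data, then reuse it for the test data.
means, stds = fit_scaler(X_train)
X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)
print(len(X_train), len(X_test))
```
        </preformat>
        <p>Fitting the scaler on the training rows only and reusing its statistics for X_test keeps information from the held-out data out of the training process.</p>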
        <p>The following is a summary of the tests performed with the classification algorithms. After
several experiments, the best result was achieved using the first eight variables ranked by the
mutual information technique. The models were first trained with the four most significant
variables (college entrance score, college entrance age, time elapsed between leaving school and
entering college, and age at leaving school). Because this did not improve the predictions,
further experiments were run with three, four, five, six, seven, eight and nine variables, and so
on, until the point of best performance could be identified. The first eight variables gave the best
predictions, as reflected in the values of the metrics.</p>
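        <p>Ranking variables by mutual information and keeping the top k can be illustrated with a small sketch for discrete variables. The helper names and the toy data are assumptions; continuous inputs such as the entrance score would need to be discretized first or handled by a library estimator (for example, scikit-learn provides mutual_info_classif), and the source does not state which implementation was used.</p>
        <preformat>
```python
import math
from collections import Counter


def mutual_information(feature, target):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    px = Counter(feature)
    py = Counter(target)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) written with raw counts.
        mi += p_xy * math.log(p_xy * n * n / (px[x] * py[y]))
    return mi


def top_k_features(table, target, k):
    """Rank the columns of `table` (name to values) by MI with the target."""
    scores = {name: mutual_information(values, target) for name, values in table.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: score_band determines the outcome, gender is independent of it.
target = [1, 1, 0, 0, 1, 1, 0, 0]
table = {
    "score_band": ["high", "high", "low", "low", "high", "high", "low", "low"],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
}
selected, scores = top_k_features(table, target, k=1)
print(selected, scores)
```
        </preformat>
        <p>Repeating the selection for increasing values of k and comparing the resulting metric values mirrors the search for the best number of variables described above.</p>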
        <p>The most suitable hyperparameter values were searched for repeatedly using Bayesian
optimization, which serves the same purpose as grid search but uses the results of earlier trials
to choose the next candidate values. The random forest, XGBoost and support vector machine
models were then retrained with the eight predictors and the hyperparameters found. The
source code of the tested algorithms is shown below. Figure 9 shows the random forest
algorithm.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Validation of the approach</title>
        <p>All the supervised classification models implemented were evaluated, and the
best-performing classifier was selected according to the values of the performance metrics. The
evaluation checks the models against the test data, which represent 20% of the total, that is, the
data that were set aside and not used to train the models. Two aspects were taken into account
to validate the models: cross-validation (on the training data) and testing with the test data.
Metrics are values that measure the performance of a classifier model.</p>
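        <p>As an illustration of how such metrics are computed for a binary pass/fail classifier, the sketch below derives accuracy, precision, sensitivity and specificity from the confusion counts. The function names and the example labels are assumptions; the source reports accuracy for its own models, while sensitivity and specificity appear in the related work.</p>
        <preformat>
```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return common binary classification metrics as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical labels: 1 = pass, 0 = fail.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```
        </preformat>
        <p>Accuracy alone can be misleading when the classes are unbalanced, which is why sensitivity and specificity are often reported alongside it.</p>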
      </sec>
      <sec id="sec-3-6">
        <title>3.5.1. Internal validation</title>
        <p>Internal validation refers to cross-validation carried out on the training data. Accuracy
takes values between 0 and 1; at the beginning the values ranged from 0.4 to 0.6 depending on
the model, but after many tests and experiments it reached up to 0.82. Table 4 shows the results
obtained in the training stage.</p>
        <p>At this stage, random forest was the algorithm with the best results in predicting whether
a student passes or fails the Discrete Structures I course in their first enrollment.</p>
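        <p>The mechanics of k-fold cross-validation can be sketched as follows. The number of folds (five) and the one-feature threshold rule are assumptions chosen to keep the example self-contained; the source does not state the number of folds, and the placeholder rule stands in for the random forest, XGBoost and support vector machine models.</p>
        <preformat>
```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs that partition range(n_samples) into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_val_accuracy(X, y, fit, predict, k=5):
    """Average validation accuracy of a model over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in val_idx)
        scores.append(hits / len(val_idx))
    return sum(scores) / len(scores), scores


# Placeholder model (an assumption, not one of the paper's classifiers):
# a one-feature threshold rule whose cut-off is the training mean.
def fit_threshold(X_train, y_train):
    return sum(row[0] for row in X_train) / len(X_train)


def predict_threshold(threshold, row):
    return int(row[0] > threshold)


X = [[float(v)] for v in range(20)]
y = [int(v >= 10) for v in range(20)]
mean_acc, fold_scores = cross_val_accuracy(X, y, fit_threshold, predict_threshold)
print(round(mean_acc, 3), fold_scores)
```
        </preformat>
        <p>Each record is used for validation exactly once, so the averaged score is less sensitive to a single random split than one hold-out evaluation.</p>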
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <sec id="sec-4-1">
        <title>3.5.2. External validation</title>
        <p>The external validation consisted of testing whether the model works with new data,
using the records that were initially set aside, which correspond to 20% of the total. The models
with the highest prediction accuracy were those implemented with random forest and XGBoost,
with an accuracy of 82.5%, as shown in Table 5. Note that the values of the metrics vary between
runs because the training and test data are selected randomly.</p>
        <p>Based on the research developed and the results obtained, supervised learning, a subfield
of machine learning, has shown great advances when applied in education, not only to predict
students' academic performance but also to predict student dropout and learning patterns,
among others, as shown in the literature consulted.</p>
        <p>Dimensionality reduction through mutual information and the permutation-based
importance of the random forest algorithm improved the results, showing that the most decisive
variables among the admission data in this context are college entrance score, age at high school
graduation, time elapsed and age at university entrance. The research of [15] also addressed
dimensionality and found that the most influential variable was the age at which students
began their studies, a result shared with this research. However, other investigations such as
[16], which collected historical data from a public institution and applied the decision tree
algorithm, found that the admission score was not significant in predicting academic
performance, because other variables were available, such as the ratio of credits passed to
credits that should have been passed. These differences arise because additional data were
available in that study; in the present investigation, the score was the most decisive variable in
the prediction.</p>
        <p>Another comparable study is [17], which also worked with historical data from a public
institution and found that the number of failed courses and the father's level of education were
decisive. These comparisons matter because working with historical data that an institution
recorded without any intention of using them in research differs from working with data that
were planned for research purposes, which increases the richness of the results.</p>
        <p>In the literature, predictions classify students as pass/fail, dropout/non-dropout or
low/high performance, among others, using algorithms such as neural networks, random forest,
decision trees, support vector machines and logistic regression, and seeking the best prediction
accuracy, as in [14], [18], [19] and [20], which achieved an average accuracy of 80%, even with
more data than the research presented here. The literature reviewed in this line of research
shows binary classification predictions in educational contexts; the contribution of the present
work lies in predicting whether a student passes or fails the Discrete Structures I course with
few attributes, which were not collected for research purposes, so that it was necessary to filter
and select the most decisive attributes to achieve better results.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The model that achieved the highest accuracy was implemented with the random forest
algorithm, with an accuracy of 82.5% on the test data and 85% on the training data. Data quality
is decisive for achieving higher accuracy in the models' predictions. The present research
worked with data that were not collected for this purpose; nevertheless, positive results were
obtained through trial and error, searching for the most influential attributes and for the best
hyperparameter values for the algorithms. Another important aspect is the small number of
records, which limits the training of the models and can lead to low accuracy. It is also
important to have balanced data for model training, in order to avoid biased models and obtain
reliable predictions. Finally, the most important conclusion is that the quality of the attributes
related to the outcome to be predicted is decisive for the success of the classification models.</p>
      <p>… educativos. Revista de Educación a Distancia (RED), 21(66), 1–36.
https://doi.org/10.6018/red.463561</p>
      <p>[14] P. Chapman, C. Julian, K. Randy, K. Thomas, R. Thomas &amp; S. C. Wirth (1999). CRISP-DM 1.0:
Step-by-step data mining guide.</p>
      <p>[15] A. Reinoso Quijo (2023). Desarrollo de un modelo para predecir el rendimiento académico
de estudiantes de la EPN en base a su nivel de acceso a TICS y factores socioeconómicos
[tesis maestria, Escuela Politecnica Nacional, Quito].
http://bibdigital.epn.edu.ec/handle/15000/23615</p>
      <p>[16] N. Bedregal-Alpaca, D. Tupacyupanqui-Jaén &amp; V. Cornejo-Aparicio (2020b). Analysis of the
academic performance of systems engineering students, desertion possibilities and
proposals for retention. Ingeniare, 28(4), 668–683.
https://doi.org/10.4067/S071833052020000400668</p>
      <p>[17] L. Quiñones &amp; Y. L. Carrasco (2020). Rendimiento académico empleando minería de datos.
Espacios, 41(44), 277–285. https://doi.org/10.48082/espacios-a20v41n44p17</p>
      <p>[18] P. Mejía Zamora (2023). Modelo matemático para predecir el grado de deserción de los
estudiantes en el Instituto Superior Tecnológico Bolívar [tesis de maestria, Universidad
Técnica de Ambato, Ecuador]. In Repositorio Institucional de la Universidad Técnica de
Ambato. https://repositorio.uta.edu.ec/bitstream/123456789/37204/1/t2153mma.pdf</p>
      <p>[19] A. J. Camargo (2020). Modelo Para La Predicción De La Deserción De Estudiantes De
Pregrado, Basado En Técnicas De Minería De Datos. Universidad de la Costa - Pregrado.</p>
      <p>[20] H. E. C Gismondi (2021a). Modelo predictivo basado en machine learning como soporte para
el seguimiento académico del estudiante universitario [tesis doctor, Universidad Nacional del
Santa, Perú]. https://hdl.handle.net/20.500.14278/3804</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          &amp; A.
          <string-name>
            <surname>Chadha</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>An Empirical Study of the Applications of Data Mining Techniques in Higher Education</article-title>
          .
          <source>International Journal of Advanced Computer Science and Applications</source>
          ,
          <volume>2</volume>
          (
          <issue>3</issue>
          ):
          <fpage>80</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ramaswami</surname>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>A Study on Feature Selection Techniques in Educational Data Mining</article-title>
          .
          <source>International Working Group On Educational Data Mining</source>
          , Vol.
          <volume>1</volume>
          , Issue 1.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Heiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Baker</surname>
          </string-name>
          &amp;
          <string-name>
            <surname>K. Yacef</surname>
          </string-name>
          (
          <year>2006</year>
          ).
          <source>Proceedings of Educational Data Mining workshop. 8th International Conference on Intelligent Tutoring Systems.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pérez-Luño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Ramón</given-names>
            <surname>Jerónimo</surname>
          </string-name>
          &amp;
          <string-name>
            <surname>J. Sánchez</surname>
            <given-names>Vázquez</given-names>
          </string-name>
          , “
          <article-title>Análisis exploratorio de las variables que condicionan el rendimiento académico ” Sevilla</article-title>
          , España: Universidad Pablo de Olavide,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. Vélez Van &amp; N.</given-names>
            <surname>Roa</surname>
          </string-name>
          , “
          <article-title>Factors associated with academic performance in medical students” PSIC</article-title>
          . Educación Médica,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Eva</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mercedes</surname>
          </string-name>
          , “
          <article-title>Rendimiento académico en la transición secundaria-universidad</article-title>
          ,”
          <source>Revista de Educación</source>
          ,
          <year>2003</year>
          . http://hdl.handle.net/11162/67356
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bell</surname>
          </string-name>
          , “
          <article-title>Machine learning: hands-on for developers and technical professionals</article-title>
          ,” 2nd ed., John Wiley &amp; Sons,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kelleher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mac Namee</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>D'Arcy</surname>
          </string-name>
          , “
          <article-title>Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies</article-title>
          ,” MIT Press,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Contreras Bravo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nieves-Pimiento</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>K.</given-names>
            <surname>Gonzalez-Guerrero</surname>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Predicción del rendimiento académico universitario mediante mecanismos de aprendizaje automático y métodos supervisados</article-title>
          .
          <source>Ingeniería</source>
          ,
          <volume>1</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          . https://doi.org/10.14483/23448393.19514
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Valero Cajahuanca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Navarro Raymundo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Larios Franco</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>J.</given-names>
            <surname>Julca Flores</surname>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>Deserción universitaria: Evaluación de diferentes algoritmos de Machine Learning para su predicción</article-title>
          .
          <source>Revista de Ciencias Sociales</source>
          ,
          <volume>28</volume>
          (
          <issue>3</issue>
          ),
          <fpage>362</fpage>
          -
          <lpage>375</lpage>
          . https://doi.org/10.31876/rcs.v28i3.38480
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gamboa Unsihuay</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Salinas Flores</surname>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>Predicción de la situación académica en alumnos de pregrado usando algoritmos de Machine Learning</article-title>
          .
          <source>Perfiles</source>
          ,
          <volume>1</volume>
          (
          <issue>27</issue>
          ),
          <fpage>4</fpage>
          -
          <lpage>10</lpage>
          . https://doi.org/10.47187/perf.v1i27.142
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Calva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Flores</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>H.</given-names>
            <surname>Porras</surname>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Modelo de predicción del rendimiento académico para el curso de nivelación de la Escuela Politécnica Nacional a partir de un modelo de aprendizaje supervisado</article-title>
          .
          <source>Latin American Journal of Computing</source>
          ,
          <volume>VIII</volume>
          (
          <issue>1</issue>
          )
          . https://doi.org/10.5281/zenodo.5770905
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ayala Franco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>López Martínez</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>V. H.</given-names>
            <surname>Menéndez Domínguez</surname>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Modelos predictivos de riesgo académico en carreras de computación con minería de datos</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>