<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Machine Learning in Finalizing Grades of Students' Performance in Distance Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marat Nurtas</string-name>
          <email>m.nurtas@iitu.edu.kz</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Temirlan Otepov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aizhan Altaibek</string-name>
          <email>a.altaibek@iitu.edu.kz</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kateryna Kolesnikova</string-name>
          <email>kkolesnikova@iitu.edu.kz</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Borodkin</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>International Information Technology University</institution>
          ,
          <addr-line>Manas St. 34/1, Almaty, 050040</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Machine learning (ML) in education is currently one of the less investigated areas of Data Science, yet its potential in this field is considerable. For example, ML in educational technology can be used for grading or testing students, improving student retention, and predicting student performance. During the pandemic, most students made the transition to distance learning, and the grades of some students increased substantially during this period. However, as the pandemic situation resolved, these students reverted to their prior average grades. The first aim of this research is to demonstrate the capacity of machine learning to forecast students' final grades; the second is to examine the correlation between students' performance during the online learning phase and their final grades; and the third is to compare this correlation with that observed during offline periods.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine learning</kwd>
        <kwd>linear regression</kwd>
        <kwd>educational technology</kwd>
        <kwd>statistical learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the realm of education, the integration of ML stands as a promising frontier within the
expansive landscape of Data Science. Although this domain remains relatively underexplored, its
potential to transform educational practice is considerable [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. ML, when harnessed in
educational technology, unveils a diverse array of applications, ranging from the automated
grading of assessments to the enhancement of student retention strategies and the precise
prediction of student academic performance [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ].
      </p>
      <p>
        Between 2007 and 2014, only a limited number of publications, approximately 4-5, focused on
the theme of machine learning in education. However, from 2015 to 2017, there was a noticeable
surge in interest, with as many as 20 works published in 2017 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This suggests a growing
enthusiasm for the application of machine learning in the educational sector during that
period. The increase in publications reflects the expanding recognition of the potential
advantages that machine learning can bring to education, including personalized learning,
adaptive assessment, and data-driven insights for educators and students.
      </p>
      <p>
        The central focus of this paper lies in harnessing the formidable capabilities of ML to predict
the final grades of students accurately. As the world grappled with the unforeseen challenges
posed by the COVID-19 pandemic, the educational landscape underwent a seismic shift. The
widespread adoption of distance learning became a necessity, catalyzing a profound
transformation in the way students engaged with their academic pursuits [
        <xref ref-type="bibr" rid="ref5 ref6">5,6</xref>
        ].
      </p>
      <p>One intriguing phenomenon that emerged during this period of remote learning was the
dramatic increase in the grades of certain students. As classrooms transcended physical
boundaries and traditional assessment methods, select students experienced a notable surge in
their academic performance [7]. However, as the pandemic's grip on the world gradually
loosened, an equally remarkable reversion occurred, as these high-achieving students reverted to
their prior average grades [8].</p>
      <p>ORCID: 0000-0003-4351-0185 (M. Nurtas); 0000-0001-8431-7950 (A. Altaibek); 0000-0002-9160-5982 (K. Kolesnikova). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>The main challenges in combating cheating during distance learning encompass obtaining
assessment answers in advance, unfair retaking of assessments, and unauthorized assistance
during assessments, necessitating measures such as proctoring systems, plagiarism detection,
and promoting academic integrity to address these issues [9].</p>
      <p>Considering these developments, the primary objective of this research is to examine the
dynamics of student academic performance during the online learning phase. Specifically, the
project aims to uncover the correlation between students' final grades and their performance
during the unique and transformative period of online education.</p>
      <p>The significance of this inquiry extends far beyond its immediate scope. It represents a critical
exploration at the intersection of technology, education, and the human learning experience. By
scrutinizing the academic trajectories of students during a time of unparalleled change, we seek
to unearth valuable insights that can inform educational strategies, interventions, and policies for
a post-pandemic world [10].</p>
    </sec>
    <sec id="sec-2">
      <title>2. The problem statement</title>
      <p>In the rapidly evolving landscape of education, especially in the context of the COVID-19
pandemic, it has become imperative to examine the true impact of online learning on a student's
final academic achievement [11]. This study endeavors to address the following critical question:
To what extent do grades obtained during online learning phases influence a student's final
grade, compared with the grades achieved during traditional offline periods?</p>
      <p>The emergence of online education has reshaped the educational experience, with students
adapting to digital classrooms during certain quarters of their academic journey [12]. Accordingly,
this research project examines the relationship between online and offline grades and their role
in predicting a student's final academic performance.</p>
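      <p>Addressing this research problem involves the use of linear regression, with the formula
defined as follows:
<disp-formula><tex-math>\hat{y} = \sum_{i=1}^{n} w_i \cdot x_i,</tex-math></disp-formula>
where <inline-formula><tex-math>\hat{y}</tex-math></inline-formula> is the predicted final grade, <inline-formula><tex-math>w_i</tex-math></inline-formula> denotes the efficiency coefficient for grades during online and offline quarters, and
<inline-formula><tex-math>x_i</tex-math></inline-formula> represents the grades garnered by students in these specific educational phases.</p>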
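      <p>Here is a minimal sketch, assuming the final grade is modeled as a weighted sum of quarterly grades, optionally plus a constant term. It shows how the prediction for a single student can be computed. The function name predict_final_grade and all numeric values are hypothetical and chosen only for illustration.</p>
      <preformat preformat-type="code">
```python
def predict_final_grade(weights, grades, bias=0.0):
    """Return bias + sum(w_i * x_i) for one student.

    weights: efficiency coefficients, one per quarter (hypothetical values)
    grades:  the student's quarterly grades, in the same order
    bias:    optional constant term; 0.0 gives the pure weighted sum
    """
    if len(weights) != len(grades):
        raise ValueError("weights and grades must have the same length")
    return bias + sum(w * x for w, x in zip(weights, grades))


# Hypothetical example: four quarters with equal weights of 0.25
print(predict_final_grade([0.25, 0.25, 0.25, 0.25], [54, 49, 89, 100]))  # prints 73.0
```
      </preformat>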
      <p>If the final grade is treated as a weighted average of the quarterly grades, the coefficients
should sum to 1. This is a property of weighted averages that keeps the result proportional to the
inputs, not a property of linear functions in general. The formula above also has no constant term.
In practice, a linear regression model usually includes a constant coefficient (often referred to as
the intercept or bias). This coefficient represents the baseline or starting point of the function
and can be critical in understanding its behavior. In linear regression, the constant term provides
valuable information about the relationship between the variables [13].</p>
      <p>In the context of comparing online and offline education, if the constant coefficient has a
significant value, it can signify the difference in the starting points or base levels of the two types
of education. For instance, it might indicate that even when all other coefficients are equal, there's
a baseline difference in outcomes between online and offline education.</p>
      <p>So, the constant coefficient can matter in understanding the relationship between coefficients
in the context of online and offline education, especially if it shows how the two methods differ in
their starting points or baseline performance.</p>
      <p>In essence, the sum-to-one constraint, together with non-negative coefficients, keeps the
predicted values within the same interpretable range as the input grades (0 to 100). The model still
represents a linear relationship between the predictor variables and the outcome.</p>
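      <p>The range argument can be checked numerically. The sketch below rescales non-negative coefficients so that they sum to 1. It then confirms that the resulting weighted average of grades lies between the lowest and highest input grade. The helper names normalize and weighted_average and all values are hypothetical. The guarantee holds only for non-negative weights; an unconstrained regression may produce negative coefficients, for which it does not apply.</p>
      <preformat preformat-type="code">
```python
def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return [w / total for w in weights]


def weighted_average(weights, grades):
    """Weighted sum of grades, assuming the weights already sum to 1."""
    return sum(w * x for w, x in zip(weights, grades))


raw = [2.0, 1.0, 1.0]      # hypothetical unnormalized coefficients
grades = [60, 80, 100]     # hypothetical quarterly grades
w = normalize(raw)
y = weighted_average(w, grades)
print(w, y)  # prints [0.5, 0.25, 0.25] 75.0

# A convex combination of grades can never leave the range of the grades themselves
assert max(grades) >= y >= min(grades)
```
      </preformat>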
      <p>This inquiry serves not only to enhance our understanding of the dynamics between online
and offline learning but also to offer valuable insights to educational institutions and
policymakers regarding the efficacy of online education and its enduring influence on a student's
academic journey. Ultimately, the findings of this research will provide a comprehensive
understanding of the complex interplay between online and offline grades and their genuine
impact on a student's final academic success.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods and research</title>
      <p>Considering the changing educational landscape, particularly during the COVID-19 pandemic, the
research project explored the application of ML in the field of education, an area within Data
Science that has received relatively little attention [14]. The primary objective was to harness the
potential of ML, with a specific emphasis on the Linear Regression method, to predict the final
grades of high school students.</p>
      <p>Linear regression analysis is employed to forecast the value of one variable based on another
variable's value. The variable being predicted is referred to as the dependent variable, while the
variable used to make the prediction is known as the independent variable [15].</p>
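      <p>With a single independent variable, ordinary least squares has a closed-form solution. The slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x. The intercept makes the fitted line pass through the point of means. The sketch below is a generic plain-Python illustration of this technique, not the authors' code (which used TensorFlow), and the data points are hypothetical.</p>
      <preformat preformat-type="code">
```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1
slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # prints 2.0 1.0
```
      </preformat>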
      <p>The central goal of the research was to create and employ ML tools capable of accurately
projecting the students' final grades. The chosen approach involved the use of the Linear
Regression method, a powerful statistical technique commonly applied in predictive modeling
[16]. By employing this method, the intention was to construct predictive models that could
estimate the students' forthcoming final grades based on various pertinent factors, including their
performance during the online learning phase, historical academic data, and potentially other
variables.</p>
      <p>The rationale behind this endeavor was to offer educational institutions and stakeholders
valuable insights into the potential of ML for enhancing the educational experience. Specifically,
the research sought to investigate whether the students' performance during the unique online
learning phase had a lasting impact on their overall academic achievement, including their final
grades.</p>
      <p>Subsequently, upon the creation of these predictive models, a comparison was made between
the projected final grades and the students' actual grades following examinations. This
comparison enabled an evaluation of the efficacy of the ML approach in accurately forecasting
academic outcomes.</p>
      <p>The research project represents an examination of the intersection between data science and
education, showcasing how advanced analytical techniques like ML can contribute to a deeper
comprehension of student performance, particularly within rapidly evolving educational
contexts. By shedding light on the connection between online learning experiences and final
grades, the aim is to provide valuable insights that can inform educational strategies and
interventions, benefiting students in Kazakhstan and beyond.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Data collection and analysis</title>
      <p>The study involved a group of 140 students in Kazakhstan who progressed from the 10th to the
12th grade. A commonly cited guideline holds that a dataset with 10 columns (features) should have
at least 100 rows for optimal results. This is a rule of thumb rather than a strict requirement [17].
The number of observations needed for a given number of features depends on various factors,
including the complexity of the data, the nature of the features, and the machine learning algorithm
being used. Notably, this cohort studied online because of the pandemic during the 3rd and 4th
quarters of their 10th-grade year and throughout their 11th-grade year.</p>
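      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Grades of the first 10 students in the 1st, 2nd, 3rd, and 4th quarters of 10th grade; 10(q) denotes quarter q of 10th grade</p>
        </caption>
        <table>
          <thead>
            <tr><th>№</th><th>Name</th><th>10(1)</th><th>10(2)</th><th>10(3)</th><th>10(4)</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>student 1</td><td>54</td><td>49</td><td>89</td><td>100</td></tr>
            <tr><td>2</td><td>student 2</td><td>91</td><td>92</td><td>89</td><td>100</td></tr>
            <tr><td>3</td><td>student 3</td><td>90</td><td>78</td><td>93</td><td>98</td></tr>
            <tr><td>4</td><td>student 4</td><td>99</td><td>86</td><td>98</td><td>100</td></tr>
            <tr><td>5</td><td>student 5</td><td>67</td><td>71</td><td>91</td><td>100</td></tr>
            <tr><td>6</td><td>student 6</td><td>70</td><td>40</td><td>94</td><td>100</td></tr>
            <tr><td>7</td><td>student 7</td><td>74</td><td>76</td><td>93</td><td>90</td></tr>
            <tr><td>8</td><td>student 8</td><td>93</td><td>89</td><td>99</td><td>100</td></tr>
            <tr><td>9</td><td>student 9</td><td>89</td><td>56</td><td>93</td><td>100</td></tr>
            <tr><td>10</td><td>student 10</td><td>90</td><td>70</td><td>83</td><td>90</td></tr>
          </tbody>
        </table>
      </table-wrap>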
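      <p>The rise in grades during online study described in the introduction is visible in the ten rows of 10th-grade grades listed above. The short script below uses only these ten students, not the full cohort of 140. It computes the mean grade for each 10th-grade quarter and then compares quarters 1 and 2 (offline) with quarters 3 and 4 (online).</p>
      <preformat preformat-type="code">
```python
# 10th-grade quarterly grades of the first 10 students, copied from the listing above
grades = {
    "Q1": [54, 91, 90, 99, 67, 70, 74, 93, 89, 90],
    "Q2": [49, 92, 78, 86, 71, 40, 76, 89, 56, 70],
    "Q3": [89, 89, 93, 98, 91, 94, 93, 99, 93, 83],
    "Q4": [100, 100, 98, 100, 100, 100, 90, 100, 100, 90],
}


def mean(values):
    return sum(values) / len(values)


quarter_means = {q: mean(v) for q, v in grades.items()}
offline_mean = mean(grades["Q1"] + grades["Q2"])  # quarters 1-2, taught offline
online_mean = mean(grades["Q3"] + grades["Q4"])   # quarters 3-4, taught online

print(quarter_means)  # prints {'Q1': 81.7, 'Q2': 70.7, 'Q3': 92.2, 'Q4': 97.8}
print(offline_mean, online_mean)  # prints 76.2 95.0
```
      </preformat>
      <p>This is a descriptive comparison of a small sample and is not an output of the regression model.</p>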
    </sec>
    <sec id="sec-5">
      <title>5. Results and future works</title>
      <p>Figure 1 shows how linear regression works with one variable. Consider again the efficiency
coefficients <inline-formula><tex-math>w_i</tex-math></inline-formula> for grades during online and offline quarters in the previously
mentioned formula:
<disp-formula><tex-math>\hat{y} = \sum_{i=1}^{12} w_i \cdot x_i.</tex-math></disp-formula>
In our case there are 12 variables <inline-formula><tex-math>x_i</tex-math></inline-formula>, representing the four quarters of 10th grade, the four quarters of 11th
grade, and the four quarters of 12th grade. Together with the predicted final grade <inline-formula><tex-math>\hat{y}</tex-math></inline-formula>, this results in a
13-dimensional space to fully represent the data, which is challenging to visualize graphically.
However, it is possible to calculate data metrics and perform analysis to grasp the connections and
patterns within this complex dataset without producing a visual depiction.</p>
      <p>The model was trained using the Python programming language, specifically with the
TensorFlow library developed by Google. TensorFlow is a widely used open-source machine
learning framework that is particularly known for its capabilities in building and training deep
neural networks.</p>
      <p>Figure 2 presents the code used to train the model and to print its fitted parameters.</p>
      <p>Here, two variables are printed: the efficiency coefficients and the bias. The efficiency
coefficients, introduced earlier as the weights of the quarterly grades in the linear formula, are the
values determined by training. There were 5 quarters of online education and 7 quarters of
offline education, resulting in 5 efficiency coefficients for online education and 7 efficiency
coefficients for offline education.</p>
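      <p>The training step can be illustrated without TensorFlow. The sketch below is a stand-in, not the code in Figure 2. It fits the coefficients and the bias of a linear model by batch gradient descent on the mean squared error. This is the same objective minimized when a single dense layer with one output unit and no activation is trained with a mean-squared-error loss. The function name, the learning rate, the number of epochs, and the synthetic data are all hypothetical. Features are assumed to be rescaled to [0, 1], because raw grades on a 0 to 100 scale would require a much smaller learning rate to avoid divergence.</p>
      <preformat preformat-type="code">
```python
import random


def train_linear_model(X, y, learning_rate=0.1, epochs=5000):
    """Fit y ~ sum_j(w_j * x_j) + b by batch gradient descent on the mean squared error.

    X: list of samples, each a list of feature values (e.g. quarterly grades scaled to [0, 1])
    y: list of targets on the same scale (e.g. final grades)
    Returns (weights, bias).
    """
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        # Residual (prediction minus target) of every sample under the current parameters
        errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - target
                  for row, target in zip(X, y)]
        # Partial derivatives of the mean squared error for each weight and for the bias
        grad_w = [2.0 / n_samples * sum(e * row[j] for e, row in zip(errors, X))
                  for j in range(n_features)]
        grad_b = 2.0 / n_samples * sum(errors)
        # Step against the gradient
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data: y = 0.2*x1 + 0.3*x2 + 0.5*x3 + 0.05
random.seed(0)
true_w, true_b = [0.2, 0.3, 0.5], 0.05
X = [[random.random() for _ in range(3)] for _ in range(50)]
y = [sum(wj * xj for wj, xj in zip(true_w, row)) + true_b for row in X]

w, b = train_linear_model(X, y)
print([round(wj, 3) for wj in w], round(b, 3))  # should be close to true_w and true_b
```
      </preformat>
      <p>Gradient descent is used here only because it mirrors the iterative training of a TensorFlow model. For a model this small, an exact least-squares solution would be an equally valid choice.</p>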
      <p>In the printed output, the bias is the constant (intercept) term of the linear model. More
broadly, bias in machine learning is often characterized as a systematic error stemming from
incorrect assumptions made during the model's training process. More technically, bias can be
defined as the discrepancy between the average predictions made by the model and the actual
ground truth values. Such a discrepancy indicates systematic inaccuracies in the model's output,
which can lead to deviations from the true values it is intended to predict. Addressing bias is a
critical aspect of improving the overall performance and accuracy of machine learning models [19].</p>
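      <p>The predicted grades of students are shown in Table 2. The predicted grades were then
translated into letter grades using the following ranges:</p>
      <list list-type="bullet">
        <list-item><p>A: 90-100;</p></list-item>
        <list-item><p>B: 80-89;</p></list-item>
        <list-item><p>C: 70-79;</p></list-item>
        <list-item><p>D: 60-69;</p></list-item>
        <list-item><p>E: below 60.</p></list-item>
      </list>
      <p>The accuracy of the predictions was measured with the root relative squared error (RRSE) [20]:
<disp-formula><tex-math>\mathrm{RRSE} = \sqrt{\frac{\sum_{j=1}^{N} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{N} (y_j - \bar{y})^2}},</tex-math></disp-formula>
where:</p>
      <list list-type="bullet">
        <list-item><p><inline-formula><tex-math>y_j</tex-math></inline-formula> - real grades of students;</p></list-item>
        <list-item><p><inline-formula><tex-math>\hat{y}_j</tex-math></inline-formula> - predicted grades of students;</p></list-item>
        <list-item><p><inline-formula><tex-math>\bar{y}</tex-math></inline-formula> - average real grade of students;</p></list-item>
        <list-item><p><inline-formula><tex-math>N</tex-math></inline-formula> - number of students.</p></list-item>
      </list>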
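      <p>The letter-grade conversion is a simple threshold rule. Here is a minimal sketch. The ranges are stated for whole-number grades, so the handling of fractional predictions such as 89.6 is an assumption. In this sketch, a grade receives the letter of the highest range whose lower bound it reaches. The names LETTER_THRESHOLDS and to_letter_grade are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Lower bounds of the letter-grade ranges, from highest to lowest
LETTER_THRESHOLDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def to_letter_grade(score):
    """Map a numeric grade (0-100) to a letter; anything below 60 becomes E."""
    for lower_bound, letter in LETTER_THRESHOLDS:
        if score >= lower_bound:
            return letter
    return "E"


print([to_letter_grade(s) for s in [100, 89.6, 75, 60, 59.9]])  # prints ['A', 'B', 'C', 'D', 'E']
```
      </preformat>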
      <p>The calculation yielded an RRSE of approximately 0.11 and an average bias of 0.022. An RRSE
well below 1 means that the model's errors are much smaller than those of a baseline that always
predicts the average real grade. This result therefore suggests that the regression model fits the
data well and is effective in predicting the desired outcomes [21].</p>
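      <p>The RRSE compares the model's squared errors with those of a baseline that always predicts the mean real grade. The sketch below computes it together with the mean signed error (predicted minus real grade), which is one common reading of "average bias". The study does not state how its average bias of 0.022 was computed, so that second function is an assumption. The function names and grade values are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def rrse(real, predicted):
    """Root relative squared error: sqrt(sum((y - y_hat)^2) / sum((y - y_mean)^2))."""
    mean_real = sum(real) / len(real)
    sse_model = sum((y - p) ** 2 for y, p in zip(real, predicted))
    sse_baseline = sum((y - mean_real) ** 2 for y in real)
    if sse_baseline == 0:
        raise ValueError("real grades must not all be equal")
    return math.sqrt(sse_model / sse_baseline)


def mean_signed_error(real, predicted):
    """Average of (predicted - real); positive values mean over-prediction."""
    return sum(p - y for y, p in zip(real, predicted)) / len(real)


real = [70, 80, 90, 100]        # hypothetical real final grades
predicted = [72, 79, 91, 99]    # hypothetical model predictions
print(round(rrse(real, predicted), 3), mean_signed_error(real, predicted))  # prints 0.118 0.25
```
      </preformat>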
      <p>For online learning quarters, the average efficiency coefficient is 0.072, whereas for offline
learning it is 0.16. Since 0.16 / 0.072 ≈ 2.2, grades obtained during online education have
approximately 2.2 times less influence on the final grade than those obtained during offline
education.</p>
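      <p>The comparison of coefficients reduces to averaging the fitted coefficients within each learning mode and taking their ratio. Here is a minimal sketch. It assumes the coefficients are available in quarter order, with a matching flag that marks the online quarters. The helper name mode_averages is hypothetical; only the two averages, 0.072 and 0.16, come from the study.</p>
      <preformat preformat-type="code">
```python
def mode_averages(coefficients, is_online):
    """Return (online_average, offline_average) of the efficiency coefficients.

    coefficients: fitted coefficients in quarter order
    is_online:    matching list of booleans, True for quarters taught online
    """
    online = [c for c, flag in zip(coefficients, is_online) if flag]
    offline = [c for c, flag in zip(coefficients, is_online) if not flag]
    if not online or not offline:
        raise ValueError("need at least one online and one offline quarter")
    return sum(online) / len(online), sum(offline) / len(offline)


# Ratio of the two averages reported in the study
online_avg, offline_avg = 0.072, 0.16
print(round(offline_avg / online_avg, 2))  # prints 2.22
```
      </preformat>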
      <p>
        In future research, we plan to delve deeper into ML, focusing specifically on classification
methods, to uncover new insights and perspectives [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
This approach will produce a distinct set of results, further enriching our understanding of the
educational landscape.
      </p>
      <p>Moreover, forthcoming investigations will introduce an additional layer of complexity by
considering the teacher's level of expertise as an influencing factor. In the school from which the
data were collected, teachers range from novices to seasoned professionals across six distinct
levels of expertise. This parameter is likely to have a substantial influence on students' academic
performance and therefore warrants comprehensive exploration.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In conclusion, the analysis of the efficiency coefficients and bias in the context of machine learning
has provided valuable insights into the predictive capabilities of the regression model employed
in forecasting students' grades. The distinct sets of efficiency coefficients for online and offline
education underscore the differential impact of these learning modes on academic outcomes, with
online education exhibiting approximately 2.2 times lower influence compared to offline
education.</p>
      <p>The predicted grades produced by the linear regression model and their conversion into letter
grades, presented in Table 2, further support the model's effectiveness in capturing and predicting
students' academic performance. The root relative squared error (RRSE) of approximately 0.11,
together with an average bias of 0.022, indicates a good fit and accurate predictions.</p>
      <p>Looking ahead, future research endeavors are poised to delve deeper into the realm of machine
learning, particularly focusing on powerful classification methods, to uncover novel insights and
perspectives within the educational landscape. The inclusion of the teacher's level of expertise as
a crucial influencing factor in forthcoming investigations adds an additional layer of complexity,
acknowledging its profound impact on students' academic performance. By exploring these
nuanced aspects, future research aims to contribute further to our understanding of the
multifaceted dynamics influencing educational outcomes and inform potential improvements in
educational practices.</p>
    </sec>
    <sec id="sec-7">
      <title>7. References</title>
      <p>[7] Elzainy, A., El Sadik, A., &amp; Al Abdulmonem, W. (2020). Experience of e-learning and online assessment during the COVID-19 pandemic at the College of Medicine, Qassim University. Journal of Taibah University Medical Sciences, 15(6), 456-462.</p>
      <p>[8] Skar, G. B. U., Graham, S., &amp; Huebner, A. (2022). Learning loss during the COVID-19 pandemic and the impact of emergency remote instruction on first grade students’ writing: A natural experiment. Journal of Educational Psychology, 114(7), 1553.</p>
      <p>[9] Rowe, N. C. (2004). Cheating in online student assessment: Beyond plagiarism. Online Journal of Distance Learning Administration, 7(2), 1-10.</p>
      <p>[10] Rapanta, C., Botturi, L., Goodyear, P., Guàrdia, L., &amp; Koole, M. (2021). Balancing technology, pedagogy and the new normal: Post-pandemic challenges for higher education. Postdigital Science and Education, 3(3), 715-742.</p>
      <p>[11] Le Thi Minh Que. (2021). Online Teaching and Learning in Higher Education During Covid-19 Pandemic: Vietnamese Students' Perspective. IUP Journal of Information Technology, 17(3), 23-48.</p>
      <p>[12] Foo, C. C., Cheung, B., &amp; Chu, K. M. (2021). A comparative study regarding distance learning and the conventional face-to-face approach conducted problem-based learning tutorial during the COVID-19 pandemic. BMC Medical Education, 21, 141. doi:10.1186/s12909-021-02575-1.</p>
      <p>[13] Iwase, K. O. S. (1989). Linear regression through the origin with constant coefficient of variation for the inverse Gaussian distribution. Communications in Statistics-Theory and Methods, 18(10), 3587-3593.</p>
      <p>[14] Hilbert, S., Coors, S., Kraus, E., Bischl, B., Lindl, A., Frei, M., Wild, J., Krauss, S., Goretzko, D., &amp; Stachl, C. (2021). Machine learning for the educational sciences. Review of Education, 9, e3310. doi:10.1002/rev3.3310.</p>
      <p>[15] What is linear regression? (2023). URL: https://www.ibm.com/topics/linear-regression#:~:text=Resources,What%20is%20linear%20regression%3F,is%20called%20the%20independent%20variable.</p>
      <p>[16] Maulud, D., &amp; Abdulazeez, A. M. (2020). A Review on Linear Regression Comprehensive in Machine Learning. JASTT, 1(4), 140-147.</p>
      <p>[17] Smolic, H. (2022). How Much Data Is Needed For Machine Learning? URL: https://graphitenote.com/how-much-data-is-needed-for-machine-learning#:~:text=Generally%20speaking%2C%20the%20rule%20of,100%20rows%20for%20optimal%20results.</p>
      <p>[18] Newton, P. M., &amp; Essex, K. (2023). How Common is Cheating in Online Exams and did it Increase During the COVID-19 Pandemic? A Systematic Review. Journal of Academic Ethics. doi:10.1007/s10805-023-09485-5.</p>
      <p>[19] Wickramasinghe, S. (2021). Bias &amp; Variance in Machine Learning: Concepts &amp; Tutorials. URL: https://www.bmc.com/blogs/bias-variance-machine-learning/#:~:text=Bias%20is%20considered%20a%20systematic,prediction%20and%20the%20ground%20truth.</p>
      <p>[20] Coding Prof. (2022). 3 Ways to Calculate the Root Relative Squared Error (RRSE) in R. URL: https://www.codingprof.com/3-ways-to-calculate-the-root-relative-squared-error-rrse-in-r/</p>
      <p>[21] Padhma, M. (2023). Models, Data Science Blogathon. URL: https://www.analyticsvidhya.com/blog/2021/10/evaluation-metric-for-regression-models/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Kucak</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juricic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Dambic</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Machine Learning in Education - a Survey of Current Research Trends</article-title>
          . In B. Katalinic (Ed.),
          <source>Proceedings of the 29th DAAAM International Symposium</source>
          (pp.
          <fpage>0406</fpage>
          -
          <lpage>0410</lpage>
          ).
          Published by DAAAM International, ISBN 978-3-902734-20-4, ISSN 1726-9679, Vienna, Austria. doi:
          <pub-id pub-id-type="doi">10.2507/29th.daaam.proceedings.059</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Stimpson</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Cummings</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Assessing Intervention Timing in Computer-Based Education Using Machine Learning Algorithms</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>2</volume>
          ,
          <fpage>78</fpage>
          -
          <lpage>87</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/access.2014.2303071</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Nafea</surname>
            ,
            <given-names>I. T.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Machine learning in educational technology</article-title>
          .
          <source>Machine learning-advanced techniques and emerging applications</source>
          ,
          <fpage>175</fpage>
          -
          <lpage>183</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Korkmaz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Correia</surname>
            ,
            <given-names>A. P.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>A review of research on machine learning in educational technology</article-title>
          .
          <source>Educational Media International</source>
          ,
          <volume>56</volume>
          (
          <issue>3</issue>
          ),
          <fpage>250</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Figaredo</surname>
            ,
            <given-names>D. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaurena</surname>
            ,
            <given-names>I. G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Encina</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>The impact of rapid adoption of online assessment on students' performance and perceptions: Evidence from a distance learning university</article-title>
          .
          <source>Electronic Journal of e-Learning</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <fpage>224</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Bashkireva</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bashkireva</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morozov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Severin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fateeva</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baykova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Severina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>Adaptation of students to distance learning in COVID-19 conditions in terms of ultradian rhythms of the cardiovascular system</article-title>
          .
          In
          <source>E3S Web of Conferences</source>
          . Vol.
          <volume>211</volume>
          , p.
          <fpage>04010</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>