<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>European Journal of Molecular &amp; Clinical Medicine</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978-3-319-78503-5_6</article-id>
      <title-group>
        <article-title>Performance Evaluation of ML-based Classifiers for HEI Graduate Entrants</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khrystyna Zub</string-name>
          <email>khrystyna.zub@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavlo Zhezhnych</string-name>
          <email>pavlo.i.zhezhnych@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>S. Bandera str., 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>103</volume>
      <issue>4</issue>
      <fpage>20</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>The development of intelligent decision support systems for admission to higher education institutions (HEIs) is an essential task both for the institution, particularly for selecting the best entrants, and for the entrant, who needs to assess their chances of admission to the chosen HEI. The efficiency of such systems depends largely on the accuracy of their underlying intelligent components. This article investigates the effectiveness of machine learning (ML) based classifiers in predicting the admission of entrants to an HEI. The simulation was performed using Orange software and a real data set. The task is a binary classification on an unbalanced data set. For each studied classifier, the optimal operating parameters were selected and the classifier was run 100 times on randomly generated data samples. This approach ensured the reliability of the results. The classifiers were compared on total accuracy, F-measure, Precision, and Recall. It has been experimentally established that the Support Vector Machine (SVM) based classifier demonstrated the highest accuracy on all four performance indicators among the considered methods. Receiver operating characteristic (ROC) curves for both classes also confirmed its highest accuracy. This makes it possible to apply it in practice.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Keywords</title>
      <p>classifiers, performance evaluation, machine learning, university, HEI, graduate admissions, information systems, binary classification, imbalanced dataset.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>An important point during the admission campaign, both for the educational institution and for those
interested in admission, is the choice of specialty and HEI in which to obtain an education and
qualification level. The quality of this choice can directly affect both the activities of the HEI and
the further educational and professional trajectory of a potential student. Therefore, one of the
critical factors influencing the entrant's choice is the assessment of their chances of admission.</p>
      <p>To assess their chances of admission, an entrant needs to consider many factors. Uncertainty about
these factors can lead the entrant to enter a low-ranked HEI while deserving a more prestigious one.
Given the complexity of this set of factors and the incomplete awareness of entrants, it is unlikely
that an entrant can assess them properly on their own. In addition to the risk of choosing the wrong
specialty, there are other difficulties. Applying for admission always costs the applicant money and
time. Also, admission to highly ranked HEIs involves strong competition between entrants, which makes
the decision to apply more challenging. Therefore, if we assume that the entrant has decided on a
major, assessing the chances of admission to a particular HEI is a critical task.</p>
      <p>Existing studies aimed at supporting applicants in assessing their chances of admission indicate
the feasibility of using ML methods to solve this task.</p>
      <p>The study [1] aims to give entrants the probability of being admitted by a university. A gradient
boosting regressor model was deployed using data on students' academic performance and university
rating. The study reported effective statistical results for the graduate admission chance prediction
model.</p>
      <p>Applying different types of artificial intelligence (AI) algorithms, the authors of [2] proposed a
Graduate Admissions Prediction framework. They also proposed a user interface through which a user can
see the result. However, with the proposed work users can only identify their chances of getting a
seat; they cannot obtain a list of universities to which they could be admitted.</p>
      <p>2021 Copyright for this paper by its authors.</p>
      <p>The authors of [3] claim that the disadvantage of existing admission prediction systems is that they
rely only on if/else rules. They emphasize the need to use ML algorithms to solve this task. Their study
aimed to classify whether a student can gain admission to a particular HEI. The dataset included data
on students admitted in previous years, described by specific attributes or parameters that strongly
affect the class attribute or have a high-value dependency. In addition, the purpose was to predict the
number of potential students who are ready to enroll in the current HEI, to help the institution's
management work with students interested in being accepted or enrolled. The authors use supervised
classification algorithms such as Decision Trees, Support Vector Machines, k-Nearest Neighbors, Random
Forest, and Naive Bayes. Measuring the accuracy of each model helps the authors choose one of them.</p>
      <p>The study [4] presented an ML approach to predicting a student's chances of being admitted, helping
students recognize and target the universities best suited to their profile. The paper evaluates several
models to identify the one with the highest accuracy and the lowest error. The authors proposed
regression strategies to predict the university rate given the student's profile, namely Linear
Regression, Decision Tree, and Logistic Regression models. The Logistic Regression model gives the most
accurate prediction.</p>
      <p>Additionally, there are commercial software solutions that are regularly used by educational
institutions and aim to support the admission process. However, only a small part of them supports
decision-making from the entrant's perspective. They provide sufficiently wide functionality, but
privacy, data security, and the high cost of purchase and support make them challenging for many
universities [5].</p>
      <p>Today's rapid development of information technologies pushes HEIs to find and implement the most
effective technological solutions [6, 7]. Improving such technologies will increase the efficiency of
supporting applicants during the admission campaign [8]. A clear increase in the number of studies of ML
methods confirms the relevance of their application in the context of the enrollment campaign in higher
education. The Scopus search engine was used for the analysis, with the search request
TITLE ( ( enroll* OR entran* OR admiss* ) AND "Machine Learning" ) AND ( LIMIT-TO ( SUBJAREA , "COMP" ) ).
The number of publications in 2021 is due to the date of the search query, September 2020. This search
result confirms the interest of the scientific community in applying ML methods in the context of the
admission process.</p>
      <p>Therefore, given the relevance and effectiveness of the approaches described above, this study
focuses on ML methods. The primary purpose of this work is to study and experimentally analyze the
effectiveness of existing ML-based classifiers in solving a binary classification task on an unbalanced
data set. The applied task is to assess the entrant's chances of admission to an HEI.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Materials and methods</title>
      <p>This study evaluates the performance of different ML-based classifiers for predicting the
possibility of graduate admission. Modeling was conducted using Orange software [9]. Orange is an
open-source tool for data visualization, machine learning, and data analysis. It is equipped with a
visual programming interface for fast, high-quality data analysis and interactive data visualization.</p>
    </sec>
    <sec id="sec-4">
      <title>2.1.1. Dataset description</title>
      <p>This work investigates a binary classification task: predicting whether or not the entrant will be
admitted to the university. The effectiveness of the different classifiers was evaluated on a real data
set [10] on admission to HEIs in the United States (US). Its authors collected information on admission
to graduate school at 29 US universities. The data set contains 1653 records and nine attributes [10].
The authors considered the following features: English test score, Graduate Record Examination (GRE)
score, Quantitative Reasoning section (Gre Score Quat), Verbal Reasoning section (Gre Score Verbal),
Paper Published, Ranking, Undergraduate Score, and Work experience.</p>
      <p>The task is to determine whether the candidate will enter the Computer Science program at the
chosen university or not (binary classification). The data set contains 574 successful entry cases and
1079 unsuccessful ones.</p>
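      <p>The class proportions implied by these counts can be checked with a short calculation. The
following Python sketch is an illustration only and is not part of the Orange workflow; the variable
names are hypothetical. It also reports the accuracy of a trivial majority-class predictor, which is a
natural reference point for an unbalanced set.</p>
      <preformat>
```python
# Class counts as reported in the text above.
admitted = 574
rejected = 1079
total = admitted + rejected  # 1653 records

share_admitted = admitted / total
share_rejected = rejected / total

# Accuracy of a trivial model that always predicts the majority class
# ("not admitted"); a useful floor when reading total accuracy.
majority_baseline = max(admitted, rejected) / total

print(f"admitted: {share_admitted:.1%}, not admitted: {share_rejected:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```
      </preformat>
      <p>Roughly 35% of the records are successful and 65% unsuccessful, so a model that always predicts
an unsuccessful entry would already reach about 65% total accuracy.</p>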
      <p>The authors of this data set selected the most important attributes for the task and cleared the
set of omissions and anomalies [10]. Preprocessing in this work was therefore limited to normalizing the
data, which were then processed by the studied classifiers.</p>
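      <p>The text does not state which normalization was applied. As one common option, the sketch below
assumes min-max scaling of every attribute to the range from 0 to 1; the function name and the sample
values are hypothetical.</p>
      <preformat>
```python
def min_max_normalize(rows):
    """Scale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column carries no information; map it to 0.0.
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


# Hypothetical records: (GRE quantitative, GRE verbal, work experience in years).
sample = [(160, 150, 2), (170, 155, 0), (150, 165, 4)]
for row in min_max_normalize(sample):
    print(row)
```
      </preformat>
      <p>Scaling all attributes to a common range keeps attributes with large numeric values, such as test
scores, from dominating distance-based classifiers such as kNN and SVM.</p>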
    </sec>
    <sec id="sec-5">
      <title>2.1.2. Modeling of the ML-based classifiers</title>
      <p>In this work, we investigated the accuracy of solving the classification problem using a number of
existing methods of machine learning, in particular:
• SVM;
• Naïve Bayes;
• Logistic regression;
• k-nearest neighbors (kNN);
• Neural Network;
• Tree;
• Stochastic gradient descent (SGD).</p>
      <p>The simulation was performed using Orange software. The block diagram of this process is
presented in Fig. 2.
According to the research methodology, the authors selected the optimal operating parameters of all
studied classifiers. To ensure the reliability of the obtained result, the data were split at random
into training and test samples in an 80% to 20% ratio. In addition, each of the studied classifiers was
run 100 times. Then the results were averaged and displayed on the screen.</p>
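      <p>The repeated random-split procedure described above can be sketched in plain Python as follows.
This is a minimal illustration rather than the Orange workflow itself: the helper names are
hypothetical, a trivial majority-class predictor stands in for the studied classifiers, and the records
are synthetic, reproducing only the class counts of the data set.</p>
      <preformat>
```python
import random
from statistics import mean


def split_80_20(records, rng):
    """Shuffle a copy of the records and cut it into 80% train, 20% test."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def majority_class_accuracy(train, test):
    """Hypothetical stand-in classifier: predict the most common training label."""
    labels = [label for _, label in train]
    guess = max(set(labels), key=labels.count)
    return mean(1.0 if label == guess else 0.0 for _, label in test)


def repeated_evaluation(records, evaluate, runs=100, seed=0):
    """Average a score over `runs` independent random 80/20 splits."""
    rng = random.Random(seed)
    scores = [evaluate(*split_80_20(records, rng)) for _ in range(runs)]
    return mean(scores)


# Synthetic records with the class counts reported for the data set.
records = [(i, 1) for i in range(574)] + [(i, 0) for i in range(1079)]
print(round(repeated_evaluation(records, majority_class_accuracy), 3))
```
      </preformat>
      <p>Averaging over many random splits reduces the influence of any single favorable or unfavorable
partition on the reported score.</p>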
      <p>Performance was evaluated using the following indicators: total accuracy, F-measure, Precision,
and Recall [11]. Visual analysis was performed using ROC curves for each of the two classes
separately.</p>
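      <p>For reference, the four indicators can be computed from the confusion-matrix counts as in the
sketch below. It assumes a single positive class and hypothetical labels; Orange may report values
averaged over both classes, so its figures need not coincide with this per-class computation.</p>
      <preformat>
```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return total accuracy, precision, recall and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical labels: 1 = admitted, 0 = not admitted.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```
      </preformat>
      <p>Of the four indicators, only total accuracy uses the count of true negatives; Precision, Recall,
and the F-measure depend solely on true positives, false positives, and false negatives.</p>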
    </sec>
    <sec id="sec-6">
      <title>3. Results and discussion</title>
      <p>The results of all studied classifiers are summarized in Fig. 3.</p>
      <p>[Fig. 3. Recall, Precision, F-measure, and Total accuracy of the studied classifiers (Naïve Bayes,
Tree, SGD, Neural Network, Logistic Regression, SVM); the value axis spans 0.56 to 0.72.]</p>
      <p>As can be seen from Fig. 3, the kNN method demonstrates the lowest accuracy on all performance
indicators. This was to be expected, given that this simple non-parametric method has very few
parameters with which to tune it for a specific task. Several methods (Naïve Bayes, Tree, SGD, and
Neural Network) show approximately the same Total accuracy. However, SGD, despite a number of
advantages, in particular in terms of performance, shows a very low F-measure. Since the F-measure
ignores true negative results, it should be neglected when solving an unbalanced problem. Note that the
data set processed in this paper is unbalanced (about 65% of one class and 35% of the other).</p>
      <p>The logistic regression classifier shows significantly better results on three indicators, all
except the F-measure. However, this algorithm is inferior in training speed to many of those studied in
this work.</p>
      <p>The highest accuracy on all four performance indicators was obtained with SVM. The application of
this fast and efficient algorithm to the binary classification task has fully justified itself in our
case. Despite this, the total accuracy reaches only 70%, which is rather low for solving the task.</p>
      <p>To visualize the results of the study, ROC curves were used (Fig. 4). This is one of the most
commonly used methods of presenting the results of binary classification.</p>
      <p>The ROC curve shows the dependence of the true positive rate on the false positive rate for each
class separately. Accordingly, a classifier whose ROC curve lies higher and further to the left on the
graph demonstrates greater accuracy.</p>
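      <p>How such a curve is built can be illustrated by the following sketch, assuming each classifier
outputs a score for the class of interest. The function names and the example scores are hypothetical;
the area under the curve (AUC) is included as a common single-number summary and is not reported in
this paper.</p>
      <preformat>
```python
def roc_points(y_true, scores, positive=1):
    """Return (false positive rate, true positive rate) pairs over all thresholds.

    Assumes both classes are present in y_true.
    """
    n_pos = sum(1 for label in y_true if label == positive)
    n_neg = len(y_true) - n_pos
    # Sort by descending score; lowering the threshold admits records in this order.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for index, (score, label) in enumerate(ranked):
        if label == positive:
            tp += 1
        else:
            fp += 1
        is_last = index == len(ranked) - 1
        # Emit a point only when the score changes, so tied scores share one threshold.
        if is_last or ranked[index + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores for the "admitted" class (label 1).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points)
print(round(auc(points), 4))
```
      </preformat>
      <p>Passing the other label as the positive class, with the scores for that class, yields the curve
for the second class.</p>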
      <p>As can be seen from both graphs in Fig. 4, the ROC curves of several algorithms almost overlap.
This indicates that they are approximately equally effective, which is confirmed by the results
presented in Fig. 3. However, SVM shows slightly better results. This is also confirmed by the numerical
estimates of all four performance indicators in Fig. 3. Its high speed on medium-sized samples should
also be noted [12]. All this makes it possible to apply this method when building a real system for
predicting the success of the entrant's admission to an HEI.</p>
    </sec>
    <sec id="sec-7">
      <title>4. Conclusions</title>
      <p>The current state of development of decision support systems for entrants in the choice of HEI
requires the use of data mining approaches. One of the typical tasks that can be the basis of such systems
is classification. Today there are many different ML algorithms for its solution. The work aimed to
evaluate their effectiveness in solving the problem.</p>
      <p>The studied classifiers were simulated on a sample of 1653 records on the results of admission to
the Computer Science specialty at 29 universities in the United States. The sample was normalized and
randomly divided into two parts, training and test. To ensure the reliability of the obtained result,
each of the studied classifiers was run 100 times, then the results were averaged to form the final
result.</p>
      <p>It has been experimentally established that the highest accuracy is obtained when using a
classifier based on SVM with an RBF kernel.</p>
      <p>Although the SVM-based classification method showed the highest accuracy, this value is still not
high enough to use this classifier to develop real systems. Therefore, further research will be conducted
in the direction of developing ensembles based on this method, particularly using a stacking approach,
to improve the accuracy of the classifier. In addition, this approach will increase the reliability of
classification subsystems based on it by using four or more models to obtain the final solution of the
system.</p>
      <p>The need for the entrant to decide on an HEI and specialty to enter arises every year during the
admission campaign. The task of supporting applicants remains relevant every year for all educational
institutions. This research showed the effectiveness of applying ML methods and techniques. However, in
addition to identifying the most effective method, there are other tasks that require further research.
A preliminary study of the variety of independent features that may affect the outcome of prediction
and recommendation in each individual case is critical.</p>
    </sec>
    <sec id="sec-8">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>