<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bibliographic Survey of Sentiment Classification using Hybrid Ensemble-based Machine Learning Approaches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rajni Bhalla</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amit Sharma</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Geetha Ganesan</string-name>
          <email>geetha@advancedcomputingresearchsociety.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advanced Computing Research Society</institution>
          ,
          <addr-line>Chennai, Tamilnadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lovely Professional University</institution>
          ,
          <addr-line>Jalandhar</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <fpage>33</fpage>
      <lpage>38</lpage>
      <abstract>
        <p>The rapidly growing number of reviews across different fields has contributed to the rise of data analysis. Several methods exist for data analysis, but there is a need to find the right methodology that can provide better accuracy. The objective of this paper is to find an accurate method depending on the type of dataset. Previous research has relied primarily on the KNN approach and has faced issues in deciding the K-value. For this work, data from the Statistics Department of the University of Wisconsin-Madison has been taken to evaluate teacher performance. The hybrid approach uses three different machine learning models for prediction. The prediction model was tested effectively on the teaching assistant evaluation dataset. The hybrid approach has been developed to improve the identification of teacher performance. Our findings indicate that combining KNN, decision tree, and naïve Bayes yields a considerable increase in the performance of the prediction analysis. The results show that the hybrid approach, called KDN (KNN, Decision Tree, Naïve Bayes), obtained better results with 53.04% accuracy compared to the baseline system performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Hybrid approach</kwd>
        <kwd>KNN</kwd>
        <kwd>Classification</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays, most academic institutes face a quality problem in the educational field. Among the contributing
factors are students' academic achievement and the teaching quality of teaching assistants. Some studies
have been done to encourage students to improve their academic achievement, but the problem of
teaching quality still needs to be addressed, especially in the practical parts that are normally handled
by teaching assistants.</p>
      <p>In this paper, the hybrid approach is applied to assess the performance of the teacher. Naïve
Bayes, KNN, and decision trees are standard examples of supervised learning, where the data is already
labeled. A decision tree might be a good starting point: it is generated by a decision
tree classifier and gives a clear visual representation of the decisions.</p>
      <p>K-nearest neighbor (K-NN) classification is a computation-intensive algorithm, particularly
when the training dataset is large. The algorithm conventionally uses the Euclidean distance
as its distance metric.</p>
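      <p>As an illustration (not part of the original paper), the Euclidean distance between two feature vectors can be sketched as follows; the function name is our own:</p>

```python
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((0, 0), (3, 4)))  # → 5.0
```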
      <p>Naive Bayes is another supervised learning algorithm, and it is known as a linear
classification method. K-NN, on the contrary, is not a linear classifier. When data is processed using
K-NN, many calculations need to be performed at each step, which is the main reason K-NN is unable
to process large amounts of data. Both naive Bayes and K-NN are powerful techniques; naive Bayes is
preferred over K-NN when data must be processed with speed in mind. If you cannot pick between the
three classifiers, your best strategy is to combine them all and run a test on your data to determine which one delivers
the greatest results.</p>
      <p>2022 Copyright for this paper by its authors.</p>
      <p>Related work is reviewed in Section 2. The methodology, with a quick summary of the dataset, is
described in Section 3. The collected results and consequences of the study are presented and compared
with other methods in Section 4. This research comes to an end in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>The detection methods used in earlier models are introduced in this section, and these
strategies are then compared and contrasted with those used in the proposed model. k-Nearest Neighbors (KNN) is a
common and extensively used technique for classification [1] [2], clustering [3], and regression [4] in a
variety of research areas, including economic modelling [5], image interpolation (Smith et al., 1988),
and visual category recognition (Zhang et al., 2006). A hybrid and layered Intrusion
Detection System (IDS) has been suggested, which employs a mix of machine learning and feature selection
approaches to deliver high-performance intrusion detection across a variety of attack types [6].
A hybrid analysis is designed to increase the capacity to maintain significant findings and well-supported
outcomes by combining traditional statistical analysis and artificial intelligence technologies [7]. We
believe that a hybrid strategy that incorporates both machine- and human-centered features can achieve
greater efficacy, competence, and social significance than either method alone [8].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
    </sec>
    <sec id="sec-4">
      <title>3.1. Dataset Description</title>
      <p>The dataset has been taken from the UCI repository. The statistics come from assessments of 151
teaching assistant (TA) assignments. The class variable was produced by splitting the scores into three
groups of roughly similar size ("low," "mid," and "high").</p>
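      <p>The tercile split described above can be sketched as follows; the scores here are hypothetical and only illustrate the binning, not the actual TAE attribute values:</p>

```python
def to_terciles(scores):
    # Rank-order the scores and assign "low"/"mid"/"high" by position,
    # producing three groups of roughly equal size.
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        if rank < n / 3:
            labels[i] = "low"
        elif rank < 2 * n / 3:
            labels[i] = "mid"
        else:
            labels[i] = "high"
    return labels

print(to_terciles([10, 55, 30, 80, 45, 90]))  # two of each class
```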
    </sec>
    <sec id="sec-5">
      <title>4. Experiment and Results</title>
      <p>The analysis design is a combination of several stages, and each stage contains a different number of
steps as shown in Figure 1. First, the teaching assistant dataset is retrieved and the Rename operator is
used to rename the English Speaker attribute. In the second phase, the Split Validation operator is used
to divide the dataset into two groups, one portion for training data and the other for testing data. In
the third phase, the KNN, decision tree, naïve Bayes, and hybrid approaches are used to train the
data, and the Apply Model operator is then used for testing the data. In the fourth phase, the
different models (KNN, decision tree, naïve Bayes, and hybrid) are applied to a sample, and
an accuracy measure is used to obtain the performance. The fifth phase presents the results in
graphical form.</p>
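      <p>The split-validation phase amounts to a deterministic shuffle-and-cut; a minimal sketch (the 70/30 ratio and the seed are our assumptions, not values stated in the paper):</p>

```python
import random

def split_validation(rows, train_ratio=0.7, seed=42):
    # Shuffle deterministically, then cut into a training portion
    # and a testing portion.
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_validation(list(range(10)))
print(len(train), len(test))  # → 7 3
```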
      <p>4.1. KNN</p>
      <p>K-nearest neighbours (KNN) is a simple, easy-to-implement supervised machine learning approach
that may be used to solve both classification and regression problems. The KNN algorithm assumes that
similar objects are near one another; in other words, related items lie close together, and the
algorithm relies on this assumption being correct in order to work. KNN combines the
concept of similarity (also known as distance, proximity, or closeness) with some basic mathematics,
such as computing the distance between points on a graph.</p>
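      <p>Putting the distance idea together with majority voting among the k closest training points, a minimal KNN classifier might look like this (illustrative data, not the TAE dataset):</p>

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs. Classify query by the
    # majority label among its k nearest neighbours (Euclidean distance).
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbours = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

train = [((1, 1), "low"), ((1, 2), "low"), ((8, 8), "high"), ((9, 8), "high")]
print(knn_predict(train, (2, 1), k=3))  # → low
```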
      <p>Naive Bayes classifiers are a set of classification
algorithms based on Bayes' theorem. It is a family of algorithms that all share the same principle:
every pair of features being classified is assumed to be independent of the others.</p>
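      <p>A minimal categorical naïve Bayes sketch under that independence assumption (with Laplace smoothing added so unseen values do not zero the product; the toy data is ours):</p>

```python
from collections import Counter, defaultdict

def nb_fit(rows):
    # rows: list of (feature_tuple, label). Learn P(label) and the
    # per-feature value counts P(value | label) under the naive
    # independence assumption.
    class_counts = Counter(label for _, label in rows)
    feat_counts = defaultdict(Counter)  # (position, label) -> value counts
    for feats, label in rows:
        for i, v in enumerate(feats):
            feat_counts[(i, label)][v] += 1
    return class_counts, feat_counts

def nb_predict(model, feats):
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for label, c in class_counts.items():
        p = c / total
        for i, v in enumerate(feats):
            # Laplace smoothing avoids zero probabilities for unseen values.
            p *= (feat_counts[(i, label)][v] + 1) / (c + 2)
        if p > best_p:
            best, best_p = label, p
    return best

rows = [(("yes", "a"), "high"), (("yes", "a"), "high"), (("no", "b"), "low")]
model = nb_fit(rows)
print(nb_predict(model, ("yes", "a")))  # → high
```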
      <p>The decision tree is one of the powerful techniques that has been used for prediction, and it
always presents its result in the form of a tree of decisions. The results of all three algorithms will be
compared using ensemble approaches.</p>
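      <p>The splitting idea behind a decision tree can be shown with a one-level tree (a decision stump) that picks the single most discriminative feature; this is a simplified sketch, not the classifier used in the experiments:</p>

```python
from collections import Counter

def stump_fit(rows):
    # rows: list of (feature_tuple, label). Pick the single feature whose
    # value-wise majority rule misclassifies the fewest training rows.
    n_feats = len(rows[0][0])
    best = None
    for i in range(n_feats):
        by_value = {}
        for feats, label in rows:
            by_value.setdefault(feats[i], []).append(label)
        rule = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
        errors = sum(rule[f[i]] != y for f, y in rows)
        if best is None or errors < best[2]:
            best = (i, rule, errors)
    return best[0], best[1]

def stump_predict(stump, feats, default="mid"):
    i, rule = stump
    return rule.get(feats[i], default)

rows = [(("native", 1), "high"), (("native", 2), "high"), (("non", 1), "low")]
stump = stump_fit(rows)
print(stump_predict(stump, ("non", 2)))  # → low
```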
      <p>The analysis of the proposed model produced different forms of results in the training and testing
stages. By using these results, the performance of the teaching assistant can be analyzed and controlled.
The performance output is analyzed based on accuracy and prediction error.</p>
    </sec>
    <sec id="sec-6">
      <title>4.2. Naive Bayes</title>
    </sec>
    <sec id="sec-7">
      <title>4.3. Decision Tree</title>
    </sec>
    <sec id="sec-8">
      <title>4.4. Results</title>
      <p>We used the KNN approach to evaluate teachers and obtained 47.83% accuracy, as shown
in Table 1. With naïve Bayes we got 42.38% accuracy, as shown in Table 2. With the
decision tree we got 37.04% accuracy, as shown in Table 3. The performance of the individual
models therefore needs improvement.</p>
      <p>Finally, a Vote operator has been used to combine KNN, naïve Bayes, and the decision tree, and
its performance has been compared with the individual models, as shown in Table 4.</p>
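      <p>The Vote operator's combination rule is majority voting over the three base predictions; a minimal sketch of the vote and the accuracy measure (illustrative only, not the tool's implementation):</p>

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: one predicted label per base model, e.g. (KNN, NB, tree).
    # Ties are broken by the order the labels first appear.
    return Counter(predictions).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    # Fraction of predictions matching the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(majority_vote(["low", "high", "low"]))   # → low
print(accuracy(["low", "mid"], ["low", "high"]))  # → 0.5
```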
    </sec>
    <sec id="sec-9">
      <title>5. Conclusion</title>
      <p>It is clear from Table 4 and Figure 3 that the hybrid approach produces better results compared to the individual models.</p>
      <p>This study was conducted to check the performance of different machine learning models by
performing data analysis on the teaching assistant evaluation dataset. The purpose of this research is to identify
effective strategies that can select an accurate model from among several prediction models. As per previous
studies, there can be no doubt that existing methodologies like KNN, decision tree, and naïve Bayes
have proven to be strong methodologies. As per our results, KDN proved better in terms of the accuracy of the
model. A hybrid classification approach that incorporates the KNN algorithm, decision tree, and naive
Bayes is presented here. This analysis adopts a prediction process based on data size, processing time,
accuracy, and estimated error to investigate and evaluate the teaching assistant. The results of the
evaluation were obtained using different sizes in the training and testing phases. The deep
examination highlighted that the hybrid approach achieved better results, with 53.04% prediction accuracy,
as well as in estimated time and error factor. In the future, we will look at different distance and similarity options that
might help us obtain a more precise distance or similarity measurement, and we aim
to suggest a measure with a reduced computational cost for a method of categorization that is more
effective and efficient.</p>
    </sec>
    <sec id="sec-10">
      <title>6. References</title>
      <p>C. H. Wan, L. H. Lee, R. Rajkumar, and D. Isa, “A hybrid text classification approach
with low dependency on parameter by integrating K-nearest neighbor and support vector
machine,” Expert Syst. Appl., vol. 39, no. 15, pp. 11880–11888, 2012, doi:
10.1016/j.eswa.2012.02.068.</p>
      <p>M.-L. Zhang and Z.-H. Zhou, “A k-nearest neighbor based algorithm for multi-label
classification,” IEEE Int. Conf. Granul. Comput., vol. 2, no. 2, pp. 718–721, 2005.</p>
      <p>Q. B. Liu, S. Deng, C. H. Lu, B. Wang, and Y. F. Zhou, “Relative density based
k-nearest neighbors clustering algorithm,” Int. Conf. Mach. Learn. Cybern., vol. 1, no.
November, pp. 133–137, 2003, doi: 10.1109/icmlc.2003.1264457.</p>
      <p>J. K. Solano Meza, D. Orjuela Yepes, J. Rodrigo-Ilarri, and E. Cassiraga, “Predictive
analysis of urban waste generation for the city of Bogotá, Colombia, through the
implementation of decision trees-based machine learning, support vector machines and
artificial neural networks,” Heliyon, vol. 5, no. 11, p. e02810, 2019, doi:
10.1016/j.heliyon.2019.e02810.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Xiao-Gao</given-names>
            <surname>Yu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Xiao-Peng</given-names>
            <surname>Yu</surname>
          </string-name>
          , “
          <article-title>New K-nearest neighbor searching algorithm based on angular similarity</article-title>
          ,” in
          <source>2008 International Conference on Machine Learning and Cybernetics</source>
          , Jul.
          <year>2008</year>
          , pp.
          <fpage>1779</fpage>
          -
          <lpage>1784</lpage>
          , doi: 10.1109/ICMLC.2008.4620693.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          Ü. Çavuşoğlu, “
          <article-title>A new hybrid approach for intrusion detection using machine learning methods</article-title>
          ,”
          <source>Appl. Intell.</source>
          , vol.
          <volume>49</volume>
          , no.
          <issue>7</issue>
          , pp.
          <fpage>2735</fpage>
          -
          <lpage>2761</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Costa-Mendes</surname>
          </string-name>
          , T. Oliveira, M. Castelli, and F. Cruz-Jesus, “
          <article-title>A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach</article-title>
          ,” Educ. Inf. Technol., vol.
          <volume>26</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>1527</fpage>
          -
          <lpage>1547</lpage>
          (Springer),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Sartas</surname>
          </string-name>
          , S. Cummings, A. Garbero, and A. Akramkhanov, “
          <article-title>A human machine hybrid approach for systematic reviews and maps in international development and social impact sectors</article-title>
          ”
          , vol.
          <volume>12</volume>
          , no.
          <issue>8</issue>
          . Multidisciplinary Digital Publishing Institute,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>