A Bibliographic Survey of Sentiment Classification using Hybrid Ensemble-based Machine Learning Approaches

Rajni Bhalla a, Amit Sharma b, Geetha Ganesan c

a Lovely Professional University, Jalandhar, India
b Lovely Professional University, Jalandhar, India
c Advanced Computing Research Society, Chennai, Tamilnadu, India

Abstract
The rapidly growing number of reviews in different fields has contributed to the rise of data analysis. Several methods exist for data analysis, but there is a need to find the methodology that provides the best accuracy. The objective of this paper is to find an accurate method depending on the type of dataset. Previous research has relied primarily on the KNN approach and has faced the issue of choosing the value of K. For this work, data from the Statistics Department of the University of Wisconsin-Madison is used to evaluate teacher performance. The hybrid approach uses three different machine learning models for prediction and was developed to improve the identification of teacher performance. The prediction model was tested on the teaching assistant evaluation dataset. Our findings indicate that combining KNN, decision tree, and naïve Bayes yields a considerable increase in prediction performance. The hybrid approach, called KDN (KNN, Decision Tree, Naïve Bayes), obtained 53.04% accuracy, better than the baseline system performance.

Keywords
Hybrid approach, KNN, Classification, machine learning

1. Introduction
Nowadays, most academic institutes face quality problems in education. Contributing factors include student achievement and the teaching quality of teaching assistants.
Some studies have been done to encourage students to improve their academic achievement, but teaching quality still needs to be improved, especially in the practical parts that are normally handled by teaching assistants. In this paper, a hybrid approach is applied to assess teacher performance.

Naïve Bayes, KNN, and decision trees are well-known examples of supervised learning, where the data is already labeled. A decision tree may be a good starting point, because a decision tree classifier produces a tree that gives a clear visual representation. K-nearest neighbor (K-NN) classification becomes labor-intensive when the training dataset is large. The algorithm uses the Euclidean distance as its distance measure. Naïve Bayes is another supervised learning algorithm and is also known as a linear classification method. K-NN, by contrast, is not a linear classifier. Processing data with K-NN requires many calculations at each step, which is the main reason K-NN scales poorly to large amounts of data. Both Naïve Bayes and K-NN are powerful techniques, but Naïve Bayes is preferred when processing speed matters. When it is unclear which of the three to choose, a practical strategy is to combine them and test on the data to determine which delivers the best results.

WAI-2022: Workshop on Artificial Intelligence, January 27 – 28, 2022, Chennai, India.
EMAIL: geetha@advancedcomputingresearchsociety.org (Geetha Ganesan)
ORCID: 0000-0001-7338-973X (Geetha Ganesan)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

Earlier methods are reviewed in Section 2. A summary of the dataset is given in Section 3.
The results and their implications are presented and compared with other methods in Section 4, and Section 5 concludes the paper.

2. Literature Review
The detection methods used in earlier models are introduced in this section and then compared and contrasted with those used in the proposed model. k-Nearest Neighbors (KNN) is a common and extensively used technique for classification [1] [2], clustering [3], and regression [4] in a variety of research areas, including economic modelling [5], image interpolation (Smith et al., 1988), and visual category recognition (Smith et al., 1988; Zhang et al., 2006). A hybrid and layered Intrusion Detection System (IDS) has been proposed that employs a mix of machine learning and feature selection approaches to deliver high-performance intrusion detection across a variety of attack types [6]. Hybrid analysis has been designed to make findings more significant and outcomes better supported by combining traditional statistical analysis with artificial intelligence technologies [7]. We believe that a hybrid strategy that incorporates both machine and human-centered features can achieve greater efficacy, competence, and social significance than either method alone [8].

3. Methodology

3.1. Dataset Description
The dataset is taken from the UCI repository. The data come from evaluations of 151 teaching assistant (TA) assignments. The class variable was produced by splitting the scores into three groups of roughly equal size ("low", "mid", and "high").

4. Experiment and Results
The analysis design combines several stages, each containing a different number of steps, as shown in Figure 1. First, the teaching assistant dataset is retrieved and the rename operator is used to rename the English Speaker attribute.
In the second phase, the Split Validation operator divides the dataset into two portions, one for training and the other for testing. In the third phase, the KNN, decision tree, naïve Bayes, and hybrid operators are used to train on the data, and the Apply Model operator is then used for testing. In the fourth phase, the different models (KNN, decision tree, naïve Bayes, and hybrid) are applied to the sample, and an accuracy measure is used to obtain the performance. The fifth phase presents the results graphically.

Figure 1: Pictorial representation of the methodology

4.1. KNN
K-nearest neighbours (KNN) is a simple, easy-to-implement supervised machine learning approach that can be used for both classification and regression problems. The KNN algorithm assumes that similar objects lie close together, and it relies on this assumption holding in order to work. KNN combines the concept of similarity (also known as distance, proximity, or closeness) with some basic mathematics, such as computing the distance between points on a graph.

4.2. Naïve Bayes
Naïve Bayes classifiers are a set of classification algorithms based on Bayes' theorem. They all work on the same principle: every pair of features used for classification is assumed to be independent of each other, given the class.

4.3. Decision Tree
The decision tree is a powerful technique that has been used for prediction. It always presents its result in the form of a tree. The results of all three algorithms will be compared using ensemble approaches.

4.4. Results
The analysis of the proposed model produced different results in the training and testing stages. Using these results, the performance of teaching assistants can be analyzed and controlled.
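The accuracy, class precision, and class recall values reported in this section follow directly from a confusion matrix. As a minimal sketch, the Python code below recomputes them from the KNN counts given in Table 1. The function names are illustrative and are not part of the tool chain used in the experiments; rows are assumed to hold predictions and columns true classes, as in the tables.

```python
# Confusion matrix counts from Table 1 (KNN).
# Rows: predicted class (pred3, pred2, pred1).
# Columns: true class (True3, True2, True1).
KNN_MATRIX = [
    [9, 4, 4],
    [4, 8, 6],
    [3, 3, 5],
]


def accuracy(matrix):
    """Share of all samples that lie on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def class_precision(matrix):
    """Per predicted class: correct predictions divided by the row total."""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]


def class_recall(matrix):
    """Per true class: correct predictions divided by the column total."""
    size = len(matrix)
    return [
        matrix[j][j] / sum(matrix[i][j] for i in range(size))
        for j in range(size)
    ]


if __name__ == "__main__":
    print(f"accuracy: {accuracy(KNN_MATRIX):.2%}")
    print("precision:", [f"{p:.2%}" for p in class_precision(KNN_MATRIX)])
    print("recall:", [f"{r:.2%}" for r in class_recall(KNN_MATRIX)])
```

The same three functions can be applied to the counts of the other confusion matrices to check their reported percentages.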
The performance output is analyzed in terms of accuracy and prediction error.

Table 1: Data Performance using KNN
Accuracy: 47.83%

              True3    True2    True1    Class precision
pred3         9        4        4        52.94%
pred2         4        8        6        44.44%
pred1         3        3        5        45.45%
Class recall  56.25%   53.33%   33.33%

We used the KNN approach to evaluate teachers and obtained 47.83% accuracy, as shown in Table 1. Naïve Bayes gave 42.38% accuracy, as shown in Table 2, and the decision tree gave 37.04% accuracy, as shown in Table 3. The performance of the model therefore needs to be improved.

Figure 2: Scatter Plot showing Category

Table 2: Data Performance using Naïve Bayes
Accuracy: 42.38% +/-11.77% (micro average: 42.38%)

              True3    True2    True1    Class precision
pred3         41       34       31       38.68%
pred2         8        10       5        43.48%
pred1         3        6        13       59.09%
Class recall  78.85%   20.00%   26.53%

Table 3: Data Performance using Decision Tree
Accuracy: 37.04% +/-6.79% (micro average: 37.09%)

              True3    True2    True1    Class precision
pred3         47       41       47       34.81%
pred2         4        8        1        61.54%
pred1         1        1        1        33.33%
Class recall  90.38%   16.00%   2.04%

Finally, a vote operator was used to combine KNN, Naïve Bayes, and decision tree, and its performance was compared with that of the individual models, as shown in Table 4.

Table 4: Data Performance using Hybrid Approach
Accuracy: 53.04% +/-8.62% (micro average: 52.98%)

              True3    True2    True1    Class precision
pred3         39       20       19       50.00%
pred2         10       23       12       51.11%
pred1         3        7        18       64.29%
Class recall  75.00%   46.00%   36.73%

Figure 3: Plot view using hybrid approach

It is clear from Table 4 and Figure 3 that the hybrid approach produces better results than the individual models.

5. Conclusion
This study was conducted to check the performance of different machine learning models after performing data analysis on the teaching assistant evaluation data. The purpose of this research is to identify effective strategies for finding an accurate model among several prediction models.
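The hybrid (KDN) model evaluated in this study combines KNN, decision tree, and naïve Bayes through a vote operator. The sketch below illustrates plain majority voting under assumptions of our own: each model's output is a list of class labels, ties are broken in favour of the model listed first, and the example labels are made up for illustration. It is not the implementation of the operator used in the experiments.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by majority vote."""
    combined = []
    # zip(*...) groups the predictions of all models for the same sample.
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top_count = max(counts.values())
        # Assumed tie-break: among the top-voted labels, keep the one
        # proposed by the earliest model in the list.
        winner = next(label for label in votes if counts[label] == top_count)
        combined.append(winner)
    return combined


# Hypothetical predictions for four samples (not taken from the experiments).
knn_pred = ["low", "mid", "high", "low"]
tree_pred = ["low", "high", "high", "mid"]
nb_pred = ["mid", "high", "low", "high"]

hybrid_pred = majority_vote([knn_pred, tree_pred, nb_pred])
```

With three models and three classes, a three-way tie is possible (as in the last sample above), so the tie-breaking rule is a design choice that can affect the reported accuracy.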
Previous studies leave no doubt that existing methods such as KNN, decision tree, and naïve Bayes are effective. According to our results, KDN achieved better accuracy than the individual models. A hybrid classification approach that incorporates the KNN algorithm, decision tree, and naïve Bayes is presented here. The analysis evaluates teaching assistants with a prediction process that considers data size, processing time, accuracy, and estimated error. The evaluation results were obtained using different data sizes in the training and testing phases. Closer examination showed that the hybrid model, with 53.04% accuracy, achieved better results in prediction accuracy, estimated time, and error factor. In future work, we will examine different distance and similarity options that may yield a more precise distance or similarity measurement, with the aim of suggesting a measure with reduced computational cost and a classification method that is more effective and efficient.

6. References
[1] C. H. Wan, L. H. Lee, R. Rajkumar, and D. Isa, “A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine,” Expert Syst. Appl., vol. 39, no. 15, pp. 11880–11888, 2012, doi: 10.1016/j.eswa.2012.02.068.
[2] M.-L. Zhang and Z.-H. Zhou, “A k-nearest neighbor based algorithm for multi-label classification,” IEEE Int. Conf. Granul. Comput., vol. 2, no. 2, pp. 718–721, 2005.
[3] Q. B. Liu, S. Deng, C. H. Lu, B. Wang, and Y. F. Zhou, “Relative density based K-nearest neighbors clustering algorithm,” Int. Conf. Mach. Learn. Cybern., vol. 1, no. November, pp. 133–137, 2003, doi: 10.1109/icmlc.2003.1264457.
[4] J. K. Solano Meza, D. Orjuela Yepes, J. Rodrigo-Ilarri, and E. Cassiraga, “Predictive analysis of urban waste generation for the city of Bogotá, Colombia, through the implementation of decision trees-based machine learning, support vector machines and artificial neural networks,” Heliyon, vol. 5, no. 11, p. e02810, 2019, doi: 10.1016/j.heliyon.2019.e02810.
[5] X.-G. Yu and X.-P. Yu, “New K-nearest neighbor searching algorithm based on angular similarity,” in 2008 International Conference on Machine Learning and Cybernetics, Jul. 2008, pp. 1779–1784, doi: 10.1109/ICMLC.2008.4620693.
[6] Ü. Çavuşoğlu, “A new hybrid approach for intrusion detection using machine learning methods,” Appl. Intell., vol. 49, no. 7, pp. 2735–2761, 2019.
[7] R. Costa-Mendes, T. Oliveira, M. Castelli, and F. Cruz-Jesus, “A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach,” Educ. Inf. Technol., vol. 26, no. 2, pp. 1527–1547 (Springer), 2021.
[8] M. Sartas, S. Cummings, A. Garbero, and A. Akramkhanov, “A human machine hybrid approach for systematic reviews and maps in international development and social impact sectors,” vol. 12, no. 8. Multidisciplinary Digital Publishing Institute, 2021.