<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Bibliographic Survey of Sentiment Classification using Hybrid Ensemble-based Machine Learning Approaches</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rajni</forename><surname>Bhalla</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lovely Professional University</orgName>
								<address>
									<settlement>Jalandhar</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Amit</forename><surname>Sharma</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Lovely Professional University</orgName>
								<address>
									<settlement>Jalandhar</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Geetha</forename><surname>Ganesan</surname></persName>
							<email>geetha@advancedcomputingresearchsociety.org</email>
							<affiliation key="aff2">
								<orgName type="department">Advanced Computing Research Society</orgName>
								<address>
									<settlement>Chennai</settlement>
									<region>Tamilnadu</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Bibliographic Survey of Sentiment Classification using Hybrid Ensemble-based Machine Learning Approaches</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F68C1C0ECD1D95462FD5C47EBC636ECB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Hybrid approach</term>
					<term>KNN</term>
					<term>Classification</term>
					<term>machine learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rapidly growing number of reviews across different fields has contributed to the rise of data analysis. Several methods exist for data analysis, but there is a need to find the methodology that provides the best accuracy. The objective of this paper is to find an accurate method depending on the type of dataset. Previous research has relied primarily on the KNN approach and has faced issues in deciding the K-value. For this work, data from the Statistics Department of the University of Wisconsin-Madison has been used to evaluate teacher performance. The hybrid approach uses three different machine learning models for prediction, and the prediction model was tested on the teaching assistant evaluation dataset. The hybrid approach was developed to improve the identification of teacher performance. Our findings indicate that combining KNN, decision tree, and naïve Bayes yields a considerable increase in the performance of the prediction analysis. The results show that the hybrid approach, called KDN (KNN, Decision Tree, Naïve Bayes), obtained better results, with 53.04% accuracy, than the baseline system.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Nowadays, most academic institutes face a quality problem in the educational field. Among the contributing factors are student achievement and the teaching quality of teaching assistants. Some studies have been done to encourage students to improve their academic achievement, but the quality of teaching still needs to be improved, especially in the practical parts that are normally handled by teaching assistants.</p><p>In this paper, a hybrid approach is applied to assess teacher performance. Naïve Bayes, KNN, and decision trees are classic examples of supervised learning, where the data is already labeled. A decision tree can be a good starting point: it is generated by a decision tree classifier and gives a clear visual representation of the decisions made. K-nearest neighbor (KNN) classification is a computation-intensive algorithm best suited to situations with a large training dataset; the algorithm typically uses the Euclidean distance measure to build its distance matrix.</p><p>Naïve Bayes is another supervised learning algorithm and is known as a linear classification method; KNN, by contrast, is not a linear classifier. When data is processed with KNN, many calculations must be performed at each step, which is the main reason KNN struggles with large amounts of data. Both naïve Bayes and KNN are powerful techniques; naïve Bayes is preferred over KNN when processing speed matters. If one cannot choose between the three, the best strategy is to combine them all and run a test on the data to determine which delivers the best results.</p><p>The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 gives a quick summary of the dataset. Section 4 presents the collected results and compares them with other methods. Section 5 concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Literature Review</head><p>This section introduces the detection methods used in earlier models; we then compare and contrast these strategies with those used in the proposed model. k-nearest neighbors (KNN) is a common and extensively used technique for classification <ref type="bibr" target="#b0">[1]</ref> [2], clustering <ref type="bibr" target="#b2">[3]</ref>, and regression <ref type="bibr" target="#b3">[4]</ref> in a variety of research areas, including economic modelling <ref type="bibr" target="#b4">[5]</ref>, image interpolation <ref type="bibr">(Smith et al., 1988)</ref>, and visual category recognition <ref type="bibr">(Zhang et al., 2006)</ref>. A hybrid and layered Intrusion Detection System (IDS) has been suggested that employs a mix of machine learning and feature selection approaches to deliver high-performance intrusion detection across a variety of attack types <ref type="bibr" target="#b5">[6]</ref>. A hybrid analysis can be designed to increase the capacity to retain significant findings and well-supported outcomes by combining traditional statistical analysis with artificial intelligence technologies <ref type="bibr" target="#b6">[7]</ref>. We believe that a hybrid strategy incorporating both machine and human-centered features can achieve greater efficacy, competence, and social significance than either method alone <ref type="bibr" target="#b7">[8]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><div><head n="3.1.">Dataset Description</head><p>The dataset was taken from the UCI repository. The data comes from evaluations of 151 teaching assistant (TA) assignments. The class variable was produced by splitting the scores into three groups of roughly equal size ("low," "mid," and "high").</p></div></div>
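The class-variable construction described above (three near-equal groups by score) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' actual preprocessing; the `tertile_labels` helper and the sample scores are hypothetical.

```python
def tertile_labels(scores):
    """Assign 'low'/'mid'/'high' by rank, splitting scores into three near-equal groups."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(scores)
    labels = [None] * n
    for rank, idx in enumerate(order):
        if rank < n // 3:
            labels[idx] = "low"       # bottom third of the ranking
        elif rank < 2 * n // 3:
            labels[idx] = "mid"       # middle third
        else:
            labels[idx] = "high"      # top third
    return labels

# Hypothetical evaluation scores; the real dataset has 151 of them.
scores = [3.1, 4.5, 2.2, 4.9, 3.8, 2.7, 4.0, 3.3, 2.9]
labels = tertile_labels(scores)
```

Splitting by rank rather than by fixed score thresholds is what keeps the three classes of "about similar size", as the dataset description states.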
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiment and Results</head><p>The analysis design is a combination of several stages, and each stage contains a number of steps, as shown in Figure 1. First, the teaching assistant dataset is retrieved and the Rename operator is used to rename the English Speaker attribute. In the second phase, the Split Validation operator divides the dataset into two groups, one portion for training data and the other for testing data. In the third phase, the KNN, decision tree, naïve Bayes, and hybrid models are trained on the data, and the Apply Model operator is then used for testing. In the fourth phase, the different models (KNN, decision tree, naïve Bayes, and hybrid) are applied to the sample, and an accuracy measure is used to obtain the performance. The fifth phase presents the results graphically.</p></div>
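The split-validation step above is a RapidMiner operator in the paper; as a rough equivalent, a holdout split and an accuracy measure can be sketched in plain Python. The function names, the 70/30 ratio, and the seed are assumptions for illustration only.

```python
import random

def split_validation(data, train_ratio=0.7, seed=42):
    """Shuffle rows reproducibly, then split into training and testing portions."""
    rows = data[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

data = list(range(10))                  # stand-in for dataset rows
train, test = split_validation(data)
```

Fixing the shuffle seed makes the split reproducible across runs, which matters when comparing several models on the same partition.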
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">KNN</head><p>K-nearest neighbours (KNN) is a simple, easy-to-implement supervised machine learning approach that may be used to solve both classification and regression problems. The KNN algorithm assumes that similar objects lie close together, and it relies on this assumption being correct in order to work. KNN combines the concept of similarity (also known as distance, proximity, or closeness) with some basic mathematics, such as computing the distance between points on a graph.</p></div>
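The "similar objects lie close together" idea can be made concrete with a minimal KNN classifier: compute the Euclidean distance from the query to every training point, then take a majority vote among the k nearest. This is a from-scratch sketch, not the operator used in the paper; the toy points and labels are invented for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Two well-separated clusters of toy points.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["low", "low", "low", "high", "high", "high"]
```

Note that all the work happens at prediction time: every query scans the whole training set, which is why the introduction calls KNN computation-intensive on large datasets.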
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Naive Bayes</head><p>Naive Bayes classifiers are a family of classification algorithms based on Bayes' theorem. They all share the same working principle: every pair of features being classified is assumed to be independent of the others.</p></div>
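The independence assumption lets the classifier multiply a class prior by one likelihood term per feature. A minimal categorical naïve Bayes sketch follows; the feature names, toy rows, and the Laplace smoothing constant are assumptions for illustration, not the paper's configuration.

```python
from collections import Counter, defaultdict

def nb_train(X, y):
    """Count class frequencies and per-feature value frequencies within each class."""
    priors = Counter(y)
    cond = defaultdict(Counter)          # keyed by (feature_index, class)
    for features, label in zip(X, y):
        for i, v in enumerate(features):
            cond[(i, label)][v] += 1
    return priors, cond

def nb_predict(priors, cond, features, classes):
    """Pick the class maximizing prior * product of smoothed per-feature likelihoods."""
    total = sum(priors.values())
    best, best_score = None, -1.0
    for c in classes:
        score = priors[c] / total
        for i, v in enumerate(features):
            # Add-one (Laplace) smoothing so unseen values do not zero the product;
            # the +2 in the denominator assumes roughly binary feature values.
            score *= (cond[(i, c)][v] + 1) / (priors[c] + 2)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical rows: (English speaker, semester type) -> evaluation class.
X = [("native", "summer"), ("native", "regular"),
     ("non-native", "regular"), ("non-native", "regular")]
y = ["high", "high", "low", "low"]
priors, cond = nb_train(X, y)
pred = nb_predict(priors, cond, ("non-native", "regular"), ["high", "low"])
```

Because training reduces to counting, naïve Bayes is fast, which is why the introduction prefers it over KNN when speed matters.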
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Decision Tree</head><p>The decision tree is a powerful technique widely used for prediction, and it presents its results in the form of a tree of decisions. The results of all three algorithms will be compared using ensemble approaches.</p></div>
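The ensemble combination used later (a Vote operator over KNN, decision tree, and naïve Bayes) amounts to a per-instance majority vote across the three models' predictions. A plain-Python sketch of that voting step, with invented per-model predictions standing in for real model output:

```python
from collections import Counter

def majority_vote(*prediction_lists):
    """Combine parallel per-model prediction lists by per-instance majority vote."""
    combined = []
    for votes in zip(*prediction_lists):
        counts = Counter(votes)
        combined.append(counts.most_common(1)[0][0])
    return combined

# Hypothetical predictions from three base models on four test instances.
knn_preds  = ["high", "mid", "low", "high"]
tree_preds = ["high", "low", "low", "mid"]
nb_preds   = ["mid",  "mid", "low", "high"]
hybrid = majority_vote(knn_preds, tree_preds, nb_preds)
```

With three heterogeneous voters, an instance is misclassified only when at least two models agree on the wrong label, which is the intuition behind the hybrid's accuracy gain. (Ties are possible with three distinct labels; `Counter.most_common` then falls back to first-encountered order, so a real system would want an explicit tie-break rule.)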
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Results</head><p>The analysis of the proposed model produced different results in the training and testing stages. Using these results, the performance of the teaching assistant can be analyzed and controlled. The performance output is analyzed in terms of accuracy and prediction error. We used the KNN approach to evaluate teachers and obtained 47.83% accuracy, as shown in Table 1. With naïve Bayes, we obtained 42.38% accuracy, as shown in Table 2. With the decision tree, we obtained 37.04% accuracy, as shown in Table 3. The performance of these individual models still needs improvement. It is clear from Table 4 and Figure <ref type="figure" target="#fig_2">3</ref> that the hybrid approach produces better results than the individual models.</p></div>
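The accuracy, class precision, and class recall figures reported in the tables can be recomputed directly from a confusion matrix. The sketch below does this for the KNN matrix of Table 1 (rows are predictions pred3/pred2/pred1, columns are true labels), reproducing the reported 47.83% accuracy; the `metrics` helper is an illustration, not the paper's tooling.

```python
def metrics(matrix):
    """Accuracy, per-row precision, and per-column recall from a square confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))        # diagonal = correct predictions
    precision = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    recall = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    return correct / total, precision, recall

# Confusion matrix from Table 1 (KNN): rows pred3/pred2/pred1, columns True3/True2/True1.
knn_table = [[9, 4, 4],
             [4, 8, 6],
             [3, 3, 5]]
acc, prec, rec = metrics(knn_table)
```

Running the same helper on the other tables gives a quick consistency check on the reported per-class precision and recall values.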
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This study was conducted to check the performance of different machine learning models through data analysis on the teaching assistant evaluation dataset. The purpose of this research is to identify effective strategies for selecting an accurate model from among several prediction models. As previous studies show, existing methodologies such as KNN, decision tree, and naïve Bayes have proven to be strong. In our results, the hybrid KDN proved better in terms of model accuracy. A hybrid classification approach that incorporates the KNN algorithm, decision tree, and naïve Bayes is presented here. The analysis bases its prediction process on data size, processing time, accuracy, and estimated error to investigate and evaluate the teaching assistant. The evaluation results were obtained using different sizes in the training and testing phases. A deeper examination highlighted that the hybrid, at 53.04%, achieved better results in prediction accuracy, estimated time, and error factor. In the future, we will look at different distance and similarity options that might yield a more precise distance or similarity measurement, and we aim to propose a measure with a reduced computational cost, leading to a more effective and efficient categorization method.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Pictorial representation of the methodology</figDesc><graphic coords="3,200.00,72.00,194.70,165.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Scatter Plot showing Category</figDesc><graphic coords="4,132.00,135.24,331.00,196.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Plot view using hybrid approach</figDesc><graphic coords="5,180.50,72.00,234.00,186.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Data  </figDesc><table><row><cell cols="2">Performance using KNN</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Accuracy: 47.83%</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>True3</cell><cell>True2</cell><cell>True1</cell><cell>Class precision</cell></row><row><cell>pred3</cell><cell>9</cell><cell>4</cell><cell>4</cell><cell>52.94%</cell></row><row><cell>pred2</cell><cell>4</cell><cell>8</cell><cell>6</cell><cell>44.44%</cell></row><row><cell>pred1</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>45.45%</cell></row><row><cell>Class recall</cell><cell>56.25%</cell><cell>53.33%</cell><cell>33.33%</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Data Performance using Naïve Bayes Accuracy: 42.38% +/-11.77% (micro average: 42.38%)</figDesc><table><row><cell></cell><cell>True3</cell><cell>True2</cell><cell>True1</cell><cell>Class precision</cell></row><row><cell>pred3</cell><cell>41</cell><cell>34</cell><cell>31</cell><cell>36.68%</cell></row><row><cell>pred2</cell><cell>8</cell><cell>10</cell><cell>5</cell><cell>43.48%</cell></row><row><cell>pred1</cell><cell>3</cell><cell>6</cell><cell>13</cell><cell>59.09%</cell></row><row><cell>Class recall</cell><cell>78.85%</cell><cell>20.00%</cell><cell>26.53%</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Data. Finally, a Vote operator was used to combine KNN, naïve Bayes, and decision tree, and its performance was compared with the individual models, as shown in Table 4.</figDesc><table><row><cell cols="2">Performance using Decision Tree</cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="3">Accuracy: 37.04% +/-6.79% (micro average: 37.09%)</cell><cell></cell><cell></cell></row><row><cell></cell><cell>True3</cell><cell>True2</cell><cell>True1</cell><cell>Class precision</cell></row><row><cell>pred3</cell><cell>47</cell><cell>41</cell><cell>47</cell><cell>34.81%</cell></row><row><cell>pred2</cell><cell>4</cell><cell>8</cell><cell>1</cell><cell>61.54%</cell></row><row><cell>pred1</cell><cell>1</cell><cell>1</cell><cell>1</cell><cell>33.33%</cell></row><row><cell>Class recall</cell><cell>90.38%</cell><cell>16.00%</cell><cell>2.04%</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Data Performance using Hybrid Approach</figDesc><table><row><cell cols="3">Accuracy: 53.04% +/-8.62% (micro average: 52.98%)</cell><cell></cell><cell></cell></row><row><cell></cell><cell>True3</cell><cell>True2</cell><cell>True1</cell><cell>Class precision</cell></row><row><cell>pred3</cell><cell>39</cell><cell>20</cell><cell>19</cell><cell>50.00%</cell></row><row><cell>pred2</cell><cell>10</cell><cell>23</cell><cell>12</cell><cell>51.11%</cell></row><row><cell>pred1</cell><cell>3</cell><cell>7</cell><cell>18</cell><cell>64.29%</cell></row><row><cell>Class recall</cell><cell>75.00%</cell><cell>46.00%</cell><cell>36.73%</cell><cell></cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rajkumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Isa</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2012.02.068</idno>
	</analytic>
	<monogr>
		<title level="j">Expert Syst. Appl</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">15</biblScope>
			<biblScope unit="page" from="11880" to="11888" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A k-nearest neighbor based algorithm for multi-label classification</title>
		<author>
			<persName><forename type="first">Min-Ling</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhi-Hua</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Int. Conf. Granul. Comput</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="718" to="721" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Relative density based Knearest neighbors clustering algorithm</title>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">B</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">F</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="DOI">10.1109/icmlc.2003.1264457</idno>
	</analytic>
	<monogr>
		<title level="j">Int. Conf. Mach. Learn. Cybern</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="133" to="137" />
			<date type="published" when="2003-11">November. 2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Predictive analysis of urban waste generation for the city of Bogotá, Colombia, through the implementation of decision trees-based machine learning, support vector machines and artificial neural networks</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">K</forename><surname>Solano Meza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Orjuela Yepes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rodrigo-Ilarri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cassiraga</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.heliyon.2019.e02810</idno>
	</analytic>
	<monogr>
		<title level="j">Heliyon</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page">e02810</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">New K-nearest neighbor searching algorithm based on angular similarity</title>
		<author>
			<persName><forename type="first">Xiao-Gao</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiao-Peng</forename><surname>Yu</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICMLC.2008.4620693</idno>
	</analytic>
	<monogr>
		<title level="m">2008 International Conference on Machine Learning and Cybernetics</title>
				<imprint>
			<date type="published" when="2008-07">Jul. 2008</date>
			<biblScope unit="page" from="1779" to="1784" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A new hybrid approach for intrusion detection using machine learning methods</title>
		<author>
			<persName><forename type="first">Ü</forename><surname>Çavuşoğlu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Appl. Intell</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="issue">49</biblScope>
			<biblScope unit="page" from="2735" to="2761" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach</title>
		<author>
			<persName><forename type="first">Ricardo</forename><surname>Costa-Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tiago</forename><surname>Oliveira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mauro</forename><surname>Castelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frederico</forename><surname>Cruz-Jesus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Educ. Inf. Technol</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1527" to="1547" />
			<date type="published" when="2021">2021</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">A human machine hybrid approach for systematic reviews and maps in international development and social impact sectors</title>
		<author>
			<persName><forename type="first">Murat</forename><surname>Sartas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sarah</forename><surname>Cummings</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alessandra</forename><surname>Garbero</surname></persName>
		</author>
		<author>
			<persName><surname>Akramkhanov</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>Multidisciplinary Digital Publishing Institute</publisher>
			<biblScope unit="volume">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
