<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Bibliographic Survey of Sentiment Classification using Hybrid Ensemble-based Machine Learning Approaches</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rajni</forename><surname>Bhalla</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lovely Professional University</orgName>
								<address>
									<settlement>Jalandhar</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Amit</forename><surname>Sharma</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Lovely Professional University</orgName>
								<address>
									<settlement>Jalandhar</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Geetha</forename><surname>Ganesan</surname></persName>
							<email>geetha@advancedcomputingresearchsociety.org</email>
							<affiliation key="aff2">
								<orgName type="department">Advanced Computing Research Society</orgName>
								<address>
									<settlement>Chennai</settlement>
									<region>Tamilnadu</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Bibliographic Survey of Sentiment Classification using Hybrid Ensemble-based Machine Learning Approaches</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F68C1C0ECD1D95462FD5C47EBC636ECB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Hybrid approach</term>
					<term>KNN</term>
					<term>Classification</term>
					<term>machine learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rapidly growing number of reviews across different fields has contributed to the rise of data analysis. Several methods exist for data analysis, but there is a need to find the methodology that provides the best accuracy. The objective of this paper is to find an accurate method depending on the type of dataset. Previous research has relied primarily on the KNN approach and has faced issues in deciding the K-value. For this work, data from the Statistics Department of the University of Wisconsin-Madison has been used to evaluate teacher performance. The hybrid approach uses three different machine learning models for prediction, and the prediction model was tested on the teaching assistant evaluation dataset. The hybrid approach was developed to improve the identification of teacher performance. Our findings indicate that combining KNN, decision tree, and naïve Bayes yields a considerable increase in the performance of the prediction analysis. The results show that the hybrid approach, called KDN (KNN, Decision Tree, Naïve Bayes), obtained better results, with 53.04% accuracy, than the baseline system.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Nowadays, most academic institutes face a quality problem in the educational field. Among the contributing factors are student achievement and the teaching quality of teaching assistants. Some studies have been done to encourage students to improve their academic achievement, but the quality of teaching still needs to be improved, especially in the practical parts that are normally handled by teaching assistants.</p><p>In this paper, a hybrid approach is applied to assess teacher performance. Naïve Bayes, KNN, and decision trees are classic examples of supervised learning, where the data is already labeled. A decision tree can be a good starting point: it is generated by a decision tree classifier and gives a clear visual representation of the decisions made. K-nearest neighbor (KNN) classification is a computation-intensive algorithm best suited to situations with a large training dataset; the algorithm typically uses the Euclidean distance measure to build its distance matrix.</p><p>Naïve Bayes is another supervised learning algorithm and is known as a linear classification method; KNN, by contrast, is not a linear classifier. When data is processed with KNN, many calculations must be performed at each step, which is the main reason KNN struggles with large amounts of data. Both naïve Bayes and KNN are powerful techniques; naïve Bayes is preferred over KNN when processing speed matters. If one cannot choose between the three, the best strategy is to combine them all and run a test on the data to determine which delivers the best results.</p><p>The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 gives a quick summary of the dataset. Section 4 presents the collected results and compares them with other methods. Section 5 concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Literature Review</head><p>This section introduces the detection methods used in earlier models; we then compare and contrast these strategies with those used in the proposed model. k-nearest neighbors (KNN) is a common and extensively used technique for classification <ref type="bibr" target="#b0">[1]</ref> [2], clustering <ref type="bibr" target="#b2">[3]</ref>, and regression <ref type="bibr" target="#b3">[4]</ref> in a variety of research areas, including economic modelling <ref type="bibr" target="#b4">[5]</ref>, image interpolation <ref type="bibr">(Smith et al., 1988)</ref>, and visual category recognition <ref type="bibr">(Zhang et al., 2006)</ref>. A hybrid and layered Intrusion Detection System (IDS) has been suggested that employs a mix of machine learning and feature selection approaches to deliver high-performance intrusion detection across a variety of attack types <ref type="bibr" target="#b5">[6]</ref>. A hybrid analysis can be designed to increase the capacity to retain significant findings and well-supported outcomes by combining traditional statistical analysis with artificial intelligence technologies <ref type="bibr" target="#b6">[7]</ref>. We believe that a hybrid strategy incorporating both machine and human-centered features can achieve greater efficacy, competence, and social significance than either method alone <ref type="bibr" target="#b7">[8]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><div><head n="3.1.">Dataset Description</head><p>The dataset was taken from the UCI repository. The data comes from evaluations of 151 teaching assistant (TA) assignments. The class variable was produced by splitting the scores into three groups of roughly equal size ("low," "mid," and "high").</p></div></div>
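The class-variable construction described above (three near-equal groups by score) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' actual preprocessing; the `tertile_labels` helper and the sample scores are hypothetical.

```python
def tertile_labels(scores):
    """Assign 'low'/'mid'/'high' by rank, splitting scores into three near-equal groups."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(scores)
    labels = [None] * n
    for rank, idx in enumerate(order):
        if rank < n // 3:
            labels[idx] = "low"       # bottom third of the ranking
        elif rank < 2 * n // 3:
            labels[idx] = "mid"       # middle third
        else:
            labels[idx] = "high"      # top third
    return labels

# Hypothetical evaluation scores; the real dataset has 151 of them.
scores = [3.1, 4.5, 2.2, 4.9, 3.8, 2.7, 4.0, 3.3, 2.9]
labels = tertile_labels(scores)
```

Splitting by rank rather than by fixed score thresholds is what keeps the three classes of "about similar size", as the dataset description states.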
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiment and Results</head><p>The analysis design is a combination of several stages, and each stage contains a number of steps, as shown in Figure 1. First, the teaching assistant dataset is retrieved and the Rename operator is used to rename the English Speaker attribute. In the second phase, the Split Validation operator divides the dataset into two groups, one portion for training data and the other for testing data. In the third phase, the KNN, decision tree, naïve Bayes, and hybrid models are trained on the data, and the Apply Model operator is then used for testing. In the fourth phase, the different models (KNN, decision tree, naïve Bayes, and hybrid) are applied to the sample, and an accuracy measure is used to obtain the performance. The fifth phase presents the results graphically.</p></div>
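The split-validation step above is a RapidMiner operator in the paper; as a rough equivalent, a holdout split and an accuracy measure can be sketched in plain Python. The function names, the 70/30 ratio, and the seed are assumptions for illustration only.

```python
import random

def split_validation(data, train_ratio=0.7, seed=42):
    """Shuffle rows reproducibly, then split into training and testing portions."""
    rows = data[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

data = list(range(10))                  # stand-in for dataset rows
train, test = split_validation(data)
```

Fixing the shuffle seed makes the split reproducible across runs, which matters when comparing several models on the same partition.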
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">KNN</head><p>K-nearest neighbours (KNN) is a simple, easy-to-implement supervised machine learning approach that may be used to solve both classification and regression problems. The KNN algorithm assumes that similar objects lie close together, and it relies on this assumption being correct in order to work. KNN combines the concept of similarity (also known as distance, proximity, or closeness) with some basic mathematics, such as computing the distance between points on a graph.</p></div>
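The "similar objects lie close together" idea can be made concrete with a minimal KNN classifier: compute the Euclidean distance from the query to every training point, then take a majority vote among the k nearest. This is a from-scratch sketch, not the operator used in the paper; the toy points and labels are invented for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Two well-separated clusters of toy points.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["low", "low", "low", "high", "high", "high"]
```

Note that all the work happens at prediction time: every query scans the whole training set, which is why the introduction calls KNN computation-intensive on large datasets.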
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Naive Bayes</head><p>Naive Bayes classifiers are a family of classification algorithms based on Bayes' theorem. They all share the same working principle: every pair of features being classified is assumed to be independent of the others.</p></div>
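The independence assumption lets the classifier multiply a class prior by one likelihood term per feature. A minimal categorical naïve Bayes sketch follows; the feature names, toy rows, and the Laplace smoothing constant are assumptions for illustration, not the paper's configuration.

```python
from collections import Counter, defaultdict

def nb_train(X, y):
    """Count class frequencies and per-feature value frequencies within each class."""
    priors = Counter(y)
    cond = defaultdict(Counter)          # keyed by (feature_index, class)
    for features, label in zip(X, y):
        for i, v in enumerate(features):
            cond[(i, label)][v] += 1
    return priors, cond

def nb_predict(priors, cond, features, classes):
    """Pick the class maximizing prior * product of smoothed per-feature likelihoods."""
    total = sum(priors.values())
    best, best_score = None, -1.0
    for c in classes:
        score = priors[c] / total
        for i, v in enumerate(features):
            # Add-one (Laplace) smoothing so unseen values do not zero the product;
            # the +2 in the denominator assumes roughly binary feature values.
            score *= (cond[(i, c)][v] + 1) / (priors[c] + 2)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical rows: (English speaker, semester type) -> evaluation class.
X = [("native", "summer"), ("native", "regular"),
     ("non-native", "regular"), ("non-native", "regular")]
y = ["high", "high", "low", "low"]
priors, cond = nb_train(X, y)
pred = nb_predict(priors, cond, ("non-native", "regular"), ["high", "low"])
```

Because training reduces to counting, naïve Bayes is fast, which is why the introduction prefers it over KNN when speed matters.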
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Decision Tree</head><p>The decision tree is a powerful technique widely used for prediction, and it presents its results in the form of a tree of decisions. The results of all three algorithms will be compared using ensemble approaches.</p></div>
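The ensemble combination used later (a Vote operator over KNN, decision tree, and naïve Bayes) amounts to a per-instance majority vote across the three models' predictions. A plain-Python sketch of that voting step, with invented per-model predictions standing in for real model output:

```python
from collections import Counter

def majority_vote(*prediction_lists):
    """Combine parallel per-model prediction lists by per-instance majority vote."""
    combined = []
    for votes in zip(*prediction_lists):
        counts = Counter(votes)
        combined.append(counts.most_common(1)[0][0])
    return combined

# Hypothetical predictions from three base models on four test instances.
knn_preds  = ["high", "mid", "low", "high"]
tree_preds = ["high", "low", "low", "mid"]
nb_preds   = ["mid",  "mid", "low", "high"]
hybrid = majority_vote(knn_preds, tree_preds, nb_preds)
```

With three heterogeneous voters, an instance is misclassified only when at least two models agree on the wrong label, which is the intuition behind the hybrid's accuracy gain. (Ties are possible with three distinct labels; `Counter.most_common` then falls back to first-encountered order, so a real system would want an explicit tie-break rule.)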
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Results</head><p>The analysis of the proposed model produced different results in the training and testing stages. Using these results, the performance of the teaching assistant can be analyzed and controlled. The performance output is analyzed in terms of accuracy and prediction error. We used the KNN approach to evaluate teachers and obtained 47.83% accuracy, as shown in Table 1. With naïve Bayes, we obtained 42.38% accuracy, as shown in Table 2. With the decision tree, we obtained 37.04% accuracy, as shown in Table 3. The performance of these individual models still needs improvement. It is clear from Table 4 and Figure <ref type="figure" target="#fig_2">3</ref> that the hybrid approach produces better results than the individual models.</p></div>
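The accuracy, class precision, and class recall figures reported in the tables can be recomputed directly from a confusion matrix. The sketch below does this for the KNN matrix of Table 1 (rows are predictions pred3/pred2/pred1, columns are true labels), reproducing the reported 47.83% accuracy; the `metrics` helper is an illustration, not the paper's tooling.

```python
def metrics(matrix):
    """Accuracy, per-row precision, and per-column recall from a square confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))        # diagonal = correct predictions
    precision = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    recall = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    return correct / total, precision, recall

# Confusion matrix from Table 1 (KNN): rows pred3/pred2/pred1, columns True3/True2/True1.
knn_table = [[9, 4, 4],
             [4, 8, 6],
             [3, 3, 5]]
acc, prec, rec = metrics(knn_table)
```

Running the same helper on the other tables gives a quick consistency check on the reported per-class precision and recall values.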
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This study was conducted to check the performance of different machine learning models through data analysis on the teaching assistant evaluation dataset. The purpose of this research is to identify effective strategies for selecting an accurate model from among several prediction models. As previous studies show, existing methodologies such as KNN, decision tree, and naïve Bayes have proven to be strong. In our results, the hybrid KDN proved better in terms of model accuracy. A hybrid classification approach that incorporates the KNN algorithm, decision tree, and naïve Bayes is presented here. The analysis bases its prediction process on data size, processing time, accuracy, and estimated error to investigate and evaluate the teaching assistant. The evaluation results were obtained using different sizes in the training and testing phases. A deeper examination highlighted that the hybrid, at 53.04%, achieved better results in prediction accuracy, estimated time, and error factor. In the future, we will look at different distance and similarity options that might yield a more precise distance or similarity measurement, and we aim to propose a measure with a reduced computational cost, leading to a more effective and efficient categorization method.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Pictorial representation of the methodology</figDesc><graphic coords="3,200.00,72.00,194.70,165.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Scatter Plot showing Category</figDesc><graphic coords="4,132.00,135.24,331.00,196.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Plot view using hybrid approach</figDesc><graphic coords="5,180.50,72.00,234.00,186.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Data  </figDesc><table><row><cell cols="2">Performance using KNN</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Accuracy: 47.83%</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>True3</cell><cell>True2</cell><cell>True1</cell><cell>Class precision</cell></row><row><cell>pred3</cell><cell>9</cell><cell>4</cell><cell>4</cell><cell>52.94%</cell></row><row><cell>pred2</cell><cell>4</cell><cell>8</cell><cell>6</cell><cell>44.44%</cell></row><row><cell>pred1</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>45.45%</cell></row><row><cell>Class recall</cell><cell>56.25%</cell><cell>53.33%</cell><cell>33.33%</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Data Performance using Naïve Bayes Accuracy: 42.38% +/-11.77% (micro average: 42.38%)</figDesc><table><row><cell></cell><cell>True3</cell><cell>True2</cell><cell>True1</cell><cell>Class precision</cell></row><row><cell>pred3</cell><cell>41</cell><cell>34</cell><cell>31</cell><cell>36.68%</cell></row><row><cell>pred2</cell><cell>8</cell><cell>10</cell><cell>5</cell><cell>43.48%</cell></row><row><cell>pred1</cell><cell>3</cell><cell>6</cell><cell>13</cell><cell>59.09%</cell></row><row><cell>Class recall</cell><cell>78.85%</cell><cell>20.00%</cell><cell>26.53%</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Data. Finally, a Vote operator was used to combine KNN, naïve Bayes, and decision tree, and its performance was compared with the individual models, as shown in Table 4.</figDesc><table><row><cell cols="2">Performance using Decision Tree</cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="3">Accuracy: 37.04% +/-6.79% (micro average: 37.09%)</cell><cell></cell><cell></cell></row><row><cell></cell><cell>True3</cell><cell>True2</cell><cell>True1</cell><cell>Class precision</cell></row><row><cell>pred3</cell><cell>47</cell><cell>41</cell><cell>47</cell><cell>34.81%</cell></row><row><cell>pred2</cell><cell>4</cell><cell>8</cell><cell>1</cell><cell>61.54%</cell></row><row><cell>pred1</cell><cell>1</cell><cell>1</cell><cell>1</cell><cell>33.33%</cell></row><row><cell>Class recall</cell><cell>90.38%</cell><cell>16.00%</cell><cell>2.04%</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Data Performance using Hybrid Approach</figDesc><table><row><cell cols="3">Accuracy: 53.04% +/-8.62% (micro average: 52.98%)</cell><cell></cell><cell></cell></row><row><cell></cell><cell>True3</cell><cell>True2</cell><cell>True1</cell><cell>Class precision</cell></row><row><cell>pred3</cell><cell>39</cell><cell>20</cell><cell>19</cell><cell>50.00%</cell></row><row><cell>pred2</cell><cell>10</cell><cell>23</cell><cell>12</cell><cell>51.11%</cell></row><row><cell>pred1</cell><cell>3</cell><cell>7</cell><cell>18</cell><cell>64.29%</cell></row><row><cell>Class recall</cell><cell>75.00%</cell><cell>46.00%</cell><cell>36.73%</cell><cell></cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rajkumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Isa</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2012.02.068</idno>
	</analytic>
	<monogr>
		<title level="j">Expert Syst. Appl</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">15</biblScope>
			<biblScope unit="page" from="11880" to="11888" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A k-nearest neighbor based algorithm for multi-label classification</title>
		<author>
			<persName><forename type="first">Min-Ling</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhi-Hua</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Int. Conf. Granul. Comput</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="718" to="721" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Relative density based Knearest neighbors clustering algorithm</title>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">B</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">F</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="DOI">10.1109/icmlc.2003.1264457</idno>
	</analytic>
	<monogr>
		<title level="j">Int. Conf. Mach. Learn. Cybern</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="133" to="137" />
			<date type="published" when="2003-11">November. 2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Predictive analysis of urban waste generation for the city of Bogotá, Colombia, through the implementation of decision trees-based machine learning, support vector machines and artificial neural networks</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">K</forename><surname>Solano Meza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Orjuela Yepes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rodrigo-Ilarri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cassiraga</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.heliyon.2019.e02810</idno>
	</analytic>
	<monogr>
		<title level="j">Heliyon</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page">e02810</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">New K-nearest neighbor searching algorithm based on angular similarity</title>
		<author>
			<persName><forename type="first">Xiao-Gao</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiao-Peng</forename><surname>Yu</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICMLC.2008.4620693</idno>
	</analytic>
	<monogr>
		<title level="m">2008 International Conference on Machine Learning and Cybernetics</title>
				<imprint>
			<date type="published" when="2008-07">Jul. 2008</date>
			<biblScope unit="page" from="1779" to="1784" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A new hybrid approach for intrusion detection using machine learning methods</title>
		<author>
			<persName><forename type="first">Ü</forename><surname>Çavuşoğlu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Appl. Intell</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="issue">49</biblScope>
			<biblScope unit="page" from="2735" to="2761" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach</title>
		<author>
			<persName><forename type="first">Ricardo</forename><surname>Costa-Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tiago</forename><surname>Oliveira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mauro</forename><surname>Castelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frederico</forename><surname>Cruz-Jesus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Educ. Inf. Technol</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1527" to="1547" />
			<date type="published" when="2021">2021</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">A human machine hybrid approach for systematic reviews and maps in international development and social impact sectors</title>
		<author>
			<persName><forename type="first">Murat</forename><surname>Sartas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sarah</forename><surname>Cummings</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alessandra</forename><surname>Garbero</surname></persName>
		</author>
		<author>
			<persName><surname>Akramkhanov</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>Multidisciplinary Digital Publishing Institute</publisher>
			<biblScope unit="volume">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
