<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Hybrid Method for Textual Data Classification Based on Support Vector Machine with Particle Swarm Optimization Metaheuristic and k-Means Clustering</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Konstantinas</forename><surname>Korovkinas</surname></persName>
							<email>konstantinas.korovkinas@knf.vu.lt</email>
							<affiliation key="aff0">
								<orgName type="department">Kaunas Faculty</orgName>
								<orgName type="institution">Vilnius University</orgName>
								<address>
									<addrLine>Muitines str. 8</addrLine>
									<postCode>LT-44280</postCode>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Hybrid Method for Textual Data Classification Based on Support Vector Machine with Particle Swarm Optimization Metaheuristic and k-Means Clustering</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3D0A994A8DEE28CE6D1E02F8D494B66A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T09:13+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Support Vector Machine</term>
					<term>Particle Swarm Optimization</term>
					<term>k-Means</term>
					<term>Textual Data Classification</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper introduces a hybrid method for textual data classification. Its goal is to improve the classification accuracy of the method presented in our previous work by integrating the k-Means method, which reduces the training dataset, and the particle swarm optimization metaheuristic, which tunes the parameter of a linear support vector machine. The paper reports that the introduced method yields greater improvements in all effectiveness metrics than the methods presented in our previous works.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Textual data analysis became very popular once people started using the Internet, and more specifically once e-shops, social networks, blogs, and other platforms where people can write comments appeared. The area remains very challenging: although a great deal of work has been done in this field, accuracy is still rather modest because comments contain slang, emoticons, etc. The Support Vector Machine (SVM) is one of the most widely used methods and has proved its efficiency in different tasks and domains. It is very amenable to parameter tuning, as well as to internal modifications, both of which allow its performance and accuracy to be improved. Despite these advantages, however, the SVM algorithm is typically slow on big data arrays: the higher the number of features, the longer the computation time it requires. There have been a number of efforts to speed up SVM, and most of them focus on reducing the training set <ref type="bibr" target="#b15">[15,</ref><ref type="bibr" target="#b18">18,</ref><ref type="bibr" target="#b22">22]</ref>. One of the most widely known and promising methods for this is k-Means <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b8">8,</ref><ref type="bibr" target="#b23">23]</ref>, which can be used as a standalone method or in combination with others. The aforementioned authors conclude that properly selected training data can improve execution time while preserving, or nearly preserving, accuracy. Increasing accuracy is another common problem, and particle swarm optimization (PSO) is a very promising option <ref type="bibr" target="#b9">[9,</ref><ref type="bibr" target="#b11">11,</ref><ref type="bibr" target="#b16">16]</ref>. One of its strengths is how readily it combines with other evolutionary methods. Its efficiency has also been proved in SVM parameter selection tasks <ref type="bibr" target="#b6">[6,</ref><ref type="bibr" target="#b10">10,</ref><ref type="bibr" target="#b21">21]</ref>.</p><p>Motivated by these improvements, this paper proposes a hybrid method for textual data classification that is suitable for large datasets. The proposed hybrid method combines three methods, SVM, k-Means, and PSO, which are integrated into the SpeedUP method presented earlier in <ref type="bibr" target="#b12">[12]</ref>. The standalone SpeedUP method increased SVM classification speed, but lost slightly to the ordinary SVM method in terms of accuracy <ref type="bibr" target="#b12">[12]</ref>. In view of this, two separate methods for improving classification accuracy were proposed: a k-Means method in <ref type="bibr" target="#b13">[13]</ref> for training data reduction, and PSO in <ref type="bibr" target="#b14">[14]</ref> for finding the best cost (penalty) parameter for SVM. Used separately, both methods still lost to the ordinary SVM, which led to the conclusion that they could be combined. The rest of the paper is organized as follows. Section 2 introduces the methods used in the experiments and describes the proposed method, Section 3 describes the datasets and experimental settings used to evaluate the proposed approach, together with the results obtained, and Section 4 outlines the conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methodology</head><p>This section briefly presents the methods relevant to the research in this paper: Support Vector Machine <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b3">4]</ref>, k-Means <ref type="bibr" target="#b17">[17]</ref>, Particle Swarm Optimization <ref type="bibr" target="#b2">[3]</ref>, and Term Frequency-Inverse Document Frequency (TF-IDF) <ref type="bibr" target="#b20">[20]</ref>. A brief description of the proposed hybrid method is also given herein.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Relevant methods</head><p>TF-IDF Since machine learning algorithms cannot work with text data directly, the text must first be converted into vectors of numbers. TF-IDF works by determining the relative frequency of a word in a specific document compared to the inverse proportion of that word over the entire document corpus. This calculation determines how relevant a given word is to a particular document. The TfidfVectorizer module from the scikit-learn <ref type="bibr" target="#b19">[19]</ref> library is used to implement TF-IDF.</p><p>Support Vector Machine A linear SVM (LSVM), optimized for large-scale learning, is used herein. The LinearSVC module from the scikit-learn <ref type="bibr" target="#b19">[19]</ref> library is used to implement LSVM.</p></div>
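As a sketch of the TfidfVectorizer and LinearSVC setup described above; the toy corpus and labels below are illustrative, not the paper's data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Illustrative toy corpus; the paper uses tweet and review texts.
train_texts = ["good product works great", "bad quality very disappointed",
               "excellent service", "terrible experience"]
train_labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

# Convert text into TF-IDF feature vectors, then fit a linear SVM.
vectorizer = TfidfVectorizer(lowercase=True)
X_train = vectorizer.fit_transform(train_texts)
clf = LinearSVC(C=1.0)        # C is the cost parameter later tuned by PSO
clf.fit(X_train, train_labels)

# New documents must be transformed with the same fitted vectorizer.
X_test = vectorizer.transform(["great quality", "very bad service"])
predictions = clf.predict(X_test)
```

Note that the vectorizer fitted on the training data must be reused for the test subsets, so both share the same vocabulary and feature space.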
<div xmlns="http://www.tei-c.org/ns/1.0"><head>k-Means</head><p>The main idea of this method is to partition the input dataset into k clusters, represented by adaptively changing centroids. k-Means computes the squared distances between the input data points and the centroids and assigns each input to the nearest centroid. The KMeans module from the scikit-learn <ref type="bibr" target="#b19">[19]</ref> library is used to implement the k-Means method.</p><p>Particle Swarm Optimization This is a population-based stochastic metaheuristic algorithm for solving continuous and discrete optimization problems. The global best variant is used herein; it was programmed manually in the Python language and adapted for LSVM parameter tuning in textual data classification tasks.</p></div>
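The global-best PSO variant mentioned above can be sketched as follows. The swarm size, inertia weight w, and acceleration coefficients c1 and c2 are illustrative defaults, not the paper's settings, and the quadratic fitness merely stands in for cross-validated LSVM accuracy as a function of C:

```python
import random

def gbest_pso(fitness, lo, hi, n_particles=10, n_iter=40,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness(x) over [lo, hi] with the global-best PSO variant (1-D)."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                          # each particle's best position so far
    pbest_fit = [fitness(x) for x in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))   # keep C inside its bounds
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i], f
    return gbest, gbest_fit

# Toy fitness peaking at C = 2.0; in the actual method the fitness
# would train an LSVM with the candidate C and return its accuracy.
best_c, _ = gbest_pso(lambda c: -(c - 2.0) ** 2, lo=0.01, hi=10.0)
```

In the hybrid method, `fitness` would wrap LSVM training and scoring on the selected training data, and the returned `best_c` is the cost parameter passed to LinearSVC.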
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">The proposed method</head><p>The proposed hybrid method (LSVM PSO km 30K SpeedUP) combines three methods, SVM, k-Means, and PSO, which are integrated into the SpeedUP method. Its main idea is to reduce the training dataset size according to the subset size: the testing dataset is split into equal subsets, and the size of the training data is calculated on the basis of the first subset's size. The k-Means and PSO methods are used to increase the accuracy of SpeedUP. The method proceeds as follows (Fig. 1). 1. Before being passed to the SpeedUP method, the training and testing datasets are preprocessed. Preprocessing consists of two actions: text preprocessing and data cleaning. Text preprocessing includes converting to lowercase and removing redundant tokens such as hashtags, @ symbols, numbers, "http" links, punctuation symbols, usernames, etc. Data cleaning checks the dataset for empty strings and removes them. 2. Depending on Subset size, function f(x) calculates the training data size and the number of sets for the k-Means method. 3. The selected training data is converted into vectors of numbers with TF-IDF and passed to the k-Means method, which selects the best training data for the SpeedUP method depending on the results R(km i). 4. The best training data selected is converted into vectors of numbers with TF-IDF and passed to the PSO method, which returns the best C value for LSVM.</p><p>The same training data is then passed to LSVM, which is trained with it and whose parameter C is set to the value returned by the PSO method. 5. Depending on Subset size, the testing dataset is divided into subsets td1, td2, etc. (function y(x)). 6. The subsets are converted into vectors of numbers with TF-IDF and passed to the LSVM algorithm one by one; the results achieved are stored in separate sets: r1 for td1, r2 for td2, etc. 7. The results are combined into one result set, Results.</p></div>
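Step 1's text preprocessing and data cleaning can be sketched with regular expressions; the exact rules below are an assumption based on the description above (lowercasing, stripping links, usernames, hashtags, numbers, and punctuation), not the paper's implementation:

```python
import re
import string

def preprocess(text):
    """Lowercase a tweet/review and strip the redundant tokens listed in step 1."""
    text = text.lower()
    text = re.sub(r"http\S+", " ", text)   # links starting with "http"
    text = re.sub(r"[@#]\w+", " ", text)   # usernames and hashtags
    text = re.sub(r"\d+", " ", text)       # numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())          # collapse repeated whitespace

def clean(dataset):
    """Data cleaning: drop entries that are empty after text preprocessing."""
    return [t for t in (preprocess(s) for s in dataset) if t]

cleaned = clean(["@user Great phone!!! http://t.co/x #happy", "123"])
```

Here the second entry consists only of digits, so it becomes an empty string and is removed by the cleaning step.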
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments and results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dataset</head><p>Two existing labeled datasets are used in this paper: the Stanford Twitter sentiment corpus (sentiment140<ref type="foot" target="#foot_0">1</ref> ) dataset and the Amazon customer reviews dataset<ref type="foot" target="#foot_1">2</ref> . The Stanford Twitter sentiment corpus dataset was introduced in <ref type="bibr" target="#b7">[7]</ref> and contains 1.6 million tweets automatically labeled as positive or negative on the basis of emoticons. The dataset is split into 70% (1.12M tweets) for training and 30% (480K tweets) for testing. The Amazon customer reviews dataset contains 4 million reviews with star ratings; it was likewise split into 70% (2.8M reviews) for training and 30% (1.2M reviews) for testing.</p></div>
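The 70%/30% splits described above can be reproduced with scikit-learn's train_test_split; the toy texts and labels below are placeholders for the actual tweets and reviews:

```python
from sklearn.model_selection import train_test_split

# Placeholder corpus standing in for the 1.6M tweets / 4M reviews.
texts = [f"review {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]   # binary positive/negative labels

# 70% for training, 30% for testing, as in both experiments.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.30, random_state=42, stratify=labels)
```

Stratifying on the labels keeps the positive/negative balance equal in both splits; whether the original experiments stratified is not stated in the text.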
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Experiments and settings</head><p>The main goal of this research is to improve the classification accuracy of the 30K SpeedUP method introduced in <ref type="bibr" target="#b12">[12]</ref> by integrating the k-Means (km 30K SpeedUP) and PSO (LSVM PSO 30K SpeedUP) methods presented in <ref type="bibr" target="#b13">[13]</ref> and <ref type="bibr" target="#b14">[14]</ref>, respectively, and to compare the aforementioned methods with the proposed method through a comparative analysis. Two experiments are performed to reach this goal: one with the Stanford Twitter sentiment corpus dataset (sentiment140) and one with the Amazon customer reviews dataset (Amazon reviews). Table <ref type="table" target="#tab_1">1</ref> shows the sizes of the training and testing data for the LSVM input. The testing subset size is assumed to be 30K instances (30%); the training data size calculated from the subset size is then 70K instances (70%). All testing data is divided into subsets of 30K instances (the last subset is the remainder and may contain fewer than 30K instances). The Python programming language and the scikit-learn <ref type="bibr" target="#b19">[19]</ref> machine learning library were used to implement and evaluate the proposed method.</p><p>A computer with an Intel(R) Core(TM) i7-4712MQ CPU at 2.30 GHz and 16.00 GB of installed memory (RAM) was used for the experiments.</p></div>
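The subset arithmetic in Table 1, SQ = trunc(TDs/Ss) full subsets with a remainder of TDs - Ss*SQ instances, can be checked directly:

```python
def subset_plan(testing_size, subset_size):
    """Number of full testing subsets and the remainder, as in Table 1."""
    sq = testing_size // subset_size            # trunc(TDs / Ss)
    remainder = testing_size - subset_size * sq  # TDs - (Ss * SQ)
    return sq, remainder

plan_sentiment140 = subset_plan(480_000, 30_000)    # experiment 1
plan_amazon = subset_plan(1_200_000, 30_000)        # experiment 2
```

Both testing sets divide evenly by 30K, so the remainder is zero in each experiment, matching the table.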
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Performance evaluation</head><p>Effectiveness is measured using the following statistical measures: accuracy (ACC), precision (PPV, positive predictive value, and NPV, negative predictive value), recall (TPR, true positive rate, and TNR, true negative rate), and the F1 score (the harmonic mean of PPV and TPR).</p></div>
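These measures follow directly from the confusion-matrix counts; a minimal sketch with illustrative counts (not results from the paper):

```python
def effectiveness(tp, tn, fp, fn):
    """Compute ACC, PPV, NPV, TPR, TNR, and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    ppv = tp / (tp + fp)                    # precision, positive predictive value
    npv = tn / (tn + fn)                    # negative predictive value
    tpr = tp / (tp + fn)                    # recall, true positive rate
    tnr = tn / (tn + fp)                    # true negative rate
    f1 = 2 * ppv * tpr / (ppv + tpr)        # harmonic mean of PPV and TPR
    return acc, ppv, npv, tpr, tnr, f1

acc, ppv, npv, tpr, tnr, f1 = effectiveness(tp=80, tn=70, fp=20, fn=30)
```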
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Results</head><p>Two experiments were performed to evaluate the effectiveness of the proposed method in terms of accuracy, precision, recall, and F1 score. Table <ref type="table" target="#tab_2">2</ref> presents the averaged results for the proposed method in comparison with 30K SpeedUP, km 30K SpeedUP, and LSVM PSO 30K SpeedUP. It is worth mentioning that all experiments were rerun using the same training and testing datasets for all methods, so the results may differ from those reported in previous works. In addition, the k-Means method from the previous work was implemented without the SVM tuning part, because that part belongs to the PSO method. The results clearly show that LSVM PSO km 30K SpeedUP performs better than the 30K SpeedUP method (by 1.02%), km 30K SpeedUP (by 0.80%), and LSVM PSO 30K SpeedUP (by 0.22%) when applied to the sentiment140 dataset. On the Amazon reviews dataset, the proposed hybrid method also performs better than 30K SpeedUP (by 0.86%) and km 30K SpeedUP (by 0.71%), while losing slightly (by 0.01%) to LSVM PSO 30K SpeedUP.</p><p>The other metrics, PPV, NPV, TPR, TNR, and F1 score, also show the superiority of the proposed method over the previously introduced methods on the sentiment140 dataset, although LSVM PSO 30K SpeedUP slightly outperforms it in terms of TNR. On the Amazon reviews dataset, the proposed hybrid method performs better in terms of NPV and TPR, while losing slightly to LSVM PSO 30K SpeedUP in terms of PPV and TNR.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>The main advantage of the proposed hybrid method is that training data selection is performed with the k-Means method, which ensures the variety of the training data and can positively affect the PSO metaheuristic's search for the best cost parameter C for LSVM. When training data is selected randomly, there is a risk that it will contain duplicate or unhelpful data, which can negatively affect accuracy across different runs; multiple runs are therefore required for more objective results, which reduces classification speed. The proposed method increased the classification accuracy with only minor losses in classification speed compared with the previously presented methods. It was also shown that, using only 70,000 instances for training, the proposed hybrid method can classify much bigger testing datasets (starting from 480K and 1.2M instances) with only minor losses in accuracy.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Diagram of proposed hybrid method</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 .</head><label>1</label><figDesc>Experimental settings<ref type="bibr" target="#b12">[12]</ref> </figDesc><table><row><cell cols="2">Exp. Dataset</cell><cell cols="3">Testing Subset Subsets</cell><cell>Remainder</cell><cell>Calculated</cell></row><row><cell>No.</cell><cell></cell><cell cols="2">data size size</cell><cell cols="3">quantity (SQ) TDs-(Ss*SQ) training data</cell></row><row><cell></cell><cell></cell><cell>(TDs)</cell><cell>(Ss)</cell><cell>trunc(TDs/Ss)</cell><cell></cell><cell>dependently</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>on Ss</cell></row><row><cell>1</cell><cell>sentiment140</cell><cell>480K</cell><cell>30K</cell><cell>16</cell><cell>0</cell><cell>70K</cell></row><row><cell>2</cell><cell cols="2">Amazon reviews 1.2M</cell><cell>30K</cell><cell>40</cell><cell>0</cell><cell>70K</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>Experimental results</figDesc><table><row><cell>Method</cell><cell>ACC</cell><cell>PPV</cell><cell>NPV</cell><cell>TPR</cell><cell>TNR</cell><cell>F 1 score</cell></row><row><cell cols="5">The Stanford Twitter sentiment corpus dataset</cell><cell></cell><cell></cell></row><row><cell>30K SpeedUP</cell><cell cols="6">77.10% 76.60% 77.62% 78.05% 76.16% 77.32%</cell></row><row><cell>km 30K SpeedUP</cell><cell cols="6">77.30% 76.78% 77.83% 78.26% 76.33% 77.51%</cell></row><row><cell>LSVM P SO 30K SpeedUP</cell><cell cols="6">77.90% 77.46% 78.35% 78.70% 77.09% 78.08%</cell></row><row><cell cols="7">LSVM P SO km 30K SpeedUP 78.12% 77.55% 78.72% 79.17% 77.08% 78.35%</cell></row><row><cell cols="5">The Amazon customer reviews dataset</cell><cell></cell><cell></cell></row><row><cell>30K SpeedUP</cell><cell cols="6">87.59% 87.50% 87.68% 87.71% 87.47% 87.60%</cell></row><row><cell>km 30K SpeedUP</cell><cell cols="6">87.74% 87.76% 87.72% 87.71% 87.76% 87.73%</cell></row><row><cell>LSVM P SO 30K SpeedUP</cell><cell cols="6">88.46% 88.63% 88.28% 88.22% 88.69% 88.43%</cell></row><row><cell cols="7">LSVM P SO km 30K SpeedUP 88.45% 88.57% 88.33% 88.29% 88.61% 88.43%</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://help.sentiment140.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://www.kaggle.com/bittlingmayer/amazonreviews/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgments. I would like to thank my supervisor Prof. Dr. Gintautas Garšva for his support and advice.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A training algorithm for optimal margin classifiers</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">E</forename><surname>Boser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M</forename><surname>Guyon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">N</forename><surname>Vapnik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fifth annual workshop on Computational learning theory</title>
				<meeting>the fifth annual workshop on Computational learning theory</meeting>
		<imprint>
			<date type="published" when="1992">1992</date>
			<biblScope unit="page" from="144" to="152" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Support-vector networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Cortes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vapnik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine learning</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="273" to="297" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A new optimizer using particle swarm theory</title>
		<author>
			<persName><forename type="first">R</forename><surname>Eberhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kennedy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth International Symposium on Micro Machine and Human Science</title>
				<meeting>the Sixth International Symposium on Micro Machine and Human Science</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="39" to="43" />
		</imprint>
	</monogr>
	<note>MHS&apos;95</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">LIBLINEAR: A library for large linear classification</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Hsieh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">R</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of machine learning research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="1871" to="1874" />
			<date type="published" when="2008-08">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">K-means based on active learning for support vector machine</title>
		<author>
			<persName><forename type="first">J</forename><surname>Gan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">L</forename><surname>Lei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="727" to="731" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Particle swarm optimization for linear support vector machines based classifier selection</title>
		<author>
			<persName><forename type="first">G</forename><surname>Garšva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Danėnas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nonlinear Analysis: Modelling and Control</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="26" to="42" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Twitter sentiment classification using distant supervision</title>
		<author>
			<persName><forename type="first">A</forename><surname>Go</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bhayani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CS224N Project Report</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">12</biblScope>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
	<note>Stanford</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Clustered support vector machines</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Artificial Intelligence and Statistics</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="307" to="315" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Traffic fatalities prediction using support vector machine with hybrid particle swarm optimization</title>
		<author>
			<persName><forename type="first">X</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Algorithms &amp; Computational Technology</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="20" to="29" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A novel differential particle swarm optimization for parameter selection of support vector machines for monitoring metal-oxide surge arrester conditions</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">T</forename><surname>Hoang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Y</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">T</forename><surname>Vu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Swarm and Evolutionary Computation</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="120" to="126" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A distributed PSO-SVM hybrid system with feature selection and parameter optimization</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">F</forename><surname>Dun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied soft computing</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1381" to="1391" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">SVM Accuracy and Training Speed Trade-Off in Sentiment Analysis Tasks</title>
		<author>
			<persName><forename type="first">K</forename><surname>Korovkinas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Danėnas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Garšva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Information and Software Technologies</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="227" to="239" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">SVM and k-Means Hybrid Method for Textual Data Sentiment Analysis</title>
		<author>
			<persName><forename type="first">K</forename><surname>Korovkinas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Danėnas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Garšva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Baltic Journal of Modern Computing</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="47" to="60" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Support Vector Machine Parameter Tuning Based on Particle Swarm Optimization Metaheuristic</title>
		<author>
			<persName><forename type="first">K</forename><surname>Korovkinas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Danėnas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Garšva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nonlinear Analysis: Modelling and Control</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="266" to="281" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">RSVM: Reduced support vector machines</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">L</forename><surname>Mangasarian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2001 SIAM International Conference on Data Mining</title>
				<meeting>the 2001 SIAM International Conference on Data Mining</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="1" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Particle swarm optimization for parameter determination and feature selection of support vector machines</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">W</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Ying</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">J</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1817" to="1824" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Some methods for classification and analysis of multivariate observations</title>
		<author>
			<persName><forename type="first">J</forename><surname>MacQueen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fifth Berkeley symposium on mathematical statistics and probability</title>
				<meeting>the fifth Berkeley symposium on mathematical statistics and probability</meeting>
		<imprint>
			<date type="published" when="1967">1967</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="281" to="297" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Data subset selection for efficient SVM training</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mourad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tewfik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Vikalo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2017 25th European Signal Processing Conference (EUSIPCO)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="833" to="837" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011-10">October 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Using tf-idf to determine word relevance in document queries</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ramos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the first instructional conference on machine learning</title>
				<meeting>the first instructional conference on machine learning</meeting>
		<imprint>
			<date type="published" when="2003-12">Dec. 2003</date>
			<biblScope unit="volume">242</biblScope>
			<biblScope unit="page" from="133" to="142" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Feature selection and hyperparameter optimization of SVM for human activity recognition</title>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">A</forename><surname>Sunkad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Soft Computing &amp; Machine Intelligence (ISCMI)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="104" to="109" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Training data reduction to speed up SVM training</title>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Intelligence</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="405" to="420" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">K-SVM: An Effective SVM Algorithm Based on K-means Clustering</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">JCP</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="2632" to="2639" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
