<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Hybrid Method for Textual Data Classification Based on Support Vector Machine with Particle Swarm Optimization Metaheuristic and k-Means Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Konstantinas Korovkinas</string-name>
          <email>konstantinas.korovkinas@knf.vu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kaunas Faculty, Vilnius University</institution>
          ,
          <addr-line>Muitines str. 8, LT-44280, Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces a hybrid method for textual data classification. The goal of this paper is to improve the classification accuracy of the method presented in our previous work by integrating into it the k-Means method, which reduces the training dataset, and the particle swarm optimization metaheuristic, which tunes the parameter of a linear support vector machine. The paper reports that the introduced method achieves higher improvements in all effectiveness metrics than the methods presented in our previous works.</p>
      </abstract>
      <kwd-group>
        <kwd>Support Vector Machine</kwd>
        <kwd>k-Means</kwd>
        <kwd>Textual Data Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Textual data analysis has become very popular since people started using the Internet,
and more concretely since the appearance of e-shops, social networks, blogs and other sites
where people can write their comments. This area is considered very challenging:
although a lot of work has been done in this field, accuracy is still rather
average because of comments, slang, smileys etc. The Support Vector Machine (SVM)
is one of the most widely used methods and has proved its efficiency in different
tasks and domains. It is very flexible with respect to parameter tuning and internal
modifications, which makes it possible to improve its performance and accuracy. However,
despite all its advantages, SVM is typically slow on large data arrays:
the higher the number of features, the longer the computation time. There have been
a number of efforts to speed up SVM, and most of them focus on reduction of the training set
[
        <xref ref-type="bibr" rid="ref15 ref18 ref22">15, 18, 22</xref>
        ]. One of the most widely known and promising methods for that is
k-Means [
        <xref ref-type="bibr" rid="ref23 ref5 ref8">5, 8, 23</xref>
        ], which can be used as a standalone method or in combination
with others. The aforementioned authors conclude that properly selected training
data can improve execution time with no loss of accuracy or with similar accuracy. Increasing
accuracy is another common problem. Particle swarm optimization (PSO) is a
very promising option [
        <xref ref-type="bibr" rid="ref11 ref16 ref9">9, 11, 16</xref>
        ]. One of its strengths is that it combines well with other
evolutionary methods. Its efficiency has also been proved in SVM parameter selection
tasks [
        <xref ref-type="bibr" rid="ref10 ref21 ref6">6, 10, 21</xref>
        ].
      </p>
      <p>
        Motivated by these improvements, this paper proposes a hybrid method for
textual data classification that is suitable for large datasets. The
proposed hybrid method is a combination of three methods, SVM, k-Means and
PSO, which are integrated into the SpeedUP method presented earlier in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
The standalone SpeedUP method increased SVM classification speed but was slightly
less accurate than the ordinary SVM method [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In view of this,
two separate methods were proposed for improving classification accuracy: the k-Means
method in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], for training data reduction, and PSO in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], for finding the
best cost (penalty) parameter for SVM. Used separately, both of these methods were still
less accurate than ordinary SVM, which led to the idea of
combining them. The rest of the paper is organized as follows.
Section 2 introduces the methods used in the experiments and describes the proposed
method, whereas Section 3 describes the datasets and
experimental settings used to evaluate the proposed approach, together with the results
obtained. Finally, Section 4 outlines the conclusions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>
        This section briefly presents the methods relevant to the research presented in this
paper: Support Vector Machine [
        <xref ref-type="bibr" rid="ref1 ref2 ref4">1, 2, 4</xref>
        ], k-Means [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Particle Swarm
Optimization [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Term Frequency-Inverse Document Frequency (TF-IDF) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. A
brief description of the proposed hybrid method is also presented herein.
      </p>
      <p>Relevant methods</p>
      <p>
        TF-IDF. Since machine learning algorithms cannot work with text data directly,
the text must be converted into a vector of numbers. TF-IDF works by determining
the relative frequency of a word in a specific document compared to the inverse
proportion of that word over the entire document corpus. This calculation
determines how relevant a given word is in a particular document. The TfidfVectorizer
module from the scikit-learn [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] library is used to implement TF-IDF.
      </p>
      <p>
        Support Vector Machine. A linear SVM (LSVM), which is
optimized for large-scale learning, is used herein. The LinearSVC module from the scikit-learn [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] library
is used to implement LSVM.
      </p>
      <p>
        k-Means. The main idea of this method is to partition the input dataset into
k clusters, represented by adaptively changing centroids. k-Means computes the
squared distances between the input data points and the centroids, and assigns
each input to the nearest centroid. The KMeans module from the scikit-learn [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] library is used
to implement the k-Means method.
      </p>
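      <p>To make the two computations above concrete, the following minimal sketch computes textbook TF-IDF weights (term frequency times log(N / df)) and performs a single k-Means assignment step on toy documents. It is plain Python for illustration only: the function names and the toy documents are assumptions, and scikit-learn's TfidfVectorizer applies a smoothed and normalized variant of this weighting by default, so its numbers will differ.</p>
      <preformat>
```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with weights tf * log(N / df).

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append([
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_nearest(vectors, centroids):
    """k-Means assignment step: index of the nearest centroid per vector."""
    return [
        min(range(len(centroids)),
            key=lambda j: squared_distance(vec, centroids[j]))
        for vec in vectors
    ]


# Toy documents (hypothetical); the first and third act as centroids, k = 2.
docs = ["good movie", "good film", "bad service", "bad food"]
vocab, vectors = tfidf_vectors(docs)
labels = assign_to_nearest(vectors, [vectors[0], vectors[2]])
```
      </preformat>
      <p>With these toy documents the two film comments are assigned to the first centroid and the two complaints to the second. A full k-Means run would then recompute the centroids and repeat the assignment until it stops changing.</p>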
      <p>Particle Swarm Optimization. It is a population-based stochastic
metaheuristic algorithm for solving continuous and discrete optimization problems.
The global best variant is used herein; it was implemented manually in Python
and adapted for LSVM parameter tuning in textual data classification
tasks.</p>
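      <p>The paper does not list its PSO implementation, so the following is only a minimal sketch of the global best variant for a single parameter, under stated assumptions. The function name gbest_pso, the coefficient values (inertia 0.7, c1 = c2 = 1.5), the search interval and the toy objective are all hypothetical; in the setting described above the objective would instead be a validation error of LSVM for a given cost value.</p>
      <preformat>
```python
import random


def gbest_pso(objective, lower, upper, n_particles=10, n_iter=50,
              inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise a one-dimensional objective with global-best PSO."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = list(pos)                        # best position of each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]    # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity: inertia + pull to own best + pull to swarm best.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            # Keep the particle inside the search interval.
            pos[i] = min(max(pos[i] + vel[i], lower), upper)
            val = objective(pos[i])
            if pbest_val[i] > val:
                pbest[i], pbest_val[i] = pos[i], val
                if gbest_val > val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val


def toy_error(c):
    """Hypothetical stand-in for a validation error as a function of C."""
    return (c - 2.0) ** 2 + 0.1


best_c, best_err = gbest_pso(toy_error, lower=0.01, upper=10.0)
```
      </preformat>
      <p>Because the swarm best is only replaced by a strictly better value, the returned error never gets worse from one iteration to the next. The fixed seed makes the sketch repeatable.</p>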
      <p>The proposed method</p>
      <p>The proposed hybrid method (LSVMPSO km 30K SpeedUP) is a combination
of three methods, SVM, k-Means and PSO, which are integrated into the SpeedUP
method. Its main idea is to reduce the training dataset size in relation to the subset
size: the testing dataset is split into equal subsets and the size of the training
data is calculated on the basis of the first subset size. The k-Means and PSO methods
are used to increase the accuracy of SpeedUP.</p>
      <p>[Figure 1: diagram with three blocks: k-Means method, PSO method and SpeedUP method.]</p>
      <p>Fig. 1. Diagram of the proposed hybrid method</p>
      <p>Trn: training dataset</p>
      <p>td: testing dataset</p>
      <p>Subsetsize: size of the testing data subsets tdi into which the testing dataset is divided</p>
      <p>y(x): function which divides the testing dataset into subsets according to Subsetsize</p>
      <p>f(x): function which calculates the training data size according to Subsetsize and defines the number of training data sets</p>
      <p>Trn(kmi): sets of training data after the k-Means method is applied</p>
      <p>td(kmi): testing data for the k-Means method</p>
      <p>R(kmi): results of every training data subset</p>
      <p>Select Best Trn(km): method which returns the best training data selected by the k-Means method according to the results R(kmi)</p>
      <p>ri: sets of results obtained from each subset</p>
      <p>Results: the final result set, which contains the results of all ri sets</p>
      <p>The diagram consists of the following steps:</p>
      <p>1. Before being passed to the SpeedUP method, the training and testing datasets are
preprocessed. Preprocessing consists of two actions: text preprocessing and data
cleaning. Text preprocessing includes actions such as converting to lowercase and
removing redundant tokens such as hashtags, @ symbols, numbers, "http" for
links, punctuation symbols, usernames etc. Data cleaning checks the dataset
for empty strings and removes them.</p>
      <p>2. Depending on Subsetsize, function f(x) calculates the training data size and
the number of sets for the k-Means method.</p>
      <p>3. The selected training data is converted into a vector of numbers with TF-IDF and
passed to the k-Means method, which selects the best training data for the
SpeedUP method depending on the results R(kmi).</p>
      <p>4. The best training data selected is converted into a vector of numbers with
TF-IDF and passed to the PSO method, which returns the best C value for LSVM.
LSVM is then trained on the same training data, with
parameter C set to the value returned by the PSO method.</p>
      <p>5. Depending on Subsetsize, the testing dataset is divided into subsets td1, td2 etc.
(function y(x)).</p>
      <p>6. The subsets are converted into vectors of numbers with TF-IDF and passed
to the LSVM algorithm one by one, and the results obtained are stored in separate
sets: r1 for td1, r2 for td2 etc.</p>
      <p>7. The results are combined into one result set, Results.</p>
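      <p>The preprocessing and cleaning actions of step 1 can be sketched as follows. This is an illustrative guess at the rules rather than the author's code: the regular expressions, the order of the operations and the names preprocess and clean are assumptions.</p>
      <preformat>
```python
import re
import string


def preprocess(text):
    """Lowercase a comment and strip links, usernames, digits, punctuation."""
    text = text.lower()
    text = re.sub(r"http\S*|www\.\S+", " ", text)   # links
    text = re.sub(r"@\w+", " ", text)               # usernames
    text = text.replace("#", " ")                   # hashtag symbol only
    text = re.sub(r"\d+", " ", text)                # numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())                   # collapse whitespace


def clean(texts, labels):
    """Data cleaning: drop entries that are empty after preprocessing."""
    pairs = [(preprocess(t), y) for t, y in zip(texts, labels)]
    kept = [(t, y) for t, y in pairs if t]
    return [t for t, _ in kept], [y for _, y in kept]
```
      </preformat>
      <p>For example, preprocess("@bob Loving the NEW phone!!! http://t.co/abc #happy 2day") returns "loving the new phone happy day", and an entry such as "@bob 123 !!!" becomes an empty string and is removed by clean together with its label.</p>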
    </sec>
    <sec id="sec-3">
      <title>Experiments and results</title>
      <p>Dataset</p>
      <p>
        Two existing labeled datasets are used in this paper: the Stanford
Twitter sentiment corpus (sentiment140)<sup>1</sup> dataset and the Amazon customer reviews
dataset<sup>2</sup>. The Stanford Twitter sentiment corpus dataset is introduced in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
contains 1.6 million tweets automatically labeled as positive or negative based
on emoticons. The dataset is split into 70% (1.12M tweets) for training and 30%
(480K tweets) for testing. The Amazon customer reviews dataset contains 4 million
reviews and star ratings; it was also split into 70% (2.8M reviews)
for training and 30% (1.2M reviews) for testing.
      </p>
      <p>Experiments and settings</p>
      <p>
        The main goal of this research is to improve the classification accuracy of the
30K SpeedUP method introduced in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] by integrating the k-Means (km 30K SpeedUP) and
PSO (LSVMPSO 30K SpeedUP) methods presented in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], respectively,
and to compare the aforementioned methods with the proposed method by
means of comparative analysis. Two experiments are performed to reach
this goal: the first with the Stanford Twitter sentiment corpus dataset
(sentiment140) and the second with the Amazon customer reviews dataset
(Amazon reviews). Table 1 shows the sizes of training and testing data for LSVM
input. It is assumed that the testing subset size is 30K instances (30%);
the training data size calculated from the subset size is then 70K instances (70%).
All testing data is divided into subsets containing 30K instances (the last subset
is the remainder and may contain fewer than 30K instances).
      </p>
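      <p>The arithmetic behind these sizes can be sketched as below. The exact form of f(x) and y(x) is not given in the text, so both helpers are assumptions: training_size derives the training size from the 30%/70% proportion, and split_into_subsets cuts the testing data into consecutive subsets with a shorter remainder at the end.</p>
      <preformat>
```python
def training_size(subset_size, test_share=0.30):
    """Training size for which subset_size is test_share of the total."""
    return round(subset_size * (1 - test_share) / test_share)


def split_into_subsets(items, subset_size):
    """Consecutive subsets of subset_size items; the last may be shorter."""
    return [items[i:i + subset_size]
            for i in range(0, len(items), subset_size)]


train_n = training_size(30_000)
subset_sizes = [len(s) for s in split_into_subsets(list(range(95)), 30)]
```
      </preformat>
      <p>Here train_n is 70000 and subset_sizes is [30, 30, 30, 5]. By the same arithmetic, 480K testing instances give 16 subsets of 30K and 1.2M instances give 40, before any entries are removed by data cleaning.</p>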
      <p>1 http://help.sentiment140.com/</p>
      <p>2 https://www.kaggle.com/bittlingmayer/amazonreviews/</p>
      <p>Effectiveness is measured using statistical measures: accuracy (ACC), precision
(PPV, positive predictive value, and NPV, negative predictive value), recall
(TPR, true positive rate, and TNR, true negative rate) and F1 score
(harmonic mean of PPV and TPR).</p>
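      <p>These measures follow directly from the four confusion matrix counts. The sketch below uses the standard definitions; the function name and the counts in the example are hypothetical and are not results from the experiments.</p>
      <preformat>
```python
def binary_metrics(tp, fp, tn, fn):
    """Effectiveness measures from true/false positive/negative counts."""
    ppv = tp / (tp + fp)          # precision for the positive class
    npv = tn / (tn + fn)          # precision for the negative class
    tpr = tp / (tp + fn)          # recall for the positive class
    tnr = tn / (tn + fp)          # recall for the negative class
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * tpr / (ppv + tpr)   # harmonic mean of PPV and TPR
    return {"ACC": acc, "PPV": ppv, "NPV": npv,
            "TPR": tpr, "TNR": tnr, "F1": f1}


# Hypothetical counts for 100 classified comments.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```
      </preformat>
      <p>For these counts ACC is 0.85, PPV is 0.8 and NPV is 0.9. The sketch assumes that no denominator is zero, which holds whenever both classes are present and both are predicted at least once.</p>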
      <p>Results</p>
      <p>Two experiments were performed to evaluate the effectiveness of the proposed method
in terms of accuracy, precision, recall and F1 score. Table 2 presents averaged
results for the proposed method in comparison with 30K SpeedUP, km 30K SpeedUP
and LSVMPSO 30K SpeedUP. It is worth mentioning that all experiments were
redone using the same training and testing datasets for all methods, so the results
can differ from those in previous works. Also, the k-Means method from
previous work was implemented without the SVM tuning part, because tuning is a part of
the PSO method.</p>
      <p>The results clearly show that LSVMPSO km 30K SpeedUP performs better
than the 30K SpeedUP method (by 1.02%), km 30K SpeedUP (by 0.80%) and
LSVMPSO 30K SpeedUP (by 0.22%) when applied to the sentiment140 dataset. In
the case of the Amazon reviews dataset the proposed hybrid method also performs
better than 30K SpeedUP (by 0.86%) and km 30K SpeedUP (by 0.71%),
while slightly losing (by 0.01%) to LSVMPSO 30K SpeedUP.</p>
      <p>The other metrics (PPV, NPV, TPR, TNR and F1 score) also show
the superiority of the proposed method over the previously introduced
methods on the sentiment140 dataset, although LSVMPSO 30K SpeedUP slightly
outperforms it in terms of TNR. In the case of the Amazon reviews dataset the proposed
hybrid method performs better in terms of NPV and TPR, while slightly losing to
LSVMPSO 30K SpeedUP in terms of PPV and TNR.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>The main advantage of the proposed hybrid method is that training data
selection is performed with the k-Means method, which ensures the variety of the training
data and could positively affect the choice made by the PSO metaheuristic when finding the best
cost parameter C for LSVM. When training data is selected randomly, there
is a risk that it will contain duplicate data or data that is not
useful, which could negatively affect accuracy in different runs; therefore,
multiple runs are required for more objective results, which affects classification
speed. The proposed method increased the classification accuracy with only
minor losses in classification speed compared with the previously presented methods.
It is also shown that by using only 70,000 instances for training the proposed
hybrid method can classify much bigger testing datasets (starting from 480K,
1.2M etc.) with minor losses in accuracy.</p>
      <p>Acknowledgments. I would like to thank my supervisor Prof. Dr. Gintautas
Garsva for his support and advice.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Boser</surname>
            ,
            <given-names>B. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V. N.</given-names>
          </string-name>
          <article-title>A training algorithm for optimal margin classifiers</article-title>
          .
          <source>In: Proceedings of the fifth annual workshop on Computational learning theory</source>
          ,
          <volume>144</volume>
          –
          <fpage>152</fpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <volume>273</volume>
          –
          <fpage>297</fpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Eberhart</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kennedy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>A new optimizer using particle swarm theory</article-title>
          .
          <source>In: MHS'95. Proceedings of the Sixth International Symposium on Micro Machine and Human Science</source>
          ,
          <volume>39</volume>
          –
          <fpage>43</fpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>R. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsieh</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          <article-title>LIBLINEAR: A library for large linear classification</article-title>
          .
          <source>Journal of machine learning research</source>
          ,
          <volume>9</volume>
          (Aug),
          <year>1871</year>
          –
          <year>1874</year>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lei</surname>
            ,
            <given-names>Q.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>K-means based on active learning for support vector machine</article-title>
          .
          <source>In: Computer and Information Science (ICIS)</source>
          ,
          <year>2017</year>
          IEEE/ACIS 16th International Conference on,
          <volume>727</volume>
          –
          <fpage>731</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Garsva</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danėnas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Particle swarm optimization for linear support vector machines based classifier selection</article-title>
          .
          <source>Nonlinear Analysis: Modelling and Control</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ),
          <volume>26</volume>
          –
          <fpage>42</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Go</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhayani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Twitter sentiment classification using distant supervision</article-title>
          .
          <source>CS224N Project Report</source>
          , Stanford,
          <volume>1</volume>
          (
          <issue>12</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Clustered support vector machines</article-title>
          .
          <source>In: Artificial Intelligence and Statistics</source>
          ,
          <volume>307</volume>
          –
          <fpage>315</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Traffic fatalities prediction using support vector machine with hybrid particle swarm optimization</article-title>
          .
          <source>Journal of Algorithms &amp; Computational Technology</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ),
          <volume>20</volume>
          –
          <fpage>29</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hoang</surname>
            ,
            <given-names>T. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>M. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alam</surname>
            ,
            <given-names>M. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vu</surname>
            ,
            <given-names>Q. T.</given-names>
          </string-name>
          <article-title>A novel differential particle swarm optimization for parameter selection of support vector machines for monitoring metal-oxide surge arrester conditions</article-title>
          .
          <source>Swarm and Evolutionary Computation</source>
          ,
          <volume>38</volume>
          , 120–
          <fpage>126</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>C. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dun</surname>
            ,
            <given-names>J. F.</given-names>
          </string-name>
          <article-title>A distributed PSO–SVM hybrid system with feature selection and parameter optimization</article-title>
          .
          <source>Applied soft computing</source>
          ,
          <volume>8</volume>
          (
          <issue>4</issue>
          ),
          <volume>1381</volume>
          –
          <fpage>1391</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Korovkinas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danėnas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garsva</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>SVM Accuracy and Training Speed Trade-Off in Sentiment Analysis Tasks</article-title>
          .
          <source>In: International Conference on Information and Software Technologies</source>
          ,
          <volume>227</volume>
          –
          <fpage>239</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Korovkinas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danėnas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garsva</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>SVM and k-Means Hybrid Method for Textual Data Sentiment Analysis</article-title>
          .
          <source>Baltic Journal of Modern Computing</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <volume>47</volume>
          –
          <fpage>60</fpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Korovkinas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danėnas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garsva</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Support Vector Machine Parameter Tuning Based on Particle Swarm Optimization Metaheuristic</article-title>
          .
          <source>Nonlinear Analysis: Modelling and Control</source>
          ,
          <volume>25</volume>
          (
          <issue>2</issue>
          ),
          <volume>266</volume>
          –
          <fpage>281</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>Y.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mangasarian</surname>
            ,
            <given-names>O.L.</given-names>
          </string-name>
          <article-title>RSVM: Reduced support vector machines</article-title>
          .
          <source>In: Proceedings of the 2001 SIAM International Conference on Data Mining</source>
          ,
          <volume>1</volume>
          –
          <fpage>17</fpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>S. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ying</surname>
            ,
            <given-names>K. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>Z. J.</given-names>
          </string-name>
          <article-title>Particle swarm optimization for parameter determination and feature selection of support vector machines</article-title>
          .
          <source>Expert systems with applications</source>
          ,
          <volume>35</volume>
          (
          <issue>4</issue>
          ),
          <year>1817</year>
          –
          <year>1824</year>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>MacQueen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          .
          <source>In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability</source>
          , vol.
          <volume>1</volume>
          , No.
          <volume>14</volume>
          ,
          <issue>281</issue>
          –
          <fpage>297</fpage>
          (
          <year>1967</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mourad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tewfik</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vikalo</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Data subset selection for efficient SVM training</article-title>
          .
          <source>In: Signal Processing Conference (EUSIPCO)</source>
          ,
          <year>2017</year>
          25th European,
          <volume>833</volume>
          –
          <fpage>837</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          (Oct),
          <volume>2825</volume>
          –
          <fpage>2830</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Ramos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Using tf-idf to determine word relevance in document queries</article-title>
          .
          <source>In: Proceedings of the First Instructional Conference on Machine Learning</source>
          ,
          <volume>242</volume>
          (Dec),
          <volume>133</volume>
          –
          <fpage>142</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sunkad</surname>
            ,
            <given-names>Z. A.</given-names>
          </string-name>
          <article-title>Feature selection and hyperparameter optimization of SVM for human activity recognition</article-title>
          .
          <source>In: 2016 3rd International Conference on Soft Computing &amp; Machine Intelligence (ISCMI)</source>
          ,
          <volume>104</volume>
          –
          <fpage>109</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Training data reduction to speed up SVM training</article-title>
          .
          <source>Applied Intelligence</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ),
          <volume>405</volume>
          –
          <fpage>420</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lv</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <article-title>K-SVM: An Effective SVM Algorithm Based on K-means Clustering</article-title>
          .
          <source>Journal of Computers</source>
          ,
          <volume>8</volume>
          (
          <issue>10</issue>
          ),
          <volume>2632</volume>
          –
          <fpage>2639</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>