<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Selection of Intelligent Algorithms for Sentiment Classification Method Creation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Konstantinas Korovkinas</string-name>
          <email>konstantinas.korovkinas@knf.vu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gintautas Garšva</string-name>
          <email>gintautas4garsva@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Applied Informatics, Kaunas Faculty, Vilnius University</institution>
          ,
          <addr-line>Muitines Str. 8, Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <fpage>152</fpage>
      <lpage>157</lpage>
      <abstract>
        <p>The main goal of this paper is to select two single intelligent algorithms for creating a sentiment classification method. We perform a set of experiments to recognize positive or negative sentiment, using single intelligent methods and combinations of them. The best results were obtained by the single methods Logistic Regression, SVM and Naïve Bayes, and by the combination of Logistic Regression with SVM.</p>
        <p>Index Terms: Sentiment analysis, Logistic Regression, Naïve Bayes classification, Support Vector Machines, Random Forest.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Nowadays sentiment analysis is a very popular research
area. A lot of work has been done, but there is still no
sufficiently good method for sentiment classification. Many authors
report average accuracy slightly above 80%, which is not
enough when more accurate results are needed.</p>
      <p>
        Pang et al. in [<xref ref-type="bibr" rid="ref17">17</xref>] evaluated the performance of Naïve
Bayes, Maximum Entropy, and Support Vector Machines in
the specific domain of movie reviews, finding that Naïve
Bayes showed the worst and SVM the best results, although the
differences were not very large. Later Go et al. in [<xref ref-type="bibr" rid="ref12">12</xref>] obtained
similar results with unigrams by introducing a novel
approach to automatically classify the sentiment of Twitter
messages as either positive or negative with respect to a query
term. The same techniques were also used by Kharde and
Sonawane in [<xref ref-type="bibr" rid="ref14">14</xref>] to perform sentiment analysis on Twitter
data, yet resulting in lower accuracy; again, SVM proved to
perform best. Davidov et al. in [<xref ref-type="bibr" rid="ref11">11</xref>] also stated that SVM and
Naïve Bayes are the best techniques to classify the data and
can be regarded as the baseline learning methods, applying
them to analysis based on user-defined hashtags
in tweets. Tian et al. in [<xref ref-type="bibr" rid="ref25">25</xref>] applied seven classification
algorithms: J48, Random Forest, ADTree, AdaBoostM1, Bagging,
Multilayer Perceptron and Naïve Bayes for imbalanced
sentiment classification of Chinese product reviews. They found
that their proposed method helps Support Vector Machines
(SVM) to outperform other classification methods. Singh et al.
in [<xref ref-type="bibr" rid="ref21">21</xref>] used a novel technique to predict the outcome of US
presidential elections using sentiment analysis. To accomplish
this task they used SVM. Jayalekshmi and Mathew in [<xref ref-type="bibr" rid="ref13">13</xref>]
proposed a system that automatically recognizes the facial
expression from an image and classifies emotions for the final
decision. For classification they used SVM, Random Forest and
KNN classifiers. Ahmed et al. in [<xref ref-type="bibr" rid="ref2">2</xref>] investigated a new
approach to sentence-level sentiment analysis using
different machine learning algorithms. SVM (Support Vector
Machines), Naïve Bayes and MLP (Multilayer Perceptron)
were used for movie review sentiment analysis. Moreover,
they used two different Naïve Bayes classifiers and two
different types of SVM kernels to identify and analyze the
difference in accuracy as well as to find the best outcome
among all the experiments. Tayade et al. in [<xref ref-type="bibr" rid="ref23">23</xref>] used sentiment
analysis through machine learning to identify the targets of
trolls, so as to prevent trolling before it happens. Naïve
Bayes, Support Vector Machines (SVM) and Maximum
Entropy (MaxEnt) classifiers showed very promising results.
Maheshwari et al. in [<xref ref-type="bibr" rid="ref16">16</xref>] applied Support Vector Machines,
Logistic Regression and Random Forests machine learners
to identify the best linguistic and non-linguistic features for
automatic classification of values and ethics. Pranckevičius
and Marcinkevičius in [<xref ref-type="bibr" rid="ref19">19</xref>] performed experiments on short texts
of product-review data from Amazon in order to compare
Naïve Bayes, Random Forest, Decision Tree, Support Vector
Machines, and Logistic Regression classifiers implemented in
Apache Spark by evaluating the classification accuracy as a function
of the size of the training data sets and the number of n-grams.
Ahmad et al. in [<xref ref-type="bibr" rid="ref1">1</xref>] reviewed different machine learning
techniques and algorithms (Maximum Entropy, Random
Forest, SailAil Sentiment Analyzer, Multilayer Perceptron, Naïve
Bayes and Support Vector Machines) which were applied by
researchers to movie reviews and product reviews for
the evaluation. Brito et al. in [<xref ref-type="bibr" rid="ref7">7</xref>] presented how different
hyperparameter combinations impact the resulting German
word vectors and how these word representations can be part
of more complex models. They predicted whether a user liked
an app, given a review, with three different algorithms: Logistic
Regression, Decision Trees and Random Forests. Ashok et
al. in [<xref ref-type="bibr" rid="ref4">4</xref>] proposed a social framework, which extracts users'
reviews and comments on restaurants and points of interest such as
events and locations, to personalize and rank suggestions based
on user preferences. Naïve Bayes, Support Vector Machines
with two different kernels (Gaussian and linear), Maximum
Entropy and Random Forest were used in this work.
      </p>
      <p>Copyright held by the author(s).</p>
      <p>Such results led to the conclusion that Logistic Regression,
SVM, Naïve Bayes and Random Forest remain promising
for future research. Therefore, in this paper we perform
experiments with each of them and with various combinations
of two of them (selected according to the results of the single-method
experiments) to recognize positive or negative sentiment and to compare
their accuracy. The rest of the paper is organized as follows.
Section II describes the techniques used in the research.
Section III presents the method for combining results. Section
IV describes the preparation of the datasets, the experiments, the experimental
settings, the effectiveness measures and the results. In Section V we
conclude and outline our future work.</p>
      <p>II. DESCRIPTION OF TECHNIQUES USED IN RESEARCH</p>
    </sec>
    <sec id="sec-2">
      <title>A. Logistic Regression</title>
      <p>The logistic regression model arises from the desire to
model the posterior probabilities of the K classes via linear
functions in x, while at the same time ensuring that they sum
to one and remain in [0, 1]. The model has the form</p>
      <disp-formula><tex-math><![CDATA[
\begin{aligned}
\log\frac{\Pr(G = 1 \mid X = x)}{\Pr(G = K \mid X = x)} &= \beta_{10} + \beta_{1}^{T} x \\
\log\frac{\Pr(G = 2 \mid X = x)}{\Pr(G = K \mid X = x)} &= \beta_{20} + \beta_{2}^{T} x \\
&\;\;\vdots \\
\log\frac{\Pr(G = K-1 \mid X = x)}{\Pr(G = K \mid X = x)} &= \beta_{(K-1)0} + \beta_{K-1}^{T} x
\end{aligned}
\tag{1}
]]></tex-math></disp-formula>
      <p>The model is specified in terms of K − 1 log-odds or logit
transformations (reflecting the constraint that the probabilities
sum to one). Although the model uses the last class as the
denominator in the odds-ratios, the choice of denominator is
arbitrary in that the estimates are equivariant under this choice.
A simple calculation shows that</p>
      <disp-formula><tex-math><![CDATA[
\begin{aligned}
\Pr(G = k \mid X = x) &= \frac{\exp(\beta_{k0} + \beta_{k}^{T} x)}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_{l}^{T} x)}, \quad k = 1, \ldots, K-1, \\
\Pr(G = K \mid X = x) &= \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_{l}^{T} x)},
\end{aligned}
\tag{2}
]]></tex-math></disp-formula>
      <p>
        and they clearly sum to one. To emphasize the dependence on
the entire parameter set <inline-formula><tex-math>\theta = \{\beta_{10}, \beta_{1}^{T}, \ldots, \beta_{(K-1)0}, \beta_{K-1}^{T}\}</tex-math></inline-formula>,
we denote the probabilities <inline-formula><tex-math>\Pr(G = k \mid X = x) = p_k(x; \theta)</tex-math></inline-formula>
(Hastie et al. in [<xref ref-type="bibr" rid="ref26">26</xref>]).
      </p>
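      <p>As an illustration of Eq. (2), the following minimal Python sketch computes the K class probabilities from K − 1 linear scores, with class K as the reference class. The function name <monospace>class_probabilities</monospace> and the coefficient values are hypothetical and chosen only for this example; they are not taken from the experiments.</p>
      <preformat><![CDATA[
```python
import math


def class_probabilities(x, betas):
    """Posterior class probabilities of a K-class logit model, as in Eq. (2).

    betas holds one (intercept, weights) pair for each class k = 1..K-1;
    class K is the reference class in the denominator of the odds.
    """
    # Linear scores beta_k0 + beta_k^T x for the first K-1 classes.
    scores = [b0 + sum(w * xi for w, xi in zip(weights, x))
              for b0, weights in betas]
    exp_scores = [math.exp(s) for s in scores]
    denom = 1.0 + sum(exp_scores)
    # Probabilities of classes 1..K-1 followed by the reference class K.
    return [e / denom for e in exp_scores] + [1.0 / denom]


# Hypothetical coefficients: two features, K = 3 classes.
probs = class_probabilities([1.0, 2.0],
                            [(0.5, [0.1, -0.2]), (-0.3, [0.4, 0.0])])
print(probs, sum(probs))
```
]]></preformat>
      <p>Whatever coefficients are used, the returned values lie in (0, 1) and sum to one, which is the property the logit parametrization is designed to guarantee.</p>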
    </sec>
    <sec id="sec-3">
      <title>B. Naïve Bayes Classification</title>
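      <p>A Naïve Bayes classifier is a simple probabilistic classifier
based on Bayes’ theorem and is particularly suited when the
dimensionality of the inputs is high. In text classification, a
given document d is assigned the class</p>
      <disp-formula><tex-math><![CDATA[
c^{*} = \arg\max_{c} \, p(c \mid d)
\tag{3}
]]></tex-math></disp-formula>
      <p>Its underlying probability model can be described as an
“independent feature model”. The Naïve Bayes (NB) classifier
uses Bayes’ rule, Eq. (4),</p>
      <disp-formula><tex-math><![CDATA[
p(c \mid d) = \frac{p(c)\, p(d \mid c)}{p(d)}
\tag{4}
]]></tex-math></disp-formula>
      <p>where p(d) plays no role in selecting <inline-formula><tex-math>c^{*}</tex-math></inline-formula>. To estimate the
term <inline-formula><tex-math>p(d \mid c)</tex-math></inline-formula>, Naïve Bayes decomposes it by assuming the <inline-formula><tex-math>f_i</tex-math></inline-formula>’s
are conditionally independent given d’s class, as in Eq. (5),</p>
      <disp-formula><tex-math><![CDATA[
p_{NB}(c \mid d) := \frac{p(c) \left( \prod_{i=1}^{m} p(f_i \mid c)^{n_i(d)} \right)}{p(d)}
\tag{5}
]]></tex-math></disp-formula>
      <p>
        where m is the number of features, <inline-formula><tex-math>f_i</tex-math></inline-formula> is the i-th feature and <inline-formula><tex-math>n_i(d)</tex-math></inline-formula> is the number of times <inline-formula><tex-math>f_i</tex-math></inline-formula> occurs in document d.
The training method consists of relative-frequency
estimation of <inline-formula><tex-math>p(c)</tex-math></inline-formula> and <inline-formula><tex-math>p(f_i \mid c)</tex-math></inline-formula> (Pang et al. in [<xref ref-type="bibr" rid="ref17">17</xref>]).
      </p>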
      <p>
        In our experiments we use Multinomial Naïve Bayes, as
presented in [<xref ref-type="bibr" rid="ref18">18</xref>]. It implements the Naïve Bayes algorithm for
multinomially distributed data, and is one of the two classic
Naïve Bayes variants used in text classification (where the
data are typically represented as word vector counts, although
tf-idf vectors are also known to work well in practice). The
distribution is parametrized by vectors <inline-formula><tex-math>\theta_y = (\theta_{y1}, \ldots, \theta_{yn})</tex-math></inline-formula>
for each class y, where n is the number of features (in
text classification, the size of the vocabulary) and <inline-formula><tex-math>\theta_{yi}</tex-math></inline-formula> is the
probability <inline-formula><tex-math>P(x_i \mid y)</tex-math></inline-formula> of feature i appearing in a sample
belonging to class y.
      </p>
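      <p>The parameters <inline-formula><tex-math>\theta_y</tex-math></inline-formula> are estimated by a smoothed version of
maximum likelihood, i.e. relative frequency counting:</p>
      <disp-formula><tex-math><![CDATA[
\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_{y} + \alpha n}
]]></tex-math></disp-formula>
      <p>where <inline-formula><tex-math>N_{yi} = \sum_{x \in T} x_i</tex-math></inline-formula> is the number of times feature i
appears in a sample of class y in the training set T, and
<inline-formula><tex-math>N_{y} = \sum_{i=1}^{n} N_{yi}</tex-math></inline-formula> is the total count of all features for class y.</p>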
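      <p>
        The smoothing prior <inline-formula><tex-math>\alpha \ge 0</tex-math></inline-formula> accounts for features not
present in the learning samples and prevents zero probabilities
in further computations. Setting <inline-formula><tex-math>\alpha = 1</tex-math></inline-formula> is called Laplace
smoothing, while <inline-formula><tex-math>\alpha &lt; 1</tex-math></inline-formula> is called Lidstone smoothing [<xref ref-type="bibr" rid="ref18">18</xref>].
      </p>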
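      <p>The smoothed relative-frequency estimate described above can be sketched in a few lines of Python. This is only an illustration of the formula for one class y, not the scikit-learn implementation; the function name <monospace>smoothed_feature_probs</monospace> and the toy counts are assumptions made for this example.</p>
      <preformat><![CDATA[
```python
def smoothed_feature_probs(counts, alpha=1.0):
    """Estimate theta_yi = (N_yi + alpha) / (N_y + alpha * n) for one class y.

    counts[i] is N_yi, the number of times feature i occurs in the
    training samples of class y; n is the number of features.
    """
    n = len(counts)
    total = sum(counts)  # N_y, the total count of all features for class y
    return [(c + alpha) / (total + alpha * n) for c in counts]


# Hypothetical counts for a three-word vocabulary in one class.
counts = [3, 0, 1]
laplace = smoothed_feature_probs(counts, alpha=1.0)   # Laplace smoothing
lidstone = smoothed_feature_probs(counts, alpha=0.5)  # Lidstone smoothing
print(laplace)
print(lidstone)
```
]]></preformat>
      <p>With alpha greater than zero, the second feature, which never occurs in the toy counts, still receives a non-zero probability, so products of feature probabilities do not collapse to zero.</p>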
    </sec>
    <sec id="sec-4">
      <title>C. Support Vector Machines</title>
      <p>
        Support vector machines were introduced by Boser et al. in
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and basically attempt to find the best possible surface to
separate positive and negative training samples. Support Vector
Machines (SVMs) are supervised learning methods used for
classification.
      </p>
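      <p>
        Given training vectors <inline-formula><tex-math>x_i \in R^{n}</tex-math></inline-formula>, <inline-formula><tex-math>i = 1, \ldots, l</tex-math></inline-formula>, in two classes,
and an indicator vector <inline-formula><tex-math>y \in R^{l}</tex-math></inline-formula> such that <inline-formula><tex-math>y_i \in \{1, -1\}</tex-math></inline-formula>, C-SVC
(Boser et al. in [<xref ref-type="bibr" rid="ref5">5</xref>]; Cortes and Vapnik in [<xref ref-type="bibr" rid="ref9">9</xref>]) solves the
following primal optimization problem [<xref ref-type="bibr" rid="ref8">8</xref>].
      </p>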
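      <disp-formula><tex-math><![CDATA[
\begin{aligned}
\min_{w, b, \xi} \quad & \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i \\
\text{subject to} \quad & y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \; i = 1, \ldots, l
\end{aligned}
\tag{6}
]]></tex-math></disp-formula>
      <p>where <inline-formula><tex-math>\phi(x_i)</tex-math></inline-formula> maps <inline-formula><tex-math>x_i</tex-math></inline-formula> into a higher-dimensional space and C &gt;
0 is the regularization parameter. Due to the possible high
dimensionality of the vector variable w, usually we solve the
following dual problem</p>
      <disp-formula><tex-math><![CDATA[
\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \alpha^{T} Q \alpha - e^{T} \alpha \\
\text{subject to} \quad & y^{T} \alpha = 0, \\
& 0 \le \alpha_i \le C, \; i = 1, \ldots, l
\end{aligned}
\tag{7}
]]></tex-math></disp-formula>
      <p>where <inline-formula><tex-math>e = [1, \ldots, 1]^{T}</tex-math></inline-formula> is the vector of all ones, Q is an l
by l positive semidefinite matrix, <inline-formula><tex-math>Q_{ij} \equiv y_i y_j K(x_i, x_j)</tex-math></inline-formula>, and
<inline-formula><tex-math>K(x_i, x_j) \equiv \phi(x_i)^{T} \phi(x_j)</tex-math></inline-formula> is the kernel function.</p>
      <p>After the dual problem (7) is solved, using the primal-dual
relationship, the optimal w satisfies</p>
      <disp-formula><tex-math><![CDATA[
w = \sum_{i=1}^{l} y_i \alpha_i \phi(x_i)
\tag{8}
]]></tex-math></disp-formula>
      <p>and the decision function is</p>
      <disp-formula><tex-math><![CDATA[
\operatorname{sgn}\left( w^{T} \phi(x) + b \right) = \operatorname{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b \right)
\tag{9}
]]></tex-math></disp-formula>
      <p>(Chang and Lin in [<xref ref-type="bibr" rid="ref8">8</xref>]).</p>
    </sec>
    <sec id="sec-5">
      <title>D. Random Forests</title>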
      <p>
        Random Forests were introduced by Leo Breiman in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
who was inspired by earlier work by Amit and Geman [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Random Forest is a tree-based ensemble with each tree
depending on a collection of random variables. More formally,
for a p-dimensional random vector <inline-formula><tex-math>X = (X_1, \ldots, X_p)^{T}</tex-math></inline-formula>
representing the real-valued input or predictor variables and
a random variable Y representing the real-valued response,
we assume an unknown joint distribution <inline-formula><tex-math>P_{XY}(X, Y)</tex-math></inline-formula>. The
goal is to find a prediction function f(X) for predicting Y. The
prediction function is determined by a loss function L(Y, f(X))
and defined to minimize the expected value of the loss
      </p>
      <disp-formula><tex-math><![CDATA[
E_{XY}\left( L(Y, f(X)) \right)
\tag{10}
]]></tex-math></disp-formula>
      <p>
        where the subscripts denote expectation with respect to the
joint distribution of X and Y [<xref ref-type="bibr" rid="ref10">10</xref>].
      </p>
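      <p>Intuitively, L(Y, f(X)) is a measure of how close f(X) is to
Y; it penalizes values of f(X) that are a long way from Y.
Typical choices of L are squared error loss <inline-formula><tex-math>L(Y, f(X)) = (Y - f(X))^{2}</tex-math></inline-formula>
for regression and zero-one loss for classification:</p>
      <disp-formula><tex-math><![CDATA[
L(Y, f(X)) = I(Y \ne f(X)) =
\begin{cases}
0 & \text{if } Y = f(X) \\
1 & \text{otherwise}
\end{cases}
\tag{11}
]]></tex-math></disp-formula>
      <p>It turns out that minimizing <inline-formula><tex-math>E_{XY}(L(Y, f(X)))</tex-math></inline-formula> for squared
error loss gives the conditional expectation</p>
      <disp-formula><tex-math><![CDATA[
f(x) = E(Y \mid X = x)
\tag{12}
]]></tex-math></disp-formula>
      <p>otherwise known as the regression function. In the
classification situation, if the set of possible values of Y is denoted by
<inline-formula><tex-math>\mathcal{Y}</tex-math></inline-formula>, minimizing <inline-formula><tex-math>E_{XY}(L(Y, f(X)))</tex-math></inline-formula> for zero-one loss gives</p>
      <disp-formula><tex-math><![CDATA[
f(x) = \arg\max_{y \in \mathcal{Y}} P(Y = y \mid X = x)
\tag{13}
]]></tex-math></disp-formula>
      <p>
        otherwise known as the Bayes rule [<xref ref-type="bibr" rid="ref10">10</xref>]. Ensembles
construct f in terms of a collection of so-called “base learners”
<inline-formula><tex-math>h_1(x), \ldots, h_J(x)</tex-math></inline-formula> and these base learners are combined to give
the “ensemble predictor” f(x). In regression, the base learners
are averaged
      </p>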
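      <disp-formula><tex-math><![CDATA[
f(x) = \frac{1}{J} \sum_{j=1}^{J} h_j(x)
]]></tex-math></disp-formula>
      <p>while in classification, f(x) is the most frequently predicted
class (“voting”)</p>
      <disp-formula><tex-math><![CDATA[
f(x) = \arg\max_{y \in \mathcal{Y}} \sum_{j=1}^{J} I(y = h_j(x))
]]></tex-math></disp-formula>
      <p>
        In Random Forests the jth base learner is a tree denoted
<inline-formula><tex-math>h_j(X, \Theta_j)</tex-math></inline-formula>, where <inline-formula><tex-math>\Theta_j</tex-math></inline-formula> is a collection of random variables and
the <inline-formula><tex-math>\Theta_j</tex-math></inline-formula>’s are independent for <inline-formula><tex-math>j = 1, \ldots, J</tex-math></inline-formula> (Cutler et al. in
[<xref ref-type="bibr" rid="ref10">10</xref>]).
      </p>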
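      <p>The two ways of combining base learners described in this subsection, averaging for regression and voting for classification, can be illustrated with a short Python sketch. It operates on already computed base-learner outputs and does not build any trees; the function names and the example predictions are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def ensemble_average(predictions):
    """Regression: average the outputs h_1(x), ..., h_J(x) of the base learners."""
    return sum(predictions) / len(predictions)


def ensemble_vote(predictions):
    """Classification: return the class predicted by the most base learners."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of J = 3 base learners for one input x.
avg = ensemble_average([1.0, 2.0, 3.0])
label = ensemble_vote(["positive", "negative", "positive"])
print(avg, label)
```
]]></preformat>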
      <p>III. THE METHOD FOR COMBINING RESULTS</p>
      <p>
        The method for combining results is presented in this
section. The proposed method is based on the method
(Algorithm for sentences) that we introduced in [<xref ref-type="bibr" rid="ref15">15</xref>]. We modified this
algorithm so that it can be used with different machine learning
algorithms. The modified algorithm is presented below.
      </p>
      <p>Algorithm for combining results</p>
      <p>Input: Let us denote ML1 as the strongest classifier and
ML2 as the weakest classifier.</p>
      <p>R<sub>ML1</sub> = {ML1<sub>sent</sub>, p} is the set of results
obtained after performing classification with the first machine learning algorithm ML1,
where ML1<sub>sent</sub> is the sentiment and
p is the probability of classification;</p>
      <p>R<sub>ML2</sub> = {ML2<sub>sent</sub>, v} is the set of results
obtained after performing classification with the second machine learning algorithm ML2,
where ML2<sub>sent</sub> is the sentiment and
v is the ML2 result value, which contains “positive” or “negative”
sentiment;</p>
      <p>th2 = 0.8. The threshold value was selected by manually
investigating the results;</p>
      <p>th3 = min(RML1fpg) + ( RML1fpg n 2) 0:01 (used our
proposed formula), where RML1fpg is the standard deviation
of RML1fpg.</p>
    </sec>
    <sec id="sec-6">
      <title>Algorithm for results combining:</title>
      <p>1) Find results which are the same in both ML1 and ML2:
Results = R<sub>ML1</sub> ∩ R<sub>ML2</sub> = {x : x ∈ R<sub>ML1</sub>{ML1<sub>sent</sub>} and x ∈ R<sub>ML2</sub>{ML2<sub>sent</sub>}}</p>
      <sec id="sec-6-1">
        <p>2) Find results which are different between ML1 and ML2:
R<sub>ML1</sub>{ML1<sub>sent</sub>} ≠ R<sub>ML2</sub>{ML2<sub>sent</sub>} and R<sub>ML1</sub>{p} &lt; th2</p>
      </sec>
      <sec id="sec-6-2">
        <sec id="sec-6-2-1">
          <p>3) Results = Results ∪ R<sub>ML1</sub>, if |R<sub>ML1</sub>{p}| &lt; th3;
Results = Results ∪ R<sub>ML2</sub>, if |R<sub>ML1</sub>{p}| ≥ th3</p>
        </sec>
        <sec id="sec-6-2-2">
          <p>
            Output: set of classification results Results =
{Sentence, Sentiment} and Accuracy (Korovkinas et al.
in [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]).
          </p>
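          <p>The three steps of the combining algorithm can be read as a per-sentence decision rule, sketched below in Python. This is a literal reading of the steps under stated assumptions, not the authors' implementation: the function name <monospace>combine_results</monospace> and the example data are hypothetical, th3 is passed in as a plain number rather than computed with the formula from the Input description, and disagreements with p not below th2 are assumed to keep the ML1 label, which the steps do not state explicitly.</p>
          <preformat><![CDATA[
```python
def combine_results(ml1_sent, ml1_prob, ml2_sent, th3, th2=0.8):
    """Combine the labels of two classifiers sentence by sentence."""
    combined = []
    for s1, p, s2 in zip(ml1_sent, ml1_prob, ml2_sent):
        if s1 == s2:
            # Step 1: both classifiers agree, keep the common label.
            combined.append(s1)
        elif p < th2:
            # Steps 2 and 3: the classifiers disagree and the ML1 probability
            # is below th2; keep ML1 if |p| is below th3, otherwise take ML2.
            combined.append(s1 if abs(p) < th3 else s2)
        else:
            # Assumption (not stated in the steps): keep the ML1 label.
            combined.append(s1)
    return combined


def accuracy(predicted, expected):
    """Share of sentences whose combined label matches the reference label."""
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Hypothetical outputs of ML1 (labels and probabilities) and ML2 (labels).
ml1 = ["positive", "negative", "positive", "negative"]
prob = [0.95, 0.55, 0.70, 0.90]
ml2 = ["positive", "positive", "negative", "positive"]
results = combine_results(ml1, prob, ml2, th3=0.6)
print(results)
```
]]></preformat>
          <p>In this toy run the first sentence is taken from the agreement set, the second keeps the ML1 label, the third takes the ML2 label, and the fourth falls under the stated assumption for confident ML1 predictions.</p>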
          <p>IV. EXPERIMENTS AND RESULTS</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>A. Dataset</title>
      <p>
        Two existing datasets are used in this paper: the Stanford
Twitter sentiment corpus (sentiment140)<sup>1</sup> dataset and the Amazon
customer reviews dataset<sup>2</sup>. The Stanford Twitter sentiment
corpus dataset was introduced by Go et al. in [<xref ref-type="bibr" rid="ref12">12</xref>] and contains
1.6 million tweets automatically labeled as positive or negative
based on emoticons. The dataset is split into a training set of
70% (1.12M tweets) and a testing set of 30% (480K tweets).
The Amazon customer reviews dataset contains 4 million reviews
and star ratings. The dataset is split into a training set of
70% (2.8M reviews) and a testing set of 30% (1.2M reviews).
      </p>
      <p>Training and testing data were preprocessed and
cleaned before being passed as the input of an intelligent
algorithm. Cleaning included removing redundant tokens such as
hashtag and @ symbols, numbers, http for links, punctuation
symbols, etc. After cleaning, all datasets were
checked and empty strings were removed.</p>
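      <p>The cleaning steps listed above can be sketched with regular expressions. The paper states that cleaning was performed in R; the Python version below is only an illustrative assumption about what such a step might look like. The function name <monospace>clean_text</monospace>, the exact patterns (for instance, removing whole links rather than only the http token) and the sample strings are hypothetical.</p>
      <preformat><![CDATA[
```python
import re


def clean_text(text):
    """Remove links, @ and # symbols, numbers and punctuation from one text."""
    text = re.sub(r"http\S+", " ", text)   # links starting with http
    text = re.sub(r"[@#]", " ", text)      # mention and hashtag symbols
    text = re.sub(r"\d+", " ", text)       # numbers
    text = re.sub(r"[^\w\s]", " ", text)   # punctuation symbols
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical raw samples; the second one is empty after cleaning.
raw = ["@user I love this!!! http://t.co/abc 100% #happy", "!!! 123"]
cleaned = [c for c in (clean_text(t) for t in raw) if c]  # drop empty strings
print(cleaned)
```
]]></preformat>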
    </sec>
    <sec id="sec-8">
      <title>B. Experiments</title>
      <p>Four experiments are performed in this paper: two
experiments with the Stanford Twitter sentiment corpus
(sentiment140) dataset and two experiments with the Amazon customer
reviews dataset.</p>
      <p>In the first and second experiments, the datasets described above
are split into 70% for training and 30% for testing
and applied to four machine learning algorithms: Logistic
Regression, Naïve Bayes classification, Support Vector
Machines and Random Forest.</p>
      <p>In the third and fourth experiments, the three best machine
learning algorithms are selected according to the results of the
previous experiments, various combinations
of two different single methods are created, and they are applied to the
datasets described above.</p>
    </sec>
    <sec id="sec-9">
      <title>C. Experimental settings</title>
      <p>
        Data cleaning and preparation are performed with R [<xref ref-type="bibr" rid="ref24">24</xref>].
The experiments are implemented in the Python programming
language with scikit-learn [<xref ref-type="bibr" rid="ref18">18</xref>], a library for machine learning.
      </p>
      <p>Machine learning algorithms are used with their default
parameters. They are described below.</p>
    </sec>
    <sec id="sec-10">
      <title>Logistic Regression default parameters [18]:</title>
      <list list-type="bullet">
        <list-item><p>C (Inverse of regularization strength): float, default: 1.0</p></list-item>
        <list-item><p>dual (Dual or primal formulation): bool, default: False</p></list-item>
        <list-item><p>fit_intercept (Specifies if a constant should be added to the decision function): bool, default: True</p></list-item>
        <list-item><p>intercept_scaling: float, default: 1</p></list-item>
        <list-item><p>max_iter (Maximum number of iterations taken for the solvers to converge): int, default: 100</p></list-item>
        <list-item><p>multi_class: str, default: ‘ovr’. With ‘ovr’ a binary problem is fit for each label.</p></list-item>
        <list-item><p>n_jobs (Number of CPU cores used when parallelizing over classes if multi_class=‘ovr’): int, default: 1</p></list-item>
        <list-item><p>penalty (Used to specify the norm used in the penalization): str, ‘l1’ or ‘l2’, default: ‘l2’</p></list-item>
        <list-item><p>solver (Algorithm to use in the optimization problem): ‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’, default: ‘liblinear’</p></list-item>
        <list-item><p>tol (Tolerance for stopping criteria): float, default: 0.0001</p></list-item>
      </list>
      <p>Naïve Bayes default parameters [<xref ref-type="bibr" rid="ref18">18</xref>]:</p>
      <list list-type="bullet">
        <list-item><p>alpha (Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing)): float, optional (default=1.0)</p></list-item>
        <list-item><p>fit_prior (Whether to learn class prior probabilities or not): boolean, optional (default=True)</p></list-item>
        <list-item><p>class_prior (Prior probabilities of the classes): array-like, size (n_classes), optional (default=None)</p></list-item>
      </list>
      <p><sup>1</sup> http://help.sentiment140.com/</p>
      <p><sup>2</sup> https://www.kaggle.com/bittlingmayer/amazonreviews/</p>
    </sec>
    <sec id="sec-11">
      <title>Support Vector Machines default parameters [18]:</title>
      <list list-type="bullet">
        <list-item><p>C (Penalty parameter C of the error term): float, optional (default=1.0)</p></list-item>
        <list-item><p>kernel (Specifies the kernel type to be used in the algorithm): string, optional (default=‘rbf’). We used the ‘linear’ kernel instead.</p></list-item>
        <list-item><p>loss (Specifies the loss function): string, ‘hinge’ or ‘squared_hinge’ (default=‘squared_hinge’). ‘squared_hinge’ is the square of the hinge loss.</p></list-item>
        <list-item><p>max_iter (The maximum number of iterations to be run): int (default=1000)</p></list-item>
        <list-item><p>multi_class (Determines the multi-class strategy if y contains more than two classes): string, ‘ovr’ or ‘crammer_singer’ (default=‘ovr’). ‘ovr’ trains n_classes one-vs-rest classifiers.</p></list-item>
        <list-item><p>penalty (Specifies the norm used in the penalization): string, ‘l1’ or ‘l2’ (default=‘l2’)</p></list-item>
        <list-item><p>tol (Tolerance for stopping criteria): float, optional (default=0.0001)</p></list-item>
      </list>
      <p>Random Forest default parameters [<xref ref-type="bibr" rid="ref18">18</xref>]:</p>
      <list list-type="bullet">
        <list-item><p>n_estimators (The number of trees in the forest): integer, optional (default=10)</p></list-item>
        <list-item><p>max_features (The number of features to consider when looking for the best split): int, float, string or None, optional (default=“auto”)</p></list-item>
        <list-item><p>max_depth (The maximum depth of the tree): integer or None, optional (default=None)</p></list-item>
        <list-item><p>min_samples_split (The minimum number of samples required to split an internal node): int, float, optional (default=2)</p></list-item>
        <list-item><p>min_samples_leaf (The minimum number of samples required to be at a leaf node): int, float, optional (default=1)</p></list-item>
        <list-item><p>min_weight_fraction_leaf (The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node): float, optional (default=0.0)</p></list-item>
        <list-item><p>max_leaf_nodes (Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity): int or None, optional (default=None)</p></list-item>
        <list-item><p>min_impurity_decrease (A node will be split if this split induces a decrease of the impurity greater than or equal to this value): float, optional (default=0.0)</p></list-item>
        <list-item><p>bootstrap (Whether bootstrap samples are used when building trees): boolean, optional (default=True)</p></list-item>
        <list-item><p>oob_score (Whether to use out-of-bag samples to estimate the generalization accuracy): bool (default=False)</p></list-item>
        <list-item><p>n_jobs (The number of jobs to run in parallel for both fit and predict): integer, optional (default=1)</p></list-item>
        <list-item><p>verbose (Controls the verbosity of the tree building process): int, optional (default=0)</p></list-item>
        <list-item><p>warm_start: bool, optional (default=False)</p></list-item>
        <list-item><p>criterion (The function to measure the quality of a split): string, optional (default=“gini”)</p></list-item>
      </list>
    </sec>
    <sec id="sec-12">
      <title>D. Effectiveness</title>
      <p>
        Effectiveness is measured using statistical measures:
accuracy (ACC), precision (PPV – positive predictive value and
NPV – negative predictive value), recall (TPR – true positive
rate and TNR – true negative rate) and F1 (Harmonic mean of
PPV and TPR). Formulas are presented below (Sammut and
Webb in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]):
Accuracy (ACC):
      </p>
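      <disp-formula><tex-math><![CDATA[
ACC = \frac{TP + TN}{TP + TN + FP + FN}
]]></tex-math></disp-formula>
      <p>Positive predictive value (PPV):</p>
      <disp-formula><tex-math><![CDATA[
PPV = \frac{TP}{TP + FP}
]]></tex-math></disp-formula>
      <p>Negative predictive value (NPV):</p>
      <disp-formula><tex-math><![CDATA[
NPV = \frac{TN}{TN + FN}
]]></tex-math></disp-formula>
      <p>True positive rate (TPR):</p>
      <disp-formula><tex-math><![CDATA[
TPR = \frac{TP}{TP + FN}
]]></tex-math></disp-formula>
      <p>True negative rate (TNR):</p>
      <disp-formula><tex-math><![CDATA[
TNR = \frac{TN}{TN + FP}
]]></tex-math></disp-formula>
      <p>Harmonic mean of PPV and TPR (F1):</p>
      <disp-formula><tex-math><![CDATA[
F1 = \frac{2 \cdot PPV \cdot TPR}{PPV + TPR}
]]></tex-math></disp-formula>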
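      <p>The effectiveness measures of this subsection follow directly from the four confusion-matrix counts. The minimal Python sketch below shows the computation; the function name <monospace>effectiveness</monospace> and the counts are hypothetical and are not results from the experiments.</p>
      <preformat><![CDATA[
```python
def effectiveness(tp, tn, fp, fn):
    """Compute ACC, PPV, NPV, TPR, TNR and F1 from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # precision for the positive class
    tpr = tp / (tp + fn)  # recall for the positive class
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PPV": ppv,
        "NPV": tn / (tn + fn),
        "TPR": tpr,
        "TNR": tn / (tn + fp),
        "F1": 2 * ppv * tpr / (ppv + tpr),  # harmonic mean of PPV and TPR
    }


# Hypothetical counts for 100 test sentences.
m = effectiveness(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(name, round(value, 4))
```
]]></preformat>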
      <p>TABLE I contains the results of the standard single machine
learning algorithms with their default parameters. The results
show that Logistic Regression (LR) obtained the best
accuracy (ACC) in both experiments: 79.67% and 90.21%. The other
methods rank in the following order: SVM with 79.16% and 90.00%,
Naïve Bayes classification with 76.72% and 84.18%, and
Random Forest with 75.81% and 80.15%.</p>
      <p>Better accuracy was obtained on the Amazon
reviews dataset, which is also significantly larger than the sentiment140
dataset. This happened because tweets are very short and contain
noise, slang, acronyms, etc.</p>
      <p>Logistic Regression and SVM provided more uniform
recognition of both classes; PPV, NPV, TPR, TNR, F1, are
almost even, compared to other methods.</p>
      <p>Based on the results presented in TABLE I, Logistic Regression, SVM and
Naïve Bayes were selected for the further experiments.
Various combinations of two different single
algorithms were evaluated in these experiments.</p>
      <p>Table II shows that using the proposed method (see Section
III) to combine two single methods yields
better accuracy than a single method.</p>
      <p>In the effectiveness formulas of this section, TP is the count of correctly classified “positive”
sentiments, TN is the count of correctly classified “negative”
sentiments, FP is the count of incorrectly classified “positive”
sentiments and FN is the count of incorrectly classified “negative”
sentiments.</p>
      <p>LR-SVM (the combination of Logistic Regression and SVM)
shows the best accuracy (ACC), 81.83% and 90.22%, while
the accuracy of the other combinations does not exceed it: LR-NB (the combination of Logistic
Regression and Naïve Bayes) obtained 79.81% and
90.22%, and SVM-NB (the combination of SVM and Naïve Bayes) obtained
79.26% and 89.98%. Our method also
outperformed the single LR algorithm in all experiments, except the
fourth experiment, where SVM-NB obtained an accuracy (ACC) of
89.98% compared with 90.21% for Logistic Regression.</p>
      <p>Our method also provided more uniform recognition of both
classes PPV, NPV, TPR, TNR, F1.</p>
      <p>V. CONCLUSIONS AND FUTURE WORK</p>
      <p>The main idea of this paper was to select two single
intelligent algorithms to create a combined method for sentiment
classification.</p>
      <p>The results show that the combination of two almost equally strong
intelligent methods that showed the best results (in our case
Logistic Regression and SVM) can obtain higher accuracy
(ACC), 81.83% and 90.22%, than the best
single method, Logistic Regression, with 79.67% and
90.21%.</p>
      <p>The combination of the strongest and the weakest method
(in our case Logistic Regression with Naïve Bayes classification, with accuracy (ACC)
79.81% and 90.22%) also outperforms the best
single method, Logistic Regression.</p>
      <p>The main advantage of combining methods is that the
combined method provides more uniform recognition of both
classes (PPV, NPV, TPR, TNR, F1) compared with Naïve
Bayes and Random Forest.</p>
      <p>
        Such results lead us to conclude that Logistic Regression,
SVM and the combination of these methods fit best for our
further work. Our method presented in [<xref ref-type="bibr" rid="ref15">15</xref>] can be applied
with different algorithms and obtain better classification
accuracy. The goal of this approach was to test the proposed
method with existing datasets in order to be able to continue the
work with real-world data in the future.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aftab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Muhammad</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          .
          <article-title>Machine learning techniques for sentiment analysis: A review</article-title>
          .
          <source>Int. J. Multidiscip. Sci. Eng</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>27</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. U.</given-names>
            <surname>Sazzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Azad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Islam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Ali</surname>
          </string-name>
          .
          <article-title>Challenges, comparative analysis and a proposed methodology to predict sentiment from movie reviews using machine learning</article-title>
          .
          <source>In Big Data Analytics and Computational Intelligence (ICBDAC)</source>
          , 2017 International Conference on, pages
          <fpage>86</fpage>
          -
          <lpage>91</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Amit</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Geman</surname>
          </string-name>
          .
          <article-title>Shape quantization and recognition with randomized trees</article-title>
          .
          <source>Neural computation</source>
          ,
          <volume>9</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1545</fpage>
          -
          <lpage>1588</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ashok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. V.</given-names>
            <surname>Joshi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Kamath</surname>
          </string-name>
          .
          <article-title>A personalized recommender system using machine learning based sentiment analysis over social data</article-title>
          .
          <source>In Electrical, Electronics and Computer Science (SCEECS)</source>
          ,
          <source>2016 IEEE Students' Conference on</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . IEEE,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. E.</given-names>
            <surname>Boser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M.</given-names>
            <surname>Guyon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>A training algorithm for optimal margin classifiers</article-title>
          .
          <source>In Proceedings of the fifth annual workshop on Computational learning theory</source>
          , pages
          <fpage>144</fpage>
          -
          <lpage>152</lpage>
          . ACM,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Brito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cvejoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ojeda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bauckhage</surname>
          </string-name>
          .
          <article-title>Towards German word embeddings: A use case with predictive sentiment analysis</article-title>
          .
          <source>In Data Science-Analytics and Applications</source>
          , pages
          <fpage>59</fpage>
          -
          <lpage>62</lpage>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>LIBSVM: a library for support vector machines</article-title>
          .
          <source>ACM transactions on intelligent systems and technology (TIST)</source>
          ,
          <volume>2</volume>
          (
          <issue>3</issue>
          ):
          <fpage>27</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cutler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Cutler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          .
          <article-title>Random forests</article-title>
          .
          <source>In Ensemble machine learning</source>
          , pages
          <fpage>157</fpage>
          -
          <lpage>175</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Davidov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tsur</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rappoport</surname>
          </string-name>
          .
          <article-title>Enhanced sentiment learning using Twitter hashtags and smileys</article-title>
          .
          <source>In Proceedings of the 23rd international conference on computational linguistics: posters</source>
          , pages
          <fpage>241</fpage>
          -
          <lpage>249</lpage>
          . Association for Computational Linguistics,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Go</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhayani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Twitter sentiment classification using distant supervision</article-title>
          .
          <source>CS224N Project Report</source>
          , Stanford,
          <volume>1</volume>
          (
          <issue>12</issue>
          ),
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jayalekshmi</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mathew</surname>
          </string-name>
          .
          <article-title>Facial expression recognition and emotion classification system for sentiment analysis</article-title>
          .
          <source>In Networks &amp; Advances in Computational Technologies (NetACT)</source>
          , 2017 International Conference on, pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kharde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sonawane</surname>
          </string-name>
          , et al.
          <article-title>Sentiment analysis of Twitter data: a survey of techniques</article-title>
          .
          <source>arXiv preprint arXiv:1601.06971</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Korovkinas</surname>
          </string-name>
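          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Danėnas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Garšva</surname>
          </string-name>
          .
          <article-title>SVM and naïve Bayes classification ensemble method for sentiment analysis</article-title>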
          .
          <source>Baltic Journal of Modern Computing</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <fpage>398</fpage>
          -
          <lpage>409</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Maheshwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Reganti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jamatia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gambäck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          .
          <article-title>A societal sentiment analysis: Predicting the values and ethics of individuals by analysing social media content</article-title>
          .
          <source>In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>731</fpage>
          -
          <lpage>741</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaithyanathan</surname>
          </string-name>
          .
          <article-title>Thumbs up?: sentiment classification using machine learning techniques</article-title>
          .
          <source>In Proceedings of the ACL-02 conference on Empirical methods in natural language processing, Volume 10</source>
          , pages
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . Association for Computational Linguistics,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          , et al.
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of machine learning research</source>
          ,
          <volume>12</volume>
          (Oct):
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pranckevičius</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Marcinkevičius</surname>
          </string-name>
          .
          <article-title>Comparison of naive Bayes, random forest, decision tree, support vector machines, and logistic regression classifiers for text reviews classification</article-title>
          .
          <source>Baltic Journal of Modern Computing</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>221</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sammut</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Webb</surname>
          </string-name>
          .
          <source>Encyclopedia of machine learning</source>
          . Springer Science &amp; Business Media
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Sawhney</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Kahlon</surname>
          </string-name>
          .
          <article-title>Forecasting the 2016 US presidential elections using sentiment analysis</article-title>
          .
          <source>In Conference on e-Business, e-Services and e-Society</source>
          , pages
          <fpage>412</fpage>
          -
          <lpage>423</lpage>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Starczewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pabiasz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vladymyrska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marvuglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          .
          <article-title>Self organizing maps for 3D face understanding</article-title>
          .
          <source>In International Conference on Artificial Intelligence and Soft Computing</source>
          , pages
          <fpage>210</fpage>
          -
          <lpage>217</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Tayade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Shaikh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Deshmukh</surname>
          </string-name>
          .
          <article-title>To discover trolling patterns in social media: Troll filter</article-title>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <collab>R Core Team</collab>
          .
          <article-title>R: A language and environment for statistical computing</article-title>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>F.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-M.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Yue</surname>
          </string-name>
          .
          <article-title>A topic sentence-based instance transfer method for imbalanced sentiment classification of Chinese product reviews</article-title>
          .
          <source>Electronic Commerce Research and Applications</source>
          ,
          <volume>16</volume>
          :
          <fpage>66</fpage>
          -
          <lpage>76</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <article-title>The elements of statistical learning: data mining, inference, and prediction</article-title>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Venckauskas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karpavicius</surname>
          </string-name>
          ,
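          <string-name>
            <given-names>R.</given-names>
            <surname>Damaševičius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marcinkevičius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kapočiūtė-Dzikienė</surname>
          </string-name>
          , and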
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          .
          <article-title>Open class authorship attribution of Lithuanian internet comments using one-class classifier</article-title>
          .
          <source>In Federated Conference on Computer Science and Information Systems (FedCSIS)</source>
          , pages
          <fpage>373</fpage>
          -
          <lpage>382</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>