Selection of Intelligent Algorithms for Sentiment Classification Method Creation

Konstantinas Korovkinas
Institute of Applied Informatics, Kaunas Faculty
Vilnius University
Muitines Str. 8, Kaunas, Lithuania
konstantinas.korovkinas@knf.vu.lt

Gintautas Garšva
Institute of Applied Informatics, Kaunas Faculty
Vilnius University
Muitines Str. 8, Kaunas, Lithuania
gintautas4garsva@gmail.com

Copyright held by the author(s).

Abstract—The main goal of this paper is to select two single intelligent algorithms for the creation of a sentiment classification method. We perform a set of experiments to recognize positive or negative sentiment, using single intelligent methods and combinations of them. It was observed that the best results were obtained by the single methods Logistic Regression, SVM and Naïve Bayes, as well as by the combination of Logistic Regression with SVM.

Index Terms—Sentiment analysis, Logistic Regression, Naïve Bayes classification, Support Vector Machines, Random Forest.

I. INTRODUCTION

Nowadays sentiment analysis is a very popular research area. A lot of work has been done, but there is still no sufficiently good method for sentiment classification. Many authors report average results slightly above 80%, which is not enough when more accurate results are needed.

Pang et al. in [17] evaluated the performance of Naïve Bayes, Maximum Entropy, and Support Vector Machines in the specific domain of movie reviews, finding that Naïve Bayes showed the worst and SVM the best results, although the differences are not very large. Later Go et al. in [12] obtained similar results with unigrams by introducing a more novel approach to automatically classify the sentiment of Twitter messages as either positive or negative with respect to a query term. The same techniques were also used by Kharde and Sonawane in [14] to perform sentiment analysis on Twitter data, yet resulting in lower accuracy; again, SVM proved to perform best. Davidov et al. in [11] also stated that SVM and Naïve Bayes are the best techniques to classify the data and can be regarded as the baseline learning methods, applying them to analysis based on Twitter user-defined hashtags in tweets. Tian et al. in [25] applied seven classification algorithms: J48, Random Forest, ADTree, AdaBoostM1, Bagging, Multilayer Perceptron and Naïve Bayes for imbalanced sentiment classification of Chinese product reviews. They found that their proposed method helps a Support Vector Machine (SVM) to outperform other classification methods. Singh et al. in [21] used a novel technique to predict the outcome of the US presidential elections using sentiment analysis; to accomplish this task they used SVM. Jayalekshmi and Mathew in [13] proposed a system that automatically recognizes the facial expression from an image and classifies emotions for the final decision. For classification they used SVM, Random Forest and a KNN classifier. Ahmed et al. in [2] investigated a new approach to sentence-level sentiment analysis using different machine learning algorithms: SVM (Support Vector Machines), Naïve Bayes and MLP (Multilayer Perceptron) were used for movie review sentiment analysis. Moreover, they used two different Naïve Bayes classifiers and two different types of SVM kernels to identify and analyze the differences in accuracy as well as to find the best outcome among all the experiments. Tayade et al. in [23] used sentiment analysis through machine learning to identify the targets of trolls, so as to prevent trolling before it happens; Naïve Bayes, Support Vector Machines (SVM) and Maximum Entropy (MaxEnt) classifiers showed very promising results. Maheshwari et al. in [16] applied Support Vector Machines, Logistic Regression and Random Forests to identify the best linguistic and non-linguistic features for the automatic classification of values and ethics. Pranckevičius and Marcinkevičius in [19] ran experiments on short-text product-review data from Amazon to compare Naïve Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression classifiers implemented in Apache Spark, evaluating the classification accuracy with respect to the size of the training data sets and the number of n-grams. Ahmad et al. in [1] reviewed different machine learning techniques and algorithms (Maximum Entropy, Random Forest, SailAil Sentiment Analyzer, Multilayer Perceptron, Naïve Bayes and Support Vector Machines) which researchers have applied to movie reviews and product reviews. Brito et al. in [7] presented how different hyperparameter combinations impact the resulting German word vectors and how these word representations can be part of more complex models; they predicted whether a user liked an app given a review with three different algorithms: Logistic Regression, Decision Trees and Random Forests. Ashok et al. in [4] proposed a social framework which extracts users' reviews and comments on restaurants and points of interest, such as events and locations, to personalize and rank suggestions based on user preferences; Naïve Bayes, Support Vector Machines with two different kernels (Gaussian and Linear), Maximum Entropy and Random Forest were used in this work.
Such results led to the conclusion that Logistic Regression, SVM, Naïve Bayes and Random Forest are still prominent for future research. Therefore, in this paper we perform experiments with each of them and with various combinations of two of them (depending on the results of the previous experiments) to recognize positive or negative sentiment and to compare their accuracy. The rest of the paper is organized as follows. Section II describes the techniques used in the research. Section III presents the method for combining results. Section IV describes the preparation of the datasets, the experiments, the experimental settings, the effectiveness measures and the results. In Section V we conclude and outline future work.

II. DESCRIPTION OF TECHNIQUES USED IN RESEARCH

A. Logistic Regression

The logistic regression model arises from the desire to model the posterior probabilities of the K classes via linear functions in x, while at the same time ensuring that they sum to one and remain in [0, 1]. The model has the form

\log\frac{Pr(G=1|X=x)}{Pr(G=K|X=x)} = \beta_{10} + \beta_1^T x,
\log\frac{Pr(G=2|X=x)}{Pr(G=K|X=x)} = \beta_{20} + \beta_2^T x,    (1)
\vdots
\log\frac{Pr(G=K-1|X=x)}{Pr(G=K|X=x)} = \beta_{(K-1)0} + \beta_{K-1}^T x.

The model is specified in terms of K-1 log-odds or logit transformations (reflecting the constraint that the probabilities sum to one). Although the model uses the last class as the denominator in the odds-ratios, the choice of denominator is arbitrary in that the estimates are equivariant under this choice. A simple calculation shows that

Pr(G=k|X=x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)}, \quad k = 1, \ldots, K-1,
Pr(G=K|X=x) = \frac{1}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)},    (2)

and they clearly sum to one. To emphasize the dependence on the entire parameter set \theta = \{\beta_{10}, \beta_1^T, \ldots, \beta_{(K-1)0}, \beta_{K-1}^T\}, we denote the probabilities Pr(G=k|X=x) = p_k(x; \theta) (Hastie et al. in [26]).
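As an informal illustration of how such a classifier is applied to sentiment data, the sketch below fits a binary logistic regression model on a few toy sentences with scikit-learn, the library used later in Section IV; the example sentences, labels and bag-of-words feature extraction are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch (not the paper's pipeline): bag-of-words features + logistic regression.
# The toy sentences and labels below are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["i love this phone", "great product works well",
               "terrible battery very bad", "do not buy waste of money"]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()              # word-count features
X_train = vectorizer.fit_transform(train_texts)

# Settings comparable to the defaults listed in Section IV-C (C=1.0, l2 penalty, liblinear).
clf = LogisticRegression(C=1.0, penalty="l2", solver="liblinear")
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["bad phone", "works great"])
print(clf.predict(X_test))          # predicted classes
print(clf.predict_proba(X_test))    # class probabilities as in Eq. (2)
```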
B. Naïve Bayes Classification

A Naïve Bayes classifier is a simple probabilistic classifier based on Bayes' theorem and is particularly suited when the dimensionality of the input is high. In text classification, a given document d is assigned the class

C^* = \arg\max_c p(c|d).

Its underlying probability model can be described as an "independent feature model". The Naïve Bayes (NB) classifier uses Bayes' rule, Eq. (3),

p(c|d) = \frac{p(c)\, p(d|c)}{p(d)}    (3)

where p(d) plays no role in selecting C^*. To estimate the term p(d|c), Naïve Bayes decomposes it by assuming the f_i are conditionally independent given d's class, as in Eq. (4),

p_{NB}(c|d) := \frac{p(c)\left(\prod_{i=1}^{m} p(f_i|c)^{n_i(d)}\right)}{p(d)}    (4)

where m is the number of features and f_i is the feature vector. Consider a training method consisting of relative-frequency estimation of p(c) and p(f_i|c) (Pang et al. in [17]).

In our experiments Multinomial Naïve Bayes is used, as presented in [18]. It implements the Naïve Bayes algorithm for multinomially distributed data and is one of the two classic Naïve Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). The distribution is parametrized by vectors \theta_y = (\theta_{y1}, \ldots, \theta_{yn}) for each class y, where n is the number of features (in text classification, the size of the vocabulary) and \theta_{yi} is the probability P(x_i|y) of feature i appearing in a sample belonging to class y.

The parameters \theta_y are estimated by a smoothed version of maximum likelihood, i.e. relative frequency counting:

\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}

where N_{yi} = \sum_{x \in T} x_i is the number of times feature i appears in a sample of class y in the training set T, and N_y = \sum_{i=1}^{n} N_{yi} is the total count of all features for class y. The smoothing prior \alpha \ge 0 accounts for features not present in the learning samples and prevents zero probabilities in further computations. Setting \alpha = 1 is called Laplace smoothing, while \alpha < 1 is called Lidstone smoothing [18].
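The smoothed estimate above can be reproduced directly from a class-wise count matrix. The sketch below does this for an invented 2-class, 4-feature count matrix (illustrative data, not from the paper) and compares it with scikit-learn's MultinomialNB, which stores the same quantities as log-probabilities.

```python
# Minimal sketch of the smoothed estimate theta_hat_{yi} = (N_yi + alpha) / (N_y + alpha * n).
# The count matrix and labels are invented for illustration.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[3, 0, 1, 0],   # word counts per document (4-word vocabulary)
              [2, 1, 0, 0],
              [0, 2, 0, 3],
              [0, 1, 1, 4]])
y = np.array([1, 1, 0, 0])    # 1 = positive, 0 = negative
alpha, n = 1.0, X.shape[1]    # Laplace smoothing, vocabulary size

for c in np.unique(y):
    N_yi = X[y == c].sum(axis=0)                       # per-feature counts in class c
    theta = (N_yi + alpha) / (N_yi.sum() + alpha * n)  # smoothed relative frequencies
    print(c, theta)

nb = MultinomialNB(alpha=1.0)
nb.fit(X, y)
print(np.exp(nb.feature_log_prob_))   # same values, recovered from the fitted model
```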
C. Support Vector Machines

Support vector machines were introduced by Boser et al. in [5] and basically attempt to find the best possible surface to separate positive and negative training samples. Support Vector Machines (SVMs) are supervised learning methods used for classification.

Given training vectors x_i \in R^n, i = 1, \ldots, l, in two classes, and an indicator vector y \in R^l such that y_i \in \{1, -1\}, C-SVC (Boser et al. in [5]; Cortes and Vapnik in [9]) solves the following primal optimization problem [8]:

\min_{w,b,\xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i    (5)
subject to \; y_i (w^T \phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l

where \phi(x_i) maps x_i into a higher-dimensional space and C > 0 is the regularization parameter. Due to the possibly high dimensionality of the vector variable w, usually we solve the following dual problem:

\min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha - e^T \alpha    (6)
subject to \; y^T \alpha = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, l

where e = [1, \ldots, 1]^T is the vector of all ones, Q is an l \times l positive semidefinite matrix with Q_{ij} \equiv y_i y_j K(x_i, x_j), and K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j) is the kernel function. After problem (6) is solved, using the primal-dual relationship, the optimal w satisfies

w = \sum_{i=1}^{l} y_i \alpha_i \phi(x_i)    (7)

and the decision function is

\mathrm{sgn}(w^T \phi(x) + b) = \mathrm{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b \right)

(Chang and Lin in [8]).

D. Random Forests

Random Forests were introduced by Leo Breiman in [6], who was inspired by earlier work by Amit and Geman [3]. Random Forest is a tree-based ensemble with each tree depending on a collection of random variables. More formally, for a p-dimensional random vector X = (X_1, \ldots, X_p)^T representing the real-valued input or predictor variables and a random variable Y representing the real-valued response, we assume an unknown joint distribution P_{XY}(X, Y). The goal is to find a prediction function f(X) for predicting Y. The prediction function is determined by a loss function L(Y, f(X)) and defined to minimize the expected value of the loss

E_{XY}(L(Y, f(X)))    (8)

where the subscripts denote expectation with respect to the joint distribution of X and Y [10].

Intuitively, L(Y, f(X)) is a measure of how close f(X) is to Y; it penalizes values of f(X) that are a long way from Y. Typical choices of L are squared error loss L(Y, f(X)) = (Y - f(X))^2 for regression and zero-one loss for classification:

L(Y, f(X)) = I(Y \ne f(X)) = \begin{cases} 0 & \text{if } Y = f(X) \\ 1 & \text{otherwise.} \end{cases}    (9)

It turns out that minimizing E_{XY}(L(Y, f(X))) for squared error loss gives the conditional expectation

f(x) = E(Y|X = x)    (10)

otherwise known as the regression function. In the classification situation, if the set of possible values of Y is denoted by \mathcal{Y}, minimizing E_{XY}(L(Y, f(X))) for zero-one loss gives

f(x) = \arg\max_{y \in \mathcal{Y}} P(Y = y|X = x)    (11)

otherwise known as the Bayes rule [10]. Ensembles construct f in terms of a collection of so-called "base learners" h_1(x), \ldots, h_J(x), and these base learners are combined to give the "ensemble predictor" f(x). In regression, the base learners are averaged

f(x) = \frac{1}{J} \sum_{j=1}^{J} h_j(x)    (12)

while in classification, f(x) is the most frequently predicted class ("voting")

f(x) = \arg\max_{y \in \mathcal{Y}} \sum_{j=1}^{J} I(y = h_j(x)).    (13)

In Random Forests the j-th base learner is a tree denoted h_j(X, \Theta_j), where \Theta_j is a collection of random variables and the \Theta_j are independent for j = 1, \ldots, J (Cutler et al. in [10]).
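The voting rule in Eq. (13) is very compact in code. The sketch below combines a handful of base-learner predictions by majority vote (the predictions and the toy dataset are invented for illustration) and then shows the equivalent call to scikit-learn's RandomForestClassifier, which applies the same idea with randomized trees as base learners.

```python
# Minimal sketch of the voting rule in Eq. (13); the base-learner predictions are invented.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Predictions of J = 5 base learners for 4 samples (rows: learners, columns: samples).
H = np.array([[1, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 1, 1]])

# f(x) = argmax_y sum_j I(y == h_j(x)): for binary 0/1 labels, the majority class per column.
votes_for_1 = H.sum(axis=0)
f_x = (votes_for_1 > H.shape[0] / 2).astype(int)
print(f_x)   # -> [1 0 1 1]

# A Random Forest applies the same voting over randomized trees.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0)  # n_estimators as in Section IV-C
rf.fit(X, y)
print(rf.predict(X[:4]))
```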
III. THE METHOD FOR COMBINING RESULTS

The method for combining results is presented in this section. The proposed method is based on our method (the algorithm for sentences) introduced in paper [15]. We modified this algorithm so that it can be used with different machine learning algorithms. The algorithm is presented below.

Algorithm for combining results

Input: Let us denote ML1 as the strongest classifier and ML2 as the weakest classifier.
R_{ML1} = {ML1_sent, p} – set of the first algorithm's results, obtained after performing classification with machine learning algorithm ML1; ML1_sent – sentiment; p – the probability of classification;
R_{ML2} = {ML2_sent, v} – set of the second machine learning algorithm's classification results, obtained after performing ML2; ML2_sent – sentiment; v – ML2 result value, containing "positive" or "negative" sentiment;
th2 = 0.8; this threshold value was selected by manually investigating the results;
th3 = min(R_{ML1}{p}) + (\sigma_{R_{ML1}\{p\}} / 2) - 0.01 (our proposed formula), where \sigma_{R_{ML1}\{p\}} is the standard deviation of R_{ML1}{p}.

Algorithm for combining results:
1) Find the results which are the same in both ML1 and ML2:
Results = R_{ML1} \cap R_{ML2} = \{x : x \in R_{ML1}\{ML1\_sent\} \text{ and } x \in R_{ML2}\{ML2\_sent\}\}
2) Find the results which are different between ML1 and ML2:
R_{ML1}\{ML1\_sent\} \, \Delta \, R_{ML2}\{ML2\_sent\} \text{ and } R_{ML1}\{p\} < th2
3) Results = \begin{cases} Results \cup R_{ML1}, & \text{if } |R_{ML1}\{p\}| < th3 \\ Results \cup R_{ML2}, & \text{if } |R_{ML1}\{p\}| \ge th3 \end{cases}

Output: set of classification results Results = {Sentence, Sentiment} and Accuracy (Korovkinas et al. in [15]).
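As a rough illustration, the sketch below implements one possible reading of this combining rule in Python: agreements are kept directly, and for disagreements the ML1 or ML2 label is chosen according to th2 and th3. The handling of disagreements where the ML1 probability is at least th2 (kept as ML1 here) is our assumption and is not spelled out in the algorithm above; the predictions and variable names are illustrative, not the authors' implementation.

```python
# Rough sketch of the combining rule described above; one possible reading, not the
# authors' exact implementation. Disagreements with p >= th2 are kept as ML1 here,
# which is an assumption not stated explicitly in the algorithm.
import numpy as np

def combine(ml1_labels, ml1_probs, ml2_labels, th2=0.8):
    ml1_labels = np.asarray(ml1_labels)
    ml1_probs = np.asarray(ml1_probs, dtype=float)
    ml2_labels = np.asarray(ml2_labels)

    # th3 = min(p) + std(p) / 2 - 0.01 (the proposed formula).
    th3 = ml1_probs.min() + ml1_probs.std() / 2 - 0.01

    combined = ml1_labels.copy()               # step 1: agreements stay as they are
    disagree = ml1_labels != ml2_labels        # step 2: differing results
    low_conf = disagree & (ml1_probs < th2)

    # Step 3: below th3 keep ML1, otherwise take ML2 (as stated in the algorithm).
    take_ml2 = low_conf & (ml1_probs >= th3)
    combined[take_ml2] = ml2_labels[take_ml2]
    return combined

# Illustrative example with invented predictions ("pos"/"neg").
ml1 = ["pos", "neg", "pos", "neg"]
p1  = [0.95, 0.55, 0.75, 0.90]
ml2 = ["pos", "pos", "neg", "pos"]
print(combine(ml1, p1, ml2))   # the third result is flipped to the ML2 label
```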
IV. EXPERIMENTS AND RESULTS

A. Dataset

Two existing datasets are used in this paper: the Stanford Twitter sentiment corpus (sentiment140, http://help.sentiment140.com/) and the Amazon customer reviews dataset (https://www.kaggle.com/bittlingmayer/amazonreviews/). The Stanford Twitter sentiment corpus was introduced by Go et al. in [12] and contains 1.6 million tweets automatically labeled as positive or negative based on emoticons. The dataset is split into a training set of 70% (1.12M tweets) and a testing set of 30% (480K tweets). The Amazon customer reviews dataset contains 4 million reviews and star ratings. It is split into a training set of 70% (2.8M reviews) and a testing set of 30% (1.2M reviews).

The training and testing data were preprocessed and cleaned before being passed as input to the intelligent algorithms. This included removing redundant tokens such as hashtag and @ symbols, numbers, http links, punctuation symbols, etc. After cleaning, all datasets were checked and empty strings were removed.
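In the paper the cleaning described above is performed in R (see Section IV-C). Purely as an illustration, the sketch below shows an equivalent sequence of steps with Python regular expressions; the exact patterns are our assumptions about what "redundant tokens" covers, not the authors' code.

```python
# Illustrative Python version of the described cleaning steps (the paper uses R for this).
# The exact regular expressions are assumptions, not the authors' code.
import re

def clean_text(text: str) -> str:
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # links
    text = re.sub(r"[@#]\w+", " ", text)                  # @mentions and hashtags
    text = re.sub(r"\d+", " ", text)                      # numbers
    text = re.sub(r"[^\w\s]", " ", text)                  # punctuation
    return re.sub(r"\s+", " ", text).strip().lower()      # collapse whitespace

raw = ["@user I LOVE this!!! http://t.co/xyz #happy", "   ", "3 stars, not bad"]
cleaned = [clean_text(t) for t in raw]
cleaned = [t for t in cleaned if t]   # drop strings that are empty after cleaning
print(cleaned)
```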
B. Experiments

Four experiments are performed in this paper: two experiments with the Stanford Twitter sentiment corpus (sentiment140) dataset and two experiments with the Amazon customer reviews dataset.

In the first and second experiments the datasets described above, split into 70% for training and 30% for testing, are applied to four machine learning algorithms: Logistic Regression, Naïve Bayes classification, Support Vector Machines and Random Forest.

In the third and fourth experiments the best three machine learning algorithms are selected, depending on the results of the previous experiments, to create various combinations of two different single methods, which are applied to the same datasets.

C. Experimental settings

Data cleaning and preparation are performed with R [24]. The experiments are implemented with the Python programming language and scikit-learn [18], a library for machine learning. The machine learning algorithms are used with their default parameters, described below.

Logistic Regression default parameters [18]:
• C (Inverse of regularization strength): float, default: 1.0
• dual (Dual or primal formulation): bool, default: False
• fit_intercept (Specifies if a constant should be added to the decision function): bool, default: True
• intercept_scaling: float, default: 1
• max_iter (Maximum number of iterations taken for the solvers to converge): int, default: 100
• multi_class: str, default: 'ovr'. With 'ovr' a binary problem is fit for each label.
• n_jobs (Number of CPU cores used when parallelizing over classes if multi_class='ovr'): int, default: 1
• penalty (Used to specify the norm used in the penalization): str, 'l1' or 'l2', default: 'l2'
• solver (Algorithm to use in the optimization problem): 'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga', default: 'liblinear'
• tol (Tolerance for stopping criteria): float, default: 0.0001

Naïve Bayes default parameters [18]:
• alpha (Additive (Laplace/Lidstone) smoothing parameter, 0 for no smoothing): float, optional (default=1.0)
• fit_prior (Whether to learn class prior probabilities or not): boolean, optional (default=True)
• class_prior (Prior probabilities of the classes): array-like, size (n_classes), optional (default=None)

Support Vector Machines default parameters [18]:
• C (Penalty parameter C of the error term): float, optional (default=1.0)
• kernel (Specifies the kernel type to be used in the algorithm): string, optional (default='rbf'). We used the 'linear' kernel instead.
• loss (Specifies the loss function): string, 'hinge' or 'squared_hinge' (default='squared_hinge'). 'squared_hinge' is the square of the hinge loss.
• max_iter (The maximum number of iterations to be run): int (default=1000)
• multi_class (Determines the multi-class strategy if y contains more than two classes): string, 'ovr' or 'crammer_singer' (default='ovr'). 'ovr' trains n_classes one-vs-rest classifiers.
• penalty (Specifies the norm used in the penalization): string, 'l1' or 'l2' (default='l2')
• tol (Tolerance for stopping criteria): float, optional (default=0.0001)

Random Forest default parameters [18]:
• n_estimators (The number of trees in the forest): integer, optional (default=10)
• max_features (The number of features to consider when looking for the best split): int, float, string or None, optional (default="auto")
• max_depth (The maximum depth of the tree): integer or None, optional (default=None)
• min_samples_split (The minimum number of samples required to split an internal node): int, float, optional (default=2)
• min_samples_leaf (The minimum number of samples required to be at a leaf node): int, float, optional (default=1)
• min_weight_fraction_leaf (The minimum weighted fraction of the sum total of weights of all the input samples required to be at a leaf node): float, optional (default=0.0)
• max_leaf_nodes (Grow trees with max_leaf_nodes in best-first fashion; best nodes are defined as relative reduction in impurity): int or None, optional (default=None)
• min_impurity_decrease (A node will be split if this split induces a decrease of the impurity greater than or equal to this value): float, optional (default=0.0)
• bootstrap (Whether bootstrap samples are used when building trees): boolean, optional (default=True)
• oob_score (Whether to use out-of-bag samples to estimate the generalization accuracy): bool (default=False)
• n_jobs (The number of jobs to run in parallel for both fit and predict): integer, optional (default=1)
• verbose (Controls the verbosity of the tree building process): int, optional (default=0)
• warm_start: bool, optional (default=False)
• criterion (The function to measure the quality of a split): string, optional (default="gini")

D. Effectiveness

Effectiveness is measured using statistical measures: accuracy (ACC), precision (PPV – positive predictive value and NPV – negative predictive value), recall (TPR – true positive rate and TNR – true negative rate) and F1 (the harmonic mean of PPV and TPR). The formulas are presented below (Sammut and Webb in [20]):

Accuracy (ACC):
ACC = \frac{TP + TN}{TP + TN + FP + FN}

Positive predictive value (PPV):
PPV = \frac{TP}{TP + FP}

Negative predictive value (NPV):
NPV = \frac{TN}{TN + FN}

True positive rate (TPR):
TPR = \frac{TP}{TP + FN}

True negative rate (TNR):
TNR = \frac{TN}{TN + FP}

Harmonic mean of PPV and TPR (F1):
F_1 = \frac{2}{\frac{1}{PPV} + \frac{1}{TPR}}

where TP is the count of correctly classified "positive" sentiments, TN is the count of correctly classified "negative" sentiments, FP is the count of incorrectly classified "positive" sentiments, and FN is the count of incorrectly classified "negative" sentiments.
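For concreteness, the short sketch below evaluates these measures from predicted and true labels; the label vectors are invented for illustration and the computations follow the definitions above.

```python
# Minimal sketch of the effectiveness measures defined above; the labels are invented.
import numpy as np

def effectiveness(y_true, y_pred, positive=1):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    acc = (tp + tn) / (tp + tn + fp + fn)
    ppv, npv = tp / (tp + fp), tn / (tn + fn)
    tpr, tnr = tp / (tp + fn), tn / (tn + fp)
    f1 = 2 / (1 / ppv + 1 / tpr)
    return {"ACC": acc, "PPV": ppv, "NPV": npv, "TPR": tpr, "TNR": tnr, "F1": f1}

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(effectiveness(y_true, y_pred))
```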
E. Results

TABLE I contains the results of the standard single machine learning algorithms with their default parameters. The results show that Logistic Regression (LR) obtained the best accuracy (ACC) in both experiments, 79.67% and 90.21%. The other methods are arranged in the following order: SVM (ACC) – 79.16% and 90.00%, Naïve Bayes classification (ACC) – 76.72% and 84.18%, Random Forest (ACC) – 75.81% and 80.15%.

Better accuracy was obtained with the Amazon reviews dataset, which is also significantly bigger than the sentiment140 dataset. This is because tweets are very short and contain noise, slang, acronyms, etc.

Logistic Regression and SVM provided more uniform recognition of both classes; their PPV, NPV, TPR, TNR and F1 are almost even, compared to the other methods.

Based on the results presented in TABLE I, Logistic Regression, SVM and Naïve Bayes were selected for the further experiments, in which various combinations of two different single algorithms were evaluated.

TABLE I. THE SINGLE METHODS EXPERIMENTS RESULTS (EFFECTIVENESS, %)

ML alg.   ACC      PPV      NPV      TPR      TNR      F1
Experiment No 1
LR        79.67    80.19    79.16    79.38    79.98    79.78
NB        76.72    73.18    80.26    78.76    74.95    75.87
SVM       79.16    79.49    78.82    78.97    79.35    79.23
RF        75.81    70.12    81.49    79.13    73.17    74.35
Experiment No 2
LR        90.21    90.19    90.24    90.24    90.19    90.21
NB        84.18    81.46    86.89    86.14    82.42    83.74
SVM       90.00    90.03    89.98    89.98    90.03    90.01
RF        80.15    73.05    87.25    85.14    76.40    78.63

TABLE II shows that using the proposed method (see Section III) for the combination of two single methods lets us obtain better accuracy than a single method.

TABLE II. THE COMBINED METHODS EXPERIMENTS RESULTS (EFFECTIVENESS, %)

ML alg.   ACC      PPV      NPV      TPR      TNR      F1
Experiment No 3
LR-NB     79.81    79.49    80.12    80.00    79.62    79.75
SVM-NB    79.26    78.01    80.51    80.02    78.54    78.99
LR-SVM    81.83    79.98    83.69    83.06    80.69    81.49
Experiment No 4
LR-NB     90.22    90.06    90.37    90.34    90.09    90.20
SVM-NB    89.98    89.81    90.15    90.12    89.84    89.96
LR-SVM    90.22    90.22    90.21    90.21    90.22    90.22

LR-SVM (the Logistic Regression and SVM combination) shows the best accuracy (ACC), 81.83% and 90.22%, while the accuracy of the other combinations is smaller: LR-NB (the Logistic Regression and Naïve Bayes combination) – 79.81% and 90.22%, SVM-NB (the SVM and Naïve Bayes combination) – 79.26% and 89.98%. Our introduced method also outperformed the single LR algorithm in all experiments, except the fourth experiment, where SVM-NB obtained accuracy (ACC) 89.98% compared with Logistic Regression's 90.21%.

Our method also provided more uniform recognition of both classes (PPV, NPV, TPR, TNR, F1).

V. CONCLUSIONS AND FUTURE WORK

The main idea of this paper was to select two single intelligent algorithms to create a combined method for sentiment classification.

The results show that a combination of two almost equally strong intelligent methods which showed the best results (in our case Logistic Regression and SVM) can obtain higher accuracy (ACC), 81.83% and 90.22%, than the best results obtained by a single method such as Logistic Regression, 79.67% and 90.21%.

The combination of the strongest and the weakest method (in our case with Naïve Bayes classification, with accuracy (ACC) 79.81% and 90.22%) also outperforms the best results obtained by the single method Logistic Regression.

The main advantage of combining methods is that the combined method provides more uniform recognition of both classes (PPV, NPV, TPR, TNR, F1) compared with Naïve Bayes and Random Forest.

These results allow us to conclude that Logistic Regression and SVM, and the combination of these methods, fit best for our further work. Our method presented in [15] can be applied with different algorithms and obtains better classification accuracy. The goal of this approach was to test the proposed method on existing datasets in order to be able to continue working with real-world data in the future.
REFERENCES

[1] M. Ahmad, S. Aftab, S. S. Muhammad, and S. Ahmad. Machine learning techniques for sentiment analysis: A review. Int. J. Multidiscip. Sci. Eng., 8(3):27–32, 2017.
[2] E. Ahmed, M. A. U. Sazzad, M. T. Islam, M. Azad, S. Islam, and M. H. Ali. Challenges, comparative analysis and a proposed methodology to predict sentiment from movie reviews using machine learning. In Big Data Analytics and Computational Intelligence (ICBDAC), 2017 International Conference on, pages 86–91. IEEE, 2017.
[3] Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9(7):1545–1588, 1997.
[4] M. Ashok, S. Rajanna, P. V. Joshi, and S. Kamath. A personalized recommender system using machine learning based sentiment analysis over social data. In Electrical, Electronics and Computer Science (SCEECS), 2016 IEEE Students' Conference on, pages 1–6. IEEE, 2016.
[5] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152. ACM, 1992.
[6] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[7] E. Brito, R. Sifa, K. Cvejoski, C. Ojeda, and C. Bauckhage. Towards German word embeddings: A use case with predictive sentiment analysis. In Data Science – Analytics and Applications, pages 59–62. Springer, 2017.
[8] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
[9] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[10] A. Cutler, D. R. Cutler, and J. R. Stevens. Random forests. In Ensemble Machine Learning, pages 157–175. Springer, 2012.
[11] D. Davidov, O. Tsur, and A. Rappoport. Enhanced sentiment learning using twitter hashtags and smileys. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 241–249. Association for Computational Linguistics, 2010.
[12] A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(12), 2009.
[13] J. Jayalekshmi and T. Mathew. Facial expression recognition and emotion classification system for sentiment analysis. In Networks & Advances in Computational Technologies (NetACT), 2017 International Conference on, pages 1–8. IEEE, 2017.
[14] V. Kharde, P. Sonawane, et al. Sentiment analysis of twitter data: A survey of techniques. arXiv preprint arXiv:1601.06971, 2016.
[15] K. Korovkinas, P. Danėnas, and G. Garšva. SVM and Naïve Bayes classification ensemble method for sentiment analysis. Baltic Journal of Modern Computing, 5(4):398–409, 2017.
[16] T. Maheshwari, A. N. Reganti, S. Gupta, A. Jamatia, U. Kumar, B. Gambäck, and A. Das. A societal sentiment analysis: Predicting the values and ethics of individuals by analysing social media content. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 731–741, 2017.
[17] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing – Volume 10, pages 79–86. Association for Computational Linguistics, 2002.
[18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
[19] T. Pranckevičius and V. Marcinkevičius. Comparison of Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression classifiers for text reviews classification. Baltic Journal of Modern Computing, 5(2):221, 2017.
[20] C. Sammut and G. I. Webb. Encyclopedia of Machine Learning. Springer Science & Business Media, 2011.
[21] P. Singh, R. S. Sawhney, and K. S. Kahlon. Forecasting the 2016 US presidential elections using sentiment analysis. In Conference on e-Business, e-Services and e-Society, pages 412–423. Springer, 2017.
[22] J. T. Starczewski, S. Pabiasz, N. Vladymyrska, A. Marvuglia, C. Napoli, and M. Woźniak. Self organizing maps for 3D face understanding. In International Conference on Artificial Intelligence and Soft Computing, pages 210–217. Springer, 2016.
[23] P. M. Tayade, S. S. Shaikh, and S. Deshmukh. To discover trolling patterns in social media: Troll filter. 2017.
[24] R Core Team et al. R: A language and environment for statistical computing. 2015.
[25] F. Tian, F. Wu, K.-M. Chao, Q. Zheng, N. Shah, T. Lan, and J. Yue. A topic sentence-based instance transfer method for imbalanced sentiment classification of Chinese product reviews. Electronic Commerce Research and Applications, 16:66–76, 2016.
[26] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
[27] A. Venckauskas, A. Karpavicius, R. Damaševičius, R. Marcinkevičius, J. Kapočiūtė-Dzikienė, and C. Napoli. Open class authorship attribution of Lithuanian internet comments using one-class classifier. In Federated Conference on Computer Science and Information Systems (FedCSIS), pages 373–382. IEEE, 2017.