Selection of Intelligent Algorithms for Sentiment Classification Method Creation

Konstantinas Korovkinas
Institute of Applied Informatics, Kaunas Faculty
Vilnius University
Muitines Str. 8, Kaunas, Lithuania
konstantinas.korovkinas@knf.vu.lt

Gintautas Garšva
Institute of Applied Informatics, Kaunas Faculty
Vilnius University
Muitines Str. 8, Kaunas, Lithuania
gintautas4garsva@gmail.com

Copyright held by the author(s).

Abstract—The main goal of this paper is to select two single intelligent algorithms for the creation of a sentiment classification method. We perform a set of experiments to recognize positive or negative sentiment, using single intelligent methods and combinations of them. It was observed that the best results were obtained by the single methods Logistic Regression, SVM and Naïve Bayes, as well as by the combination of Logistic Regression with SVM.

Index Terms—Sentiment analysis, Logistic Regression, Naïve Bayes classification, Support Vector Machines, Random Forest.

I. INTRODUCTION

Nowadays sentiment analysis is a very popular research area. A lot of work has been done, but there is still no sufficiently good method for sentiment classification. Many authors report average results slightly above 80%, which is not enough when more accurate results are needed.

Pang et al. in [17] evaluated the performance of Naïve Bayes, Maximum Entropy, and Support Vector Machines in the specific domain of movie reviews, finding that Naïve Bayes showed the worst and SVM the best results, although the differences are not very large. Later Go et al. in [12] obtained similar results with unigrams by introducing a more novel approach to automatically classify the sentiment of Twitter messages as either positive or negative with respect to a query term. The same techniques were also used by Kharde and Sonawane in [14] to perform sentiment analysis on Twitter data, yet resulting in lower accuracy; again, SVM proved to perform best. Davidov et al. in [11] also stated that SVM and Naïve Bayes are the best techniques to classify the data and can be regarded as the baseline learning methods, applying them to analysis based on Twitter user-defined hashtags in tweets. Tian et al. in [25] applied seven classification algorithms: J48, Random Forest, ADTree, AdaBoostM1, Bagging, Multilayer Perceptron and Naïve Bayes for imbalanced sentiment classification of Chinese product reviews. They found that their proposed method helps a Support Vector Machine (SVM) to outperform other classification methods. Singh et al. in [21] used a novel technique to predict the outcome of the US presidential elections using sentiment analysis; to accomplish this task they used SVM. Jayalekshmi and Mathew in [13] proposed a system that automatically recognizes the facial expression from an image and classifies emotions for the final decision. For classification they used SVM, Random Forest and a KNN classifier. Ahmed et al. in [2] investigated a new approach to sentence-level sentiment analysis using different machine learning algorithms: SVM (Support Vector Machines), Naïve Bayes and MLP (Multilayer Perceptron) were used for movie review sentiment analysis. Moreover, they used two different Naïve Bayes classifiers and two different types of SVM kernels to identify and analyze the differences in accuracy as well as to find the best outcome among all the experiments. Tayade et al. in [23] used sentiment analysis through machine learning to identify the targets of trolls, so as to prevent trolling before it happens; Naïve Bayes, Support Vector Machines (SVM) and Maximum Entropy (MaxEnt) classifiers showed very promising results. Maheshwari et al. in [16] applied Support Vector Machines, Logistic Regression and Random Forests to identify the best linguistic and non-linguistic features for the automatic classification of values and ethics. Pranckevičius and Marcinkevičius in [19] ran experiments on short-text product-review data from Amazon to compare Naïve Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression classifiers implemented in Apache Spark, evaluating the classification accuracy with respect to the size of the training data sets and the number of n-grams. Ahmad et al. in [1] reviewed different machine learning techniques and algorithms (Maximum Entropy, Random Forest, SailAil Sentiment Analyzer, Multilayer Perceptron, Naïve Bayes and Support Vector Machines) which researchers have applied to movie reviews and product reviews. Brito et al. in [7] presented how different hyperparameter combinations impact the resulting German word vectors and how these word representations can be part of more complex models; they predicted whether a user liked an app given a review with three different algorithms: Logistic Regression, Decision Trees and Random Forests. Ashok et al. in [4] proposed a social framework which extracts users' reviews and comments on restaurants and points of interest, such as events and locations, to personalize and rank suggestions based on user preferences; Naïve Bayes, Support Vector Machines with two different kernels (Gaussian and Linear), Maximum Entropy and Random Forest were used in this work.
Such results led to the conclusion that Logistic Regression, SVM, Naïve Bayes and Random Forest are still prominent for future research. Therefore, in this paper we perform experiments with each of them and with various combinations of two of them (depending on the results of the previous experiments) to recognize positive or negative sentiment and to compare their accuracy. The rest of the paper is organized as follows. Section II describes the techniques used in the research. Section III presents the method for combining results. Section IV describes the preparation of the datasets, the experiments, the experimental settings, the effectiveness measures and the results. In Section V we conclude and outline future work.

II. DESCRIPTION OF TECHNIQUES USED IN RESEARCH

A. Logistic Regression

The logistic regression model arises from the desire to model the posterior probabilities of the K classes via linear functions in x, while at the same time ensuring that they sum to one and remain in [0, 1]. The model has the form

\log\frac{Pr(G=1|X=x)}{Pr(G=K|X=x)} = \beta_{10} + \beta_1^T x,
\log\frac{Pr(G=2|X=x)}{Pr(G=K|X=x)} = \beta_{20} + \beta_2^T x,    (1)
\vdots
\log\frac{Pr(G=K-1|X=x)}{Pr(G=K|X=x)} = \beta_{(K-1)0} + \beta_{K-1}^T x.

The model is specified in terms of K-1 log-odds or logit transformations (reflecting the constraint that the probabilities sum to one). Although the model uses the last class as the denominator in the odds-ratios, the choice of denominator is arbitrary in that the estimates are equivariant under this choice. A simple calculation shows that

Pr(G=k|X=x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)}, \quad k = 1, \ldots, K-1,
Pr(G=K|X=x) = \frac{1}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)},    (2)

and they clearly sum to one. To emphasize the dependence on the entire parameter set \theta = \{\beta_{10}, \beta_1^T, \ldots, \beta_{(K-1)0}, \beta_{K-1}^T\}, we denote the probabilities Pr(G=k|X=x) = p_k(x; \theta) (Hastie et al. in [26]).
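As an informal illustration of how such a classifier is applied to sentiment data, the sketch below fits a binary logistic regression model on a few toy sentences with scikit-learn, the library used later in Section IV; the example sentences, labels and bag-of-words feature extraction are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch (not the paper's pipeline): bag-of-words features + logistic regression.
# The toy sentences and labels below are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["i love this phone", "great product works well",
               "terrible battery very bad", "do not buy waste of money"]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()              # word-count features
X_train = vectorizer.fit_transform(train_texts)

# Settings comparable to the defaults listed in Section IV-C (C=1.0, l2 penalty, liblinear).
clf = LogisticRegression(C=1.0, penalty="l2", solver="liblinear")
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["bad phone", "works great"])
print(clf.predict(X_test))          # predicted classes
print(clf.predict_proba(X_test))    # class probabilities as in Eq. (2)
```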
B. Naïve Bayes Classification

A Naïve Bayes classifier is a simple probabilistic classifier based on Bayes' theorem and is particularly suited when the dimensionality of the input is high. In text classification, a given document d is assigned the class

C^* = \arg\max_c p(c|d).

Its underlying probability model can be described as an "independent feature model". The Naïve Bayes (NB) classifier uses Bayes' rule, Eq. (3),

p(c|d) = \frac{p(c)\, p(d|c)}{p(d)}    (3)

where p(d) plays no role in selecting C^*. To estimate the term p(d|c), Naïve Bayes decomposes it by assuming the f_i are conditionally independent given d's class, as in Eq. (4),

p_{NB}(c|d) := \frac{p(c)\left(\prod_{i=1}^{m} p(f_i|c)^{n_i(d)}\right)}{p(d)}    (4)

where m is the number of features and f_i is the feature vector. Consider a training method consisting of relative-frequency estimation of p(c) and p(f_i|c) (Pang et al. in [17]).

In our experiments Multinomial Naïve Bayes is used, as presented in [18]. It implements the Naïve Bayes algorithm for multinomially distributed data and is one of the two classic Naïve Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). The distribution is parametrized by vectors \theta_y = (\theta_{y1}, \ldots, \theta_{yn}) for each class y, where n is the number of features (in text classification, the size of the vocabulary) and \theta_{yi} is the probability P(x_i|y) of feature i appearing in a sample belonging to class y.

The parameters \theta_y are estimated by a smoothed version of maximum likelihood, i.e. relative frequency counting:

\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}

where N_{yi} = \sum_{x \in T} x_i is the number of times feature i appears in a sample of class y in the training set T, and N_y = \sum_{i=1}^{n} N_{yi} is the total count of all features for class y. The smoothing prior \alpha \ge 0 accounts for features not present in the learning samples and prevents zero probabilities in further computations. Setting \alpha = 1 is called Laplace smoothing, while \alpha < 1 is called Lidstone smoothing [18].
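The smoothed estimate above can be reproduced directly from a class-wise count matrix. The sketch below does this for an invented 2-class, 4-feature count matrix (illustrative data, not from the paper) and compares it with scikit-learn's MultinomialNB, which stores the same quantities as log-probabilities.

```python
# Minimal sketch of the smoothed estimate theta_hat_{yi} = (N_yi + alpha) / (N_y + alpha * n).
# The count matrix and labels are invented for illustration.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[3, 0, 1, 0],   # word counts per document (4-word vocabulary)
              [2, 1, 0, 0],
              [0, 2, 0, 3],
              [0, 1, 1, 4]])
y = np.array([1, 1, 0, 0])    # 1 = positive, 0 = negative
alpha, n = 1.0, X.shape[1]    # Laplace smoothing, vocabulary size

for c in np.unique(y):
    N_yi = X[y == c].sum(axis=0)                       # per-feature counts in class c
    theta = (N_yi + alpha) / (N_yi.sum() + alpha * n)  # smoothed relative frequencies
    print(c, theta)

nb = MultinomialNB(alpha=1.0)
nb.fit(X, y)
print(np.exp(nb.feature_log_prob_))   # same values, recovered from the fitted model
```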
C. Support Vector Machines

Support vector machines were introduced by Boser et al. in [5] and basically attempt to find the best possible surface to separate positive and negative training samples. Support Vector Machines (SVMs) are supervised learning methods used for classification.

Given training vectors x_i \in R^n, i = 1, \ldots, l, in two classes, and an indicator vector y \in R^l such that y_i \in \{1, -1\}, C-SVC (Boser et al. in [5]; Cortes and Vapnik in [9]) solves the following primal optimization problem [8]:

\min_{w,b,\xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i    (5)
subject to \; y_i (w^T \phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l

where \phi(x_i) maps x_i into a higher-dimensional space and C > 0 is the regularization parameter. Due to the possibly high dimensionality of the vector variable w, usually we solve the following dual problem:

\min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha - e^T \alpha    (6)
subject to \; y^T \alpha = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, l

where e = [1, \ldots, 1]^T is the vector of all ones, Q is an l \times l positive semidefinite matrix with Q_{ij} \equiv y_i y_j K(x_i, x_j), and K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j) is the kernel function. After problem (6) is solved, using the primal-dual relationship, the optimal w satisfies

w = \sum_{i=1}^{l} y_i \alpha_i \phi(x_i)    (7)

and the decision function is

\mathrm{sgn}(w^T \phi(x) + b) = \mathrm{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b \right)

(Chang and Lin in [8]).

D. Random Forests

Random Forests were introduced by Leo Breiman in [6], who was inspired by earlier work by Amit and Geman [3]. Random Forest is a tree-based ensemble with each tree depending on a collection of random variables. More formally, for a p-dimensional random vector X = (X_1, \ldots, X_p)^T representing the real-valued input or predictor variables and a random variable Y representing the real-valued response, we assume an unknown joint distribution P_{XY}(X, Y). The goal is to find a prediction function f(X) for predicting Y. The prediction function is determined by a loss function L(Y, f(X)) and defined to minimize the expected value of the loss

E_{XY}(L(Y, f(X)))    (8)

where the subscripts denote expectation with respect to the joint distribution of X and Y [10].

Intuitively, L(Y, f(X)) is a measure of how close f(X) is to Y; it penalizes values of f(X) that are a long way from Y. Typical choices of L are squared error loss L(Y, f(X)) = (Y - f(X))^2 for regression and zero-one loss for classification:

L(Y, f(X)) = I(Y \ne f(X)) = \begin{cases} 0 & \text{if } Y = f(X) \\ 1 & \text{otherwise.} \end{cases}    (9)

It turns out that minimizing E_{XY}(L(Y, f(X))) for squared error loss gives the conditional expectation

f(x) = E(Y|X = x)    (10)

otherwise known as the regression function. In the classification situation, if the set of possible values of Y is denoted by \mathcal{Y}, minimizing E_{XY}(L(Y, f(X))) for zero-one loss gives

f(x) = \arg\max_{y \in \mathcal{Y}} P(Y = y|X = x)    (11)

otherwise known as the Bayes rule [10]. Ensembles construct f in terms of a collection of so-called "base learners" h_1(x), \ldots, h_J(x), and these base learners are combined to give the "ensemble predictor" f(x). In regression, the base learners are averaged

f(x) = \frac{1}{J} \sum_{j=1}^{J} h_j(x)    (12)

while in classification, f(x) is the most frequently predicted class ("voting")

f(x) = \arg\max_{y \in \mathcal{Y}} \sum_{j=1}^{J} I(y = h_j(x)).    (13)

In Random Forests the j-th base learner is a tree denoted h_j(X, \Theta_j), where \Theta_j is a collection of random variables and the \Theta_j are independent for j = 1, \ldots, J (Cutler et al. in [10]).
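The voting rule in Eq. (13) is very compact in code. The sketch below combines a handful of base-learner predictions by majority vote (the predictions and the toy dataset are invented for illustration) and then shows the equivalent call to scikit-learn's RandomForestClassifier, which applies the same idea with randomized trees as base learners.

```python
# Minimal sketch of the voting rule in Eq. (13); the base-learner predictions are invented.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Predictions of J = 5 base learners for 4 samples (rows: learners, columns: samples).
H = np.array([[1, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 1, 1]])

# f(x) = argmax_y sum_j I(y == h_j(x)): for binary 0/1 labels, the majority class per column.
votes_for_1 = H.sum(axis=0)
f_x = (votes_for_1 > H.shape[0] / 2).astype(int)
print(f_x)   # -> [1 0 1 1]

# A Random Forest applies the same voting over randomized trees.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0)  # n_estimators as in Section IV-C
rf.fit(X, y)
print(rf.predict(X[:4]))
```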
III. THE METHOD FOR COMBINING RESULTS

The method for combining results is presented in this section. The proposed method is based on our method (the algorithm for sentences) introduced in paper [15]. We modified this algorithm so that it can be used with different machine learning algorithms. The algorithm is presented below.

Algorithm for combining results

Input: Let us denote ML1 as the strongest classifier and ML2 as the weakest classifier.
R_{ML1} = {ML1_sent, p} – set of the first algorithm's results, obtained after performing classification with machine learning algorithm ML1; ML1_sent – sentiment; p – the probability of classification;
R_{ML2} = {ML2_sent, v} – set of the second machine learning algorithm's classification results, obtained after performing ML2; ML2_sent – sentiment; v – ML2 result value, containing "positive" or "negative" sentiment;
th2 = 0.8; this threshold value was selected by manually investigating the results;
th3 = min(R_{ML1}{p}) + (\sigma_{R_{ML1}\{p\}} / 2) - 0.01 (our proposed formula), where \sigma_{R_{ML1}\{p\}} is the standard deviation of R_{ML1}{p}.

Algorithm for combining results:
1) Find the results which are the same in both ML1 and ML2:
Results = R_{ML1} \cap R_{ML2} = \{x : x \in R_{ML1}\{ML1\_sent\} \text{ and } x \in R_{ML2}\{ML2\_sent\}\}
2) Find the results which are different between ML1 and ML2:
R_{ML1}\{ML1\_sent\} \, \Delta \, R_{ML2}\{ML2\_sent\} \text{ and } R_{ML1}\{p\} < th2
3) Results = \begin{cases} Results \cup R_{ML1}, & \text{if } |R_{ML1}\{p\}| < th3 \\ Results \cup R_{ML2}, & \text{if } |R_{ML1}\{p\}| \ge th3 \end{cases}

Output: set of classification results Results = {Sentence, Sentiment} and Accuracy (Korovkinas et al. in [15]).
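As a rough illustration, the sketch below implements one possible reading of this combining rule in Python: agreements are kept directly, and for disagreements the ML1 or ML2 label is chosen according to th2 and th3. The handling of disagreements where the ML1 probability is at least th2 (kept as ML1 here) is our assumption and is not spelled out in the algorithm above; the predictions and variable names are illustrative, not the authors' implementation.

```python
# Rough sketch of the combining rule described above; one possible reading, not the
# authors' exact implementation. Disagreements with p >= th2 are kept as ML1 here,
# which is an assumption not stated explicitly in the algorithm.
import numpy as np

def combine(ml1_labels, ml1_probs, ml2_labels, th2=0.8):
    ml1_labels = np.asarray(ml1_labels)
    ml1_probs = np.asarray(ml1_probs, dtype=float)
    ml2_labels = np.asarray(ml2_labels)

    # th3 = min(p) + std(p) / 2 - 0.01 (the proposed formula).
    th3 = ml1_probs.min() + ml1_probs.std() / 2 - 0.01

    combined = ml1_labels.copy()               # step 1: agreements stay as they are
    disagree = ml1_labels != ml2_labels        # step 2: differing results
    low_conf = disagree & (ml1_probs < th2)

    # Step 3: below th3 keep ML1, otherwise take ML2 (as stated in the algorithm).
    take_ml2 = low_conf & (ml1_probs >= th3)
    combined[take_ml2] = ml2_labels[take_ml2]
    return combined

# Illustrative example with invented predictions ("pos"/"neg").
ml1 = ["pos", "neg", "pos", "neg"]
p1  = [0.95, 0.55, 0.75, 0.90]
ml2 = ["pos", "pos", "neg", "pos"]
print(combine(ml1, p1, ml2))   # the third result is flipped to the ML2 label
```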
IV. EXPERIMENTS AND RESULTS

A. Dataset

Two existing datasets are used in this paper: the Stanford Twitter sentiment corpus (sentiment140, http://help.sentiment140.com/) and the Amazon customer reviews dataset (https://www.kaggle.com/bittlingmayer/amazonreviews/). The Stanford Twitter sentiment corpus was introduced by Go et al. in [12] and contains 1.6 million tweets automatically labeled as positive or negative based on emoticons. The dataset is split into a training set of 70% (1.12M tweets) and a testing set of 30% (480K tweets). The Amazon customer reviews dataset contains 4 million reviews and star ratings. It is split into a training set of 70% (2.8M reviews) and a testing set of 30% (1.2M reviews).

The training and testing data were preprocessed and cleaned before being passed as input to the intelligent algorithms. This included removing redundant tokens such as hashtag and @ symbols, numbers, http links, punctuation symbols, etc. After cleaning, all datasets were checked and empty strings were removed.
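In the paper the cleaning described above is performed in R (see Section IV-C). Purely as an illustration, the sketch below shows an equivalent sequence of steps with Python regular expressions; the exact patterns are our assumptions about what "redundant tokens" covers, not the authors' code.

```python
# Illustrative Python version of the described cleaning steps (the paper uses R for this).
# The exact regular expressions are assumptions, not the authors' code.
import re

def clean_text(text: str) -> str:
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # links
    text = re.sub(r"[@#]\w+", " ", text)                  # @mentions and hashtags
    text = re.sub(r"\d+", " ", text)                      # numbers
    text = re.sub(r"[^\w\s]", " ", text)                  # punctuation
    return re.sub(r"\s+", " ", text).strip().lower()      # collapse whitespace

raw = ["@user I LOVE this!!! http://t.co/xyz #happy", "   ", "3 stars, not bad"]
cleaned = [clean_text(t) for t in raw]
cleaned = [t for t in cleaned if t]   # drop strings that are empty after cleaning
print(cleaned)
```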
B. Experiments

Four experiments are performed in this paper: two experiments with the Stanford Twitter sentiment corpus (sentiment140) dataset and two experiments with the Amazon customer reviews dataset.

In the first and second experiments the datasets described above, split into 70% for training and 30% for testing, are applied to four machine learning algorithms: Logistic Regression, Naïve Bayes classification, Support Vector Machines and Random Forest.

In the third and fourth experiments the best three machine learning algorithms are selected, depending on the results of the previous experiments, to create various combinations of two different single methods, which are applied to the same datasets.

C. Experimental settings

Data cleaning and preparation are performed with R [24]. The experiments are implemented with the Python programming language and scikit-learn [18], a library for machine learning. The machine learning algorithms are used with their default parameters, described below.

Logistic Regression default parameters [18]:
• C (Inverse of regularization strength): float, default: 1.0
• dual (Dual or primal formulation): bool, default: False
• fit_intercept (Specifies if a constant should be added to the decision function): bool, default: True
• intercept_scaling: float, default: 1
• max_iter (Maximum number of iterations taken for the solvers to converge): int, default: 100
• multi_class: str, default: 'ovr'. With 'ovr' a binary problem is fit for each label.
• n_jobs (Number of CPU cores used when parallelizing over classes if multi_class='ovr'): int, default: 1
• penalty (Used to specify the norm used in the penalization): str, 'l1' or 'l2', default: 'l2'
• solver (Algorithm to use in the optimization problem): 'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga', default: 'liblinear'
• tol (Tolerance for stopping criteria): float, default: 0.0001

Naïve Bayes default parameters [18]:
• alpha (Additive (Laplace/Lidstone) smoothing parameter, 0 for no smoothing): float, optional (default=1.0)
• fit_prior (Whether to learn class prior probabilities or not): boolean, optional (default=True)
• class_prior (Prior probabilities of the classes): array-like, size (n_classes), optional (default=None)

Support Vector Machines default parameters [18]:
• C (Penalty parameter C of the error term): float, optional (default=1.0)
• kernel (Specifies the kernel type to be used in the algorithm): string, optional (default='rbf'). We used the 'linear' kernel instead.
• loss (Specifies the loss function): string, 'hinge' or 'squared_hinge' (default='squared_hinge'). 'squared_hinge' is the square of the hinge loss.
• max_iter (The maximum number of iterations to be run): int (default=1000)
• multi_class (Determines the multi-class strategy if y contains more than two classes): string, 'ovr' or 'crammer_singer' (default='ovr'). 'ovr' trains n_classes one-vs-rest classifiers.
• penalty (Specifies the norm used in the penalization): string, 'l1' or 'l2' (default='l2')
• tol (Tolerance for stopping criteria): float, optional (default=0.0001)

Random Forest default parameters [18]:
• n_estimators (The number of trees in the forest): integer, optional (default=10)
• max_features (The number of features to consider when looking for the best split): int, float, string or None, optional (default="auto")
• max_depth (The maximum depth of the tree): integer or None, optional (default=None)
• min_samples_split (The minimum number of samples required to split an internal node): int, float, optional (default=2)
• min_samples_leaf (The minimum number of samples required to be at a leaf node): int, float, optional (default=1)
• min_weight_fraction_leaf (The minimum weighted fraction of the sum total of weights of all the input samples required to be at a leaf node): float, optional (default=0.0)
• max_leaf_nodes (Grow trees with max_leaf_nodes in best-first fashion; best nodes are defined as relative reduction in impurity): int or None, optional (default=None)
• min_impurity_decrease (A node will be split if this split induces a decrease of the impurity greater than or equal to this value): float, optional (default=0.0)
• bootstrap (Whether bootstrap samples are used when building trees): boolean, optional (default=True)
• oob_score (Whether to use out-of-bag samples to estimate the generalization accuracy): bool (default=False)
• n_jobs (The number of jobs to run in parallel for both fit and predict): integer, optional (default=1)
• verbose (Controls the verbosity of the tree building process): int, optional (default=0)
• warm_start: bool, optional (default=False)
• criterion (The function to measure the quality of a split): string, optional (default="gini")

D. Effectiveness

Effectiveness is measured using statistical measures: accuracy (ACC), precision (PPV – positive predictive value and NPV – negative predictive value), recall (TPR – true positive rate and TNR – true negative rate) and F1 (the harmonic mean of PPV and TPR). The formulas are presented below (Sammut and Webb in [20]):

Accuracy (ACC):
ACC = \frac{TP + TN}{TP + TN + FP + FN}

Positive predictive value (PPV):
PPV = \frac{TP}{TP + FP}

Negative predictive value (NPV):
NPV = \frac{TN}{TN + FN}

True positive rate (TPR):
TPR = \frac{TP}{TP + FN}

True negative rate (TNR):
TNR = \frac{TN}{TN + FP}

Harmonic mean of PPV and TPR (F1):
F_1 = \frac{2}{\frac{1}{PPV} + \frac{1}{TPR}}

where TP is the count of correctly classified "positive" sentiments, TN is the count of correctly classified "negative" sentiments, FP is the count of incorrectly classified "positive" sentiments, and FN is the count of incorrectly classified "negative" sentiments.
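For concreteness, the short sketch below evaluates these measures from predicted and true labels; the label vectors are invented for illustration and the computations follow the definitions above.

```python
# Minimal sketch of the effectiveness measures defined above; the labels are invented.
import numpy as np

def effectiveness(y_true, y_pred, positive=1):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    acc = (tp + tn) / (tp + tn + fp + fn)
    ppv, npv = tp / (tp + fp), tn / (tn + fn)
    tpr, tnr = tp / (tp + fn), tn / (tn + fp)
    f1 = 2 / (1 / ppv + 1 / tpr)
    return {"ACC": acc, "PPV": ppv, "NPV": npv, "TPR": tpr, "TNR": tnr, "F1": f1}

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(effectiveness(y_true, y_pred))
```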
E. Results

TABLE I contains the results of the standard single machine learning algorithms with their default parameters. The results show that Logistic Regression (LR) obtained the best accuracy (ACC) in both experiments, 79.67% and 90.21%. The other methods are arranged in the following order: SVM (ACC) – 79.16% and 90.00%, Naïve Bayes classification (ACC) – 76.72% and 84.18%, Random Forest (ACC) – 75.81% and 80.15%.

Better accuracy was obtained with the Amazon reviews dataset, which is also significantly bigger than the sentiment140 dataset. This is because tweets are very short and contain noise, slang, acronyms, etc.

Logistic Regression and SVM provided more uniform recognition of both classes; their PPV, NPV, TPR, TNR and F1 are almost even, compared to the other methods.

Based on the results presented in TABLE I, Logistic Regression, SVM and Naïve Bayes were selected for the further experiments, in which various combinations of two different single algorithms were evaluated.

TABLE I. THE SINGLE METHODS EXPERIMENTS RESULTS (EFFECTIVENESS, %)

ML alg.   ACC      PPV      NPV      TPR      TNR      F1
Experiment No 1
LR        79.67    80.19    79.16    79.38    79.98    79.78
NB        76.72    73.18    80.26    78.76    74.95    75.87
SVM       79.16    79.49    78.82    78.97    79.35    79.23
RF        75.81    70.12    81.49    79.13    73.17    74.35
Experiment No 2
LR        90.21    90.19    90.24    90.24    90.19    90.21
NB        84.18    81.46    86.89    86.14    82.42    83.74
SVM       90.00    90.03    89.98    89.98    90.03    90.01
RF        80.15    73.05    87.25    85.14    76.40    78.63

TABLE II shows that using the proposed method (see Section III) for the combination of two single methods lets us obtain better accuracy than a single method.

TABLE II. THE COMBINED METHODS EXPERIMENTS RESULTS (EFFECTIVENESS, %)

ML alg.   ACC      PPV      NPV      TPR      TNR      F1
Experiment No 3
LR-NB     79.81    79.49    80.12    80.00    79.62    79.75
SVM-NB    79.26    78.01    80.51    80.02    78.54    78.99
LR-SVM    81.83    79.98    83.69    83.06    80.69    81.49
Experiment No 4
LR-NB     90.22    90.06    90.37    90.34    90.09    90.20
SVM-NB    89.98    89.81    90.15    90.12    89.84    89.96
LR-SVM    90.22    90.22    90.21    90.21    90.22    90.22

LR-SVM (the Logistic Regression and SVM combination) shows the best accuracy (ACC), 81.83% and 90.22%, while the accuracy of the other combinations is smaller: LR-NB (the Logistic Regression and Naïve Bayes combination) – 79.81% and 90.22%, SVM-NB (the SVM and Naïve Bayes combination) – 79.26% and 89.98%. Our introduced method also outperformed the single LR algorithm in all experiments, except the fourth experiment, where SVM-NB obtained accuracy (ACC) 89.98% compared with Logistic Regression's 90.21%.

Our method also provided more uniform recognition of both classes (PPV, NPV, TPR, TNR, F1).

V. CONCLUSIONS AND FUTURE WORK

The main idea of this paper was to select two single intelligent algorithms to create a combined method for sentiment classification.

The results show that a combination of two almost equally strong intelligent methods which showed the best results (in our case Logistic Regression and SVM) can obtain higher accuracy (ACC), 81.83% and 90.22%, than the best results obtained by a single method such as Logistic Regression, 79.67% and 90.21%.

The combination of the strongest and the weakest method (in our case with Naïve Bayes classification, with accuracy (ACC) 79.81% and 90.22%) also outperforms the best results obtained by the single method Logistic Regression.

The main advantage of combining methods is that the combined method provides more uniform recognition of both classes (PPV, NPV, TPR, TNR, F1) compared with Naïve Bayes and Random Forest.

These results allow us to conclude that Logistic Regression and SVM, and the combination of these methods, fit best for our further work. Our method presented in [15] can be applied with different algorithms and obtains better classification accuracy. The goal of this approach was to test the proposed method on existing datasets in order to be able to continue working with real-world data in the future.
REFERENCES

[1] M. Ahmad, S. Aftab, S. S. Muhammad, and S. Ahmad. Machine learning techniques for sentiment analysis: A review. Int. J. Multidiscip. Sci. Eng., 8(3):27–32, 2017.
[2] E. Ahmed, M. A. U. Sazzad, M. T. Islam, M. Azad, S. Islam, and M. H. Ali. Challenges, comparative analysis and a proposed methodology to predict sentiment from movie reviews using machine learning. In Big Data Analytics and Computational Intelligence (ICBDAC), 2017 International Conference on, pages 86–91. IEEE, 2017.
[3] Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9(7):1545–1588, 1997.
[4] M. Ashok, S. Rajanna, P. V. Joshi, and S. Kamath. A personalized recommender system using machine learning based sentiment analysis over social data. In Electrical, Electronics and Computer Science (SCEECS), 2016 IEEE Students' Conference on, pages 1–6. IEEE, 2016.
[5] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152. ACM, 1992.
[6] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[7] E. Brito, R. Sifa, K. Cvejoski, C. Ojeda, and C. Bauckhage. Towards German word embeddings: A use case with predictive sentiment analysis. In Data Science – Analytics and Applications, pages 59–62. Springer, 2017.
[8] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
[9] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[10] A. Cutler, D. R. Cutler, and J. R. Stevens. Random forests. In Ensemble Machine Learning, pages 157–175. Springer, 2012.
[11] D. Davidov, O. Tsur, and A. Rappoport. Enhanced sentiment learning using twitter hashtags and smileys. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 241–249. Association for Computational Linguistics, 2010.
[12] A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(12), 2009.
[13] J. Jayalekshmi and T. Mathew. Facial expression recognition and emotion classification system for sentiment analysis. In Networks & Advances in Computational Technologies (NetACT), 2017 International Conference on, pages 1–8. IEEE, 2017.
[14] V. Kharde, P. Sonawane, et al. Sentiment analysis of twitter data: A survey of techniques. arXiv preprint arXiv:1601.06971, 2016.
[15] K. Korovkinas, P. Danėnas, and G. Garšva. SVM and Naïve Bayes classification ensemble method for sentiment analysis. Baltic Journal of Modern Computing, 5(4):398–409, 2017.
[16] T. Maheshwari, A. N. Reganti, S. Gupta, A. Jamatia, U. Kumar, B. Gambäck, and A. Das. A societal sentiment analysis: Predicting the values and ethics of individuals by analysing social media content. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 731–741, 2017.
[17] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing – Volume 10, pages 79–86. Association for Computational Linguistics, 2002.
[18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
[19] T. Pranckevičius and V. Marcinkevičius. Comparison of Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression classifiers for text reviews classification. Baltic Journal of Modern Computing, 5(2):221, 2017.
[20] C. Sammut and G. I. Webb. Encyclopedia of Machine Learning. Springer Science & Business Media, 2011.
[21] P. Singh, R. S. Sawhney, and K. S. Kahlon. Forecasting the 2016 US presidential elections using sentiment analysis. In Conference on e-Business, e-Services and e-Society, pages 412–423. Springer, 2017.
[22] J. T. Starczewski, S. Pabiasz, N. Vladymyrska, A. Marvuglia, C. Napoli, and M. Woźniak. Self organizing maps for 3D face understanding. In International Conference on Artificial Intelligence and Soft Computing, pages 210–217. Springer, 2016.
[23] P. M. Tayade, S. S. Shaikh, and S. Deshmukh. To discover trolling patterns in social media: Troll filter. 2017.
[24] R Core Team et al. R: A language and environment for statistical computing. 2015.
[25] F. Tian, F. Wu, K.-M. Chao, Q. Zheng, N. Shah, T. Lan, and J. Yue. A topic sentence-based instance transfer method for imbalanced sentiment classification of Chinese product reviews. Electronic Commerce Research and Applications, 16:66–76, 2016.
[26] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
[27] A. Venckauskas, A. Karpavicius, R. Damaševičius, R. Marcinkevičius, J. Kapočiūtė-Dzikienė, and C. Napoli. Open class authorship attribution of Lithuanian internet comments using one-class classifier. In Federated Conference on Computer Science and Information Systems (FedCSIS), pages 373–382. IEEE, 2017.