=Paper=
{{Paper
|id=Vol-2786/Paper19
|storemode=property
|title=Semantic Analysis of Sentiments through Web-Mined Twitter Corpus
|pdfUrl=https://ceur-ws.org/Vol-2786/Paper19.pdf
|volume=Vol-2786
|authors=Satish Chandra,Mahendra Kumar Gourisaria,Harshvardhan GM,Siddharth Swarup Rautaray,Manjusha Pandey,Sachi Nandan Mohanty
|dblpUrl=https://dblp.org/rec/conf/isic2/ChandraGGRPM21
}}
==Semantic Analysis of Sentiments through Web-Mined Twitter Corpus==
Satish Chandra (a), Mahendra Kumar Gourisaria (a), Harshvardhan GM (a), Siddharth Swarup Rautaray (a), Manjusha Pandey (a) and Sachi Nandan Mohanty (b)

(a) School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar-751024, Odisha, India
(b) Dept of Computer Science & Engineering, ICFAITech, ICFAI Foundation for Higher Education, Hyderabad-500082, India

ISIC'21: International Semantic Intelligence Conference, February 25–27, 2021, Delhi, India.
EMAIL: schandra1.sc@gmail.com (S. Chandra); mkgourisaria2010@gmail.com (M. K. Gourisaria); harrshvardhan@gmail.com (H. GM); siddharthfcs@kiit.ac.in (S.S. Rautaray); manjushafcs@kiit.ac.in (M. Pandey); sachinandan09@gmail.com (S.N. Mohanty)
ORCID: 0000-0002-6881-2668 (S. Chandra); 0000-0002-1785-8586 (M. K. Gourisaria); 0000-0003-3592-2931 (H. GM); 0000-0002-3864-2127 (S.S. Rautaray); 0000-0002-6077-5794 (M. Pandey); 0000-0002-4939-0797 (S.N. Mohanty)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

Abstract

A huge amount of textual data is generated due to the boom of microblogging. Microblogging sites such as Facebook, Twitter and Google+ are used by millions of people to express their views and emotions on different subjects. In this paper, we discuss sentiment analysis on a Twitter dataset containing tweets from different users. Sentiment analysis is useful for gauging the opinion of people from large volumes of text data where the texts are highly unstructured and heterogeneous. Different classification techniques, namely Support Vector Machine, Logistic Regression, Logistic Regression with a Stochastic Gradient Descent optimizer, Decision Tree, Naive Bayes, Bidirectional LSTM and Random Forest, have been applied to analyze the sentiment of people, i.e., whether their tweets are positive or negative. The corpus has been analyzed by plotting descriptive insights such as the word cloud and the frequency of positive and negative tweets. The best classifier was selected by comparing accuracy, recall, precision, F1 score, AUC score and the ROC curve.

Keywords: Sentiment Analysis, Twitter, Natural Language Processing, Word2Vec, Support Vector Machine, Logistic Regression, Random Forest.

1. Introduction

With the universality of microblogging and social networking sites, Twitter, with 319 million monthly users, has become a valuable resource for many individuals and organizations for posting blogs and expressing their views and opinions on different subjects like politics, sports, movies, etc. [1]. Stimulated by the growth of social media, many companies and media organizations are trying to mine Twitter to observe people's views and to understand what they feel and think about their products [2]. As a result, sentiment analysis on Twitter is an effective way of reckoning public opinion, and it offers the potential of observing numerous social networking sites in real time.

Twitter limits each tweet to 140 characters [3], which leads individuals to use short phrases in their tweets. Sentiment analysis automatically detects whether a text section contains emotional or opinionated content, and it also determines the polarity of the text. Generally, the dataset consists of a group of tweets where each tweet is annotated with a sentiment label. Commonly, sentiments are labeled positive, negative or neutral; however, some datasets also have mixed or irrelevant tags, ranging from -5 to 5 and depicting negative to positive polarity [4]. Twitter sentiment analysis is helpful for understanding the public temperament about different social or cultural events and for forecasting fluctuations in the stock exchange [5].

Sentiment analysis on Twitter is challenging due to the short length of tweets. The unstructured and heterogeneous data compelled us to apply a preprocessing step before feature extraction [6]. The preprocessing steps include URL removal, negation handling, stopword removal, number removal and acronym expansion, and were performed with the help of the Natural Language Toolkit (NLTK). Feature extraction then has two phases: first, normal text is formed by eliminating the Twitter-specific features, and then feature extraction is performed to extract further features [1].

This paper is organized as follows. Section 2 briefs related work on sentiment analysis. Section 3 describes the methodology and materials, covering data exploration, data preprocessing and feature extraction, as well as the classification algorithms used in the implementation, namely Support Vector Machine, Logistic Regression, Logistic Regression with Stochastic Gradient Descent, Decision Tree, Naïve Bayes, Bidirectional Long Short-Term Memory (BiLSTM) and Random Forest. Section 4 presents the results, analyses and comparison of models. Section 5 comprises the conclusion and future work.
2. Related works

With the advancement of Natural Language Processing (NLP), research on sentiment analysis ranges from document-level classification [7] to word- and phrase-level classification [8]. A method to retrieve semantic information from a large corpus was presented by Hatzivassiloglou and McKeown [8]. This method separates domain-dependent details and adapts to a new domain when the corpus is substituted. Their model focuses on adjectives, intending to identify near-synonyms and antonyms.

To increase the efficiency and accuracy of the model, [9] used an ensemble framework for sentiment analysis. They utilized movie reviews and multi-domain datasets extracted from Amazon product reviews, which include reviews of Books, Electronics, DVD and Kitchen. They framed the ensemble by combining various classification techniques and feature sets. They used two types of feature sets, word-relations and part-of-speech information, and three types of classifiers, maximum entropy, Support Vector Machines and Naïve Bayes, to form the ensemble framework. Weighted combination, fixed combination and meta-classifier ensemble techniques were used for sentiment analysis and better accuracy was attained [9].

People on social networking sites give their opinion about anything and everything, and it is a challenge to recognize all types of data for training. Therefore, [2] proposed a model to study sentiment from the hash-tagged (HASH) dataset, the iSieve dataset and the emoticon (EMOT) dataset. The authors trained their model on a variety of feature extraction techniques such as lexicon features, part-of-speech (POS) features, n-gram features and microblogging features. They concluded that in the microblogging domain the POS feature may not be useful, and that the benefits of the emoticon dataset are lessened when microblogging features are included [2].

The authors of [10] discussed social network analysis and Twitter as a rich source for sentiment analysis, and proposed a model that implements Twitter sentiment analysis by fetching data from the Twitter APIs. Their analysis is based on different job-opportunity queries. The dataset has positive, negative and neutral labels. They noted that neutral sentiments are high in comparison to positive or negative ones, which shows that there is a need to improve Twitter sentiment analysis [10].

Twitter has also become increasingly popular in the field of politics. A real-time sentiment analyzer towards the incumbent ex-president Barack Obama and the nine other challengers was designed by [11]. They used IBM's InfoSphere Streams platform (IBM, 2012) for speed, accuracy and pipelining of real-time data. Using the Twitter "firehose" they constructed logical keyword combinations to recover relevant tweets about candidates and events. They achieved an accuracy of 59% [11].

Some researchers have tried to determine the public point of view on different subjects like politics, movies and news from Twitter posts [12]. The authors of [13] used IMDB, a popular Internet database containing movie information, and Blippr, a social networking site where reviews are in the form of 'Iblips'. Their analysis gave an F-score as high as 0.9 using SVM and demonstrated domain adaptation as a useful technique for sentiment analysis. They introduced a new feature reduction technique, the Relative Information Index (RII), which combines with another popular technique, thresholding, to form a feature reduction technique that not only reduces the features but also improves the F-score [13].

The importance of sentiment analysis has grown so much that it is now used in various industries, such as hotel management. In this regard, [14] classified public reviews of a hotel into positive and negative. They collected 800 reviews from TripAdvisor and performed the preprocessing step with NLTK in Python. They used various classifiers including Logistic Regression, Random Forest, a Stochastic Gradient Descent classifier, Naïve Bayes and Support Vector Machine. Their analysis showed that the Naïve Bayes classifier was the best among them, but the Stochastic Gradient Descent classifier also worked well. The analysis was based on accuracy, recall, precision and F1-score [14]. Table 1 summarizes the related work.
Table 1: Tabular presentation of the related work

[9] (2011). Dataset: movie reviews and a multi-domain dataset extracted from Amazon, including reviews of Books, DVD, Electronics and Kitchen. Models: two types of feature sets (word-relations and part-of-speech); maximum entropy, Support Vector Machines and Naïve Bayes; weighted combination, fixed combination and meta-classifier ensemble techniques. Observation: the ensemble technique was very efficient in obtaining accurate results.

[2] (2011). Dataset: hash-tagged (HASH) dataset, iSieve dataset and the emoticon (EMOT) dataset. Models: trained on a variety of feature extraction techniques such as lexicon features, n-gram features, part-of-speech (POS) features and microblogging features. Observation: the best result was obtained from n-gram features along with lexicon features; POS features may not be useful in the microblogging domain.

[10] (2019). Dataset: data obtained from the Twitter API for different job-opportunity queries. Models: NLTK used to find the different categories of tweets (positive, weakly positive, strongly positive, neutral, negative, strongly negative, weakly negative). Observation: neutral tweets are significantly high in most of the queries, showing that Twitter sentiment analysis can still be improved.

[11] (2012). Dataset: data obtained from the Twitter API during the 2012 US presidential election. Models: a real-time sentiment analyzer towards the incumbent ex-president Barack Obama and the nine other challengers, built on IBM's InfoSphere Streams platform (IBM, 2012) for speed, accuracy and pipelining of real-time data; logical keyword combinations over the Twitter "firehose" recovered relevant tweets about candidates and events. Observation: they achieved an accuracy of 59%.

[13] (2011). Dataset: IMDB, a popular Internet database containing movie information, and Blippr, a social networking site where reviews are in the form of 'Iblips'. Models: SVM together with a new feature reduction technique, the Relative Information Index (RII), combined with thresholding. Observation: an F-score as high as 0.9 using SVM; domain adaptation demonstrated as a useful technique for sentiment analysis.

[14] (2018). Dataset: 800 public hotel reviews collected from TripAdvisor, classified into positive and negative. Models: Logistic Regression, Random Forest, Stochastic Gradient Descent classifier, Naïve Bayes and Support Vector Machine. Observation: the Naïve Bayes classifier was the best, but the Stochastic Gradient Descent classifier also worked well; the analysis was based on accuracy, recall, precision and F1-score.
3. Materials and methods

The study of computer algorithms that improve automatically by learning from data is known as machine learning. The data and output are fed into the machine learning model and the machine creates its own logic to predict the result. The dataset is split into two parts: a training part, which contains the input feature vectors and their labels, and a testing part. A classification model is built from the training part with a specific algorithm in order to learn a pattern. The testing part is used to obtain the accuracy of the model, which tells whether a model is a good fit, underfit or overfit. Fig. 1 shows the workflow adopted for Twitter sentiment analysis.

Figure 1: Workflow for Twitter sentiment analysis

3.1. Data exploration

The dataset used in this work was taken from UCI/Kaggle [15] in csv (comma separated values) format and contains 1.6 million tweets. The data was preprocessed, including tokenization, stemming and stopword removal, to clean the text. A feature vector was then created using the relevant features. Data mining classification algorithms such as Decision Tree, Logistic Regression, Random Forest, SVM, Naive Bayes and the LR-SGD classifier were used to classify the tweets into positive or negative and to measure the accuracy. Fig. 1 shows the algorithm adopted for sentiment analysis.

Exploring the data has a key role in machine learning, as it helps us visualize the types and statistics of the data [16]. Here, the dataset consists of 0.8M positive and 0.8M negative tweets, as shown in Fig. 2(a). As it is text data, the word cloud can also be visualized, as shown in Fig. 2(b).

Figure 2: (a) Statistics of positive and negative dataset (b) Word cloud of the dataset
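As an illustration of this exploration step, the sketch below loads the corpus and counts the two classes. The file name, the six-column Sentiment140 layout and the third-party wordcloud package are assumptions on our part, as the paper does not describe its exploration code.

```python
# Minimal sketch of loading and exploring a Sentiment140-style CSV (layout assumed).
import pandas as pd

cols = ["target", "id", "date", "flag", "user", "text"]   # assumed column names
df = pd.read_csv("training.1600000.processed.noemoticon.csv",
                 encoding="latin-1", names=cols)

# Map the polarity codes (0 = negative, 4 = positive) to labels and count them.
df["label"] = df["target"].map({0: "negative", 4: "positive"})
print(df["label"].value_counts())          # roughly 0.8M tweets in each class

# Optional word cloud over a sample of the raw text (requires the `wordcloud` package).
from wordcloud import WordCloud
wc = WordCloud(width=800, height=400).generate(" ".join(df["text"].sample(10000)))
wc.to_file("wordcloud.png")
```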
3.2. Data preprocessing

As the Twitter datasets are composed of unstructured, heterogeneous, ill-formed words, irregular grammar and non-dictionary terms, the tweets were cleaned by various NLTK methods before feature extraction [1]. The preprocessing steps are [12]:

• Eliminating all non-English and non-ASCII characters from the text.
• Removing all URL links, as they do not provide any information about the sentiment.
• Removing numbers, as they are not useful in finding sentiment.
• Removing stop words, the most frequent words in a language, such as "as", "an", "about" and "any". There are many stopwords in English, and they do not play any role in finding the sentiment, so they are removed from the dataset.
• The stopword list also contains "not", but it is not removed from the tweets because it is crucial for analyzing negative reviews.
• Stemming, the process of reducing words to their base form, so that, for example, "loved" becomes "love" and "worst" becomes "bad".
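A minimal sketch of these cleaning steps with NLTK is given below; the regular expressions and the choice of the Porter stemmer are illustrative assumptions, since the paper does not list its exact implementation.

```python
# Illustrative tweet-cleaning pipeline with NLTK (not the authors' exact code).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("stopwords")
nltk.download("punkt")

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english")) - {"not"}    # keep "not" for negation

def clean_tweet(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)    # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)                 # drop non-English chars and digits
    tokens = word_tokenize(text)
    tokens = [stemmer.stem(t) for t in tokens if t not in stop_words]
    return " ".join(tokens)

print(clean_tweet("I loved the movie!!! http://t.co/xyz 10/10"))
```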
3.3. Feature extraction

In feature extraction, the vector space model is used for document representation. A vector is created whose dimension is equal to the size of the English vocabulary, with every element initialized to 0. If a text contains a vocabulary word, a 1 is placed in that dimension, as shown in Eqn. 1, Eqn. 2 and Eqn. 3; every further occurrence of the word increases the count, while words that never occur keep the value 0.

Love = [0, 0, 0, 1, 0, 0, 0, ..., 0]   (1)
Good = [0, 0, 0, 0, 2, 0, 0, ..., 0]   (2)
Day = [0, 0, 0, 0, 0, 5, 0, ..., 0]   (3)

The 2nd edition of the Oxford dictionary contains 171,476 words in current use [17], so a vector built over all of these words would give a high-variance model; this is where feature selection comes into account. For proper weighting and feature extraction, the count vectorizer method was used, which keeps track of frequent terms as well as rare words, and the vector space model improves the accuracy. Feature extraction is also used for dimensionality reduction by removing non-informative and rare words. A Bag of Words model is created which contains the most frequent words from the feature vector to improve the accuracy [1].

Besides the Bag-of-Words model, tokenization was used for the Bidirectional Long Short-Term Memory, in which raw texts are broken up into unique tokens, each with its own token id. In tokenization, a vector is created with a size equal to the number of unique words in the corpus. A sequence of tokens is created and represented as a vector, as shown in Eqn. 4 and Eqn. 5. As each tweet has a different length, its token sequence also has a different length, which makes it difficult to feed into deep learning algorithms that require sequences of the same length [18]. To counter this problem, padding and truncating steps are used, where the length of the padded sequence is fixed. If the tokenized sequence is longer than the padded length, the tokens beyond that length are truncated, i.e., removed; if it is shorter, the sequence is padded with 0. If the padded length is chosen to be 6, then Eqn. 4 is truncated as shown in Eqn. 6 and Eqn. 5 is padded as shown in Eqn. 7.

what consumes your mind controls your life = [32, 13, 21, 122, 781, 45, 23]   (4)
practice makes a man perfect = [53, 321, 32, 48, 44]   (5)
what consumes your mind controls your life = [32, 13, 21, 122, 781, 45]   (6)
practice makes a man perfect = [53, 321, 32, 48, 44, 0]   (7)
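Both representations can be sketched as follows; the vocabulary limits and the padded length of 6 mirror the running example above, while the dimensions actually used in the experiments are not reported in the paper.

```python
# Sketch of the two feature representations described above (illustrative sizes).
from sklearn.feature_extraction.text import CountVectorizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["what consumes your mind controls your life",
         "practice makes a man perfect"]

# Bag-of-Words: keep only the most frequent terms (max_features limits the vocabulary).
vectorizer = CountVectorizer(max_features=5000)
bow = vectorizer.fit_transform(texts)          # sparse document-term count matrix

# Token-id sequences for the BiLSTM: tokenize, then pad/truncate to a fixed length.
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(texts)
seqs = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(seqs, maxlen=6, padding="post", truncating="post")
print(padded.shape)                            # (2, 6): truncated or zero-padded
```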
3.4. Classification algorithms

Classification algorithms are a core part of supervised machine learning; they are used to predict the class of the data. In this paper, classification algorithms play a crucial role in labeling the tweets as positive or negative.

3.4.1. Bidirectional Long Short-Term Memory (BiLSTM)

A traditional neural network cannot remember previous inputs, yet for predicting the next word previous information is essential. A Recurrent Neural Network (RNN) can retain information from the past because it has loops and hidden layers; the loops allow the network to persist information. An RNN turns independent activations into dependent activations by sharing the same weights and biases across layers, which reduces the number of parameters, and the output of one layer becomes the input of the following hidden layer [19]. Long Short-Term Memory (LSTM) is a special form of RNN that is able to learn long-term dependencies and avoids the long-dependency problem. In an LSTM, the hidden layer of the RNN is replaced by the LSTM memory cell, which is described by Eqns. 8-12.

Figure 3: Memory cell of LSTM

i_t = \sigma(W_{yi} y_t + W_{ki} k_{t-1} + W_{ci} c_{t-1} + b_i)   (8)
o_t = \sigma(W_{yo} y_t + W_{ko} k_{t-1} + W_{co} c_{t-1} + b_o)   (9)
f_t = \sigma(W_{yf} y_t + W_{kf} k_{t-1} + W_{cf} c_{t-1} + b_f)   (10)
c_t = f_t c_{t-1} + i_t \tanh(W_{yc} y_t + W_{kc} k_{t-1} + b_c)   (11)
k_t = o_t \tanh(c_t)   (12)

where \sigma is the logistic sigmoid function and i, o, f and c are the input gate, output gate, forget gate and cell vectors, all of which have the same dimension as the hidden vector k [19].

Bidirectional Long Short-Term Memory (BiLSTM) is an extension of LSTM designed by combining two independent LSTMs. This structure gives the network both forward and backward information at every time step: the data is run in two directions, one from past to future and one from future to past, so the model preserves information from both. Fig. 4 shows the Bidirectional LSTM [20].

Figure 4: A Bidirectional LSTM Network
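A minimal Keras sketch of such a BiLSTM classifier is given below; the vocabulary size, embedding width, layer widths and padded length are illustrative assumptions, since the paper does not report its exact architecture or hyperparameters.

```python
# Illustrative BiLSTM sentiment classifier in Keras (sizes are assumptions, not the paper's).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense

vocab_size, max_len = 20000, 40           # assumed vocabulary size and padded length

model = Sequential([
    Input(shape=(max_len,)),              # padded token-id sequences from Section 3.3
    Embedding(input_dim=vocab_size, output_dim=128),
    Bidirectional(LSTM(64)),              # reads the sequence forwards and backwards
    Dense(1, activation="sigmoid"),       # probability of the positive class
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train_padded, y_train, validation_split=0.1, epochs=3, batch_size=256)
```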
3.4.2. Logistic regression

Logistic regression is a linear classifier used to predict the class of the data. It determines the link between the independent and dependent variables by estimating probabilities [16], returning a probability by transforming the output with the logistic sigmoid function. Fig. 5 shows the linear regression graph, whose equation is given by Eqn. 13:

Y = B_0 + B_1 X   (13)

The sigmoid function [22] is

P = 1 / (1 + e^{-y})   (14)

Applying Eqn. 14 to Eqn. 13 and solving for y gives Eqn. 15, the logistic regression equation:

ln(P / (1 - P)) = B_0 + B_1 X   (15)

The graph is thereby converted into the logistic regression curve shown in Fig. 6.

Figure 5: Linear regression graph
Figure 6: Logistic Regression curve

3.4.3. Logistic Regression - Stochastic Gradient Descent classifier

Logistic Regression - Stochastic Gradient Descent (LR-SGD) is a type of linear model, also known as incremental gradient descent [14]. The LR-SGD classifier is an effective way to learn linear classifiers under different loss functions and penalties, such as Logistic Regression and Support Vector Machines: the 'log' loss function optimizes Logistic Regression, while the 'hinge' loss function optimizes the Support Vector Machine. The LR-SGD classifier has recently gained much significance in large-scale learning, although it has been around in the machine learning community for a long time [21]. Sparse and large-scale machine learning problems, such as those encountered in sentiment analysis, often make use of the LR-SGD classifier, which motivated us to use it on our problem with 1.6M tweets [22]. One of its strengths is hyperparameter tuning, which can be used to minimize the error (cost) function.
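As an illustration, both linear models can be sketched with scikit-learn as follows; the toy corpus, the 0/1 labels and all hyperparameters are our assumptions, not the paper's settings.

```python
# Illustrative logistic regression and its SGD-trained variant on Bag-of-Words features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier

texts = ["love this movie", "great day", "so happy",
         "hate this", "worst day ever", "so sad"]
labels = [1, 1, 1, 0, 0, 0]                    # 1 = positive, 0 = negative (toy data)

X = CountVectorizer().fit_transform(texts)     # Bag-of-Words features (Section 3.3)

logreg = LogisticRegression(max_iter=1000).fit(X, labels)

# 'log_loss' trains logistic regression with stochastic gradient descent
# (older scikit-learn releases call this loss 'log'); 'hinge' would give a linear SVM.
lr_sgd = SGDClassifier(loss="log_loss").fit(X, labels)

print(logreg.predict(X), lr_sgd.predict(X))
```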
3.4.4. Support Vector Machine

The Support Vector Machine can be regarded as a linear model for regression and classification tasks [23]. It finds the optimal separating hyperplane that divides the tweets into two parts [24] and can be applied to noisy data. The hyperplane separates the tweets very efficiently, as shown in Fig. 7. Support vectors are the points from both classes that lie closest to the hyperplane, and the distance between them is called the margin [25]. The Support Vector Machine is easy to implement and scales well to high-dimensional data. It is implemented with kernels that transform non-separable problems into separable ones by adding more dimensions. The most commonly used kernel is the Radial Basis Function (RBF) kernel, defined by Eqn. 16:

P(y, y_i) = \exp(-\gamma \cdot \sum (y - y_i)^2)   (16)

Figure 7: SVM classifier graph showing hyperplane

3.4.5. Naïve Bayes classifier

Naïve Bayes [26] is one of the most common supervised machine learning techniques for classification. It is known as a probabilistic classification technique because it is based on probability [27]; it relies on Bayes' theorem, which is related to conditional probability and gives the probability of an event given the probability of another event that has already occurred [27]. Mathematically, it is stated by Eqn. 17:

P(M|N) = P(M) P(N|M) / P(N)   (17)

where P(M|N) is the posterior, i.e., the probability of M given N; P(N|M) is the likelihood, i.e., the probability of N when M is true; P(M) is the prior probability of M; and P(N) is the marginal probability of N [28]. When the model is implemented in the classifier, the decision rule is [16], [29] given by Eqn. 18:

M = \operatorname{argmax}_M P(M) \prod_{i=1}^{n} P(N_i|M)   (18)

3.4.6. Decision Tree classifier

A feasible approach to multistage decision making is the Decision Tree classifier [30]. In the multistage approach, complex decisions are broken into several simple decisions to obtain the desired solution; a complete review of multistage recognition is given by [31]. It is used where the data splits regularly. A Decision Tree can be applied both to regression models, to predict continuous values, and to classification models, to predict class membership. As our model is a binary classifier with positive and negative labels, the Decision Tree classifier has been implemented [32]. It is robust, simple to implement and not sensitive to irrelevant features [33]. Fig. 8(a) shows how the dataset is split into different categories by the Decision Tree classifier and Fig. 8(b) shows a general Decision Tree.

Figure 8: (a) Partitioning of a two-dimensional feature space (b) Overview of a Decision Tree

3.4.7. Random Forest classifier

The Random Forest classifier is a supervised and very popular ML technique. Like the Decision Tree, it can be applied to both classification and regression. It is an ensemble learning method that builds a set of decision trees from the training data and outputs the mode of their classes [34]. It is used in applications such as search engines and image classification. It constructs a decision tree from each sample, and the best solution is selected by voting. It is easy to implement, fast and scalable, but it can easily overfit the data [34]. Fig. 9 shows the complete sketch of the Random Forest classifier.

Figure 9: Overview of Random Forest classifier
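The remaining classifiers described above can be trained in the same way. The sketch below uses scikit-learn with a toy corpus and default hyperparameters, which are our assumptions; a linear SVM stands in for the RBF-kernel SVM of Eqn. 16 purely for tractability on large sparse matrices.

```python
# Illustrative training of the remaining classifiers on Bag-of-Words features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

texts = ["love this movie", "great day", "so happy",
         "hate this", "worst day ever", "so sad"]
labels = [1, 1, 1, 0, 0, 0]                    # toy data, not the paper's corpus
X = CountVectorizer().fit_transform(texts)

models = {
    "SVM (linear)": LinearSVC(),               # SVC(kernel="rbf") would match Eqn. 16
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}
for name, clf in models.items():
    clf.fit(X, labels)
    print(name, clf.predict(X))
```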
4. Implementation and results

The dataset was collected from Kaggle. The implementation was done in Python and NLTK was used for cleaning the text and preparing the training data. The classifiers used are Logistic Regression, Naïve Bayes, Support Vector Machine, Random Forest, the LR-SGD classifier, Bidirectional Long Short-Term Memory and Decision Tree. The dataset consists of 1.6 million tweets, of which 1,280,000 were used for training and 320,000 for testing [15].

Evaluating the models is important for observing the performance and correctness of the different models on the test data and finding the best among them. The performance of a classifier can be described by the confusion matrix on a set of data for which the true values are known. With the help of the confusion matrix, different evaluation metrics such as accuracy, recall, precision, F1-score and AUC score have been computed to validate and verify the quality of the results [35], [36]. The confusion matrices of the various classifiers are shown in Fig. 10, Fig. 11, Fig. 12 and Fig. 13. Table 2 compares the different classification models based on these metrics, and Fig. 14 graphically depicts the performance of the different classifiers with respect to accuracy, recall, precision, F1-score and AUC score.

Figure 10: Confusion matrix of (a) Logistic Regression (b) Support Vector Machine
Figure 11: Confusion matrix of (a) Naïve Bayes (b) LR-SGD classifier
Figure 12: Confusion matrix of (a) Random Forest (b) Decision Tree classifier
Figure 13: Confusion matrix of Bidirectional Long Short-Term Memory

Table 2: Performance measure of various classifiers

Classifier | Accuracy | Recall | Precision | F1-Score | AUC Score
Bidirectional LSTM | 0.7890 | 0.7889 | 0.7891 | 0.7889 | 0.78904
Logistic Regression | 0.7249 | 0.7249 | 0.7272 | 0.7242 | 0.72489
Naïve Bayes | 0.7124 | 0.7124 | 0.7133 | 0.7121 | 0.71239
LR-SGDC | 0.7209 | 0.7208 | 0.7274 | 0.7189 | 0.72082
SVM | 0.7245 | 0.7244 | 0.7274 | 0.7236 | 0.72445
Decision Tree | 0.6849 | 0.6849 | 0.6850 | 0.6849 | 0.68490
Random Forest | 0.7129 | 0.7129 | 0.7131 | 0.7128 | 0.71286

Accuracy: the percentage of tweets classified correctly by the model, calculated using Eqn. 19:

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (19)

Precision: the ratio of correctly predicted positive tweets to all predicted positive tweets, calculated using Eqn. 20:

Precision = TP / (TP + FP)   (20)

Recall: the ratio of correctly predicted positive tweets to all actual positive tweets, calculated using Eqn. 21:

Recall = TP / (TP + FN)   (21)

F1-score: the harmonic mean of recall and precision, calculated using Eqn. 22:

F1 score = 2 * (P * R) / (P + R)   (22)

where TP is True Positive, TN is True Negative, FP is False Positive, FN is False Negative, P is Precision and R is Recall.

AUC score: calculated by finding the area under the ROC curve [11], using Eqn. 23:

AUC = (SP - PE(NO + 1)/2) / (PE * NO)   (23)

where SP is the sum of the positive observations, PE refers to the positive examples and NO to the negative observations.

Figure 14: Performance graph of different classifiers

The Receiver Operating Characteristic (ROC) curve is a tool for assessing probabilistic predictions of a binary outcome [37]. It graphically represents the relationship between the sensitivity, which is the true positive rate, and the false positive rate. It is a significant metric as it covers the whole spectrum between zero and one; the diagonal where the true positive rate equals the false positive rate corresponds to 0.5 and represents a random, no-skill classifier [38]. The AUC score is the area under the ROC curve. The ROC curves of the different classifiers are plotted in Fig. 15.

Figure 15: ROC Curve of various Classifiers

With the help of the confusion matrices of the various classifiers, showing the numbers of true negatives, true positives, false negatives and false positives, we calculated the precision, accuracy, F1-score, recall and ROC-AUC score shown in Table 2. In this paper, we compared various classifiers, namely Random Forest, Logistic Regression, Support Vector Machine, Decision Tree, LR-SGDC and Naïve Bayes, with the state-of-the-art Bi-LSTM approach. From the results in Table 2, the Bidirectional LSTM was the best classifier with an accuracy of 78.90%; Logistic Regression came out as the runner-up with an accuracy of 72.49%, followed by the Support Vector Machine and the LR-SGDC classifier with accuracies of 72.45% and 72.09% respectively. Random Forest and Naïve Bayes also predicted well, with accuracies of 71.29% and 71.24%, while the Decision Tree classifier did not come up to expectation with an accuracy of just 68.49%.

On closer examination, the prediction of the true positive class with respect to the predicted positive class, i.e., the precision score, of Bi-LSTM was also the highest, at 78.91%. The LR-SGDC and SVM classifiers were the runners-up with a precision score of 72.74% each, followed by Logistic Regression with 72.72%. Naïve Bayes and Random Forest also predicted the positive class well, with precision scores of 71.33% and 71.31% respectively, while the precision score of the Decision Tree classifier was the lowest at 68.50%. The prediction of the true positive class with respect to the actual positive class, i.e., the recall score, of Bi-LSTM was again the best at 78.89%, with Logistic Regression as the runner-up at 72.49%, followed by SVM, LR-SGDC, Random Forest and Naïve Bayes with 72.44%, 72.08%, 71.29% and 71.24% respectively; the Decision Tree again trailed with a recall score of 68.49%. The F1-score and AUC score of Bi-LSTM were also the best among all the classifiers. All these results are visualized graphically in Fig. 14, and Fig. 15 depicts the ROC curves of all the classifiers implemented in our experiments, which likewise show that Bi-LSTM is the best classifier. The model can also be very useful for analyzing tweets related to medical data [39], [40], [41], [42], [43], [44].
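For reference, the evaluation metrics of Eqns. 19-22 and the ROC-AUC can be computed with scikit-learn as sketched below; the label and score arrays are illustrative, not the paper's actual predictions.

```python
# Sketch of the evaluation metrics used in Table 2, computed on illustrative arrays.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix, roc_curve)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # ground-truth polarity
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1])    # classifier scores
y_pred = (y_prob >= 0.5).astype(int)                           # thresholded predictions

print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))    # Eqn. 19
print("precision:", precision_score(y_true, y_pred))   # Eqn. 20
print("recall   :", recall_score(y_true, y_pred))      # Eqn. 21
print("F1 score :", f1_score(y_true, y_pred))          # Eqn. 22
print("AUC score:", roc_auc_score(y_true, y_prob))     # area under the ROC curve
fpr, tpr, _ = roc_curve(y_true, y_prob)                # points for the ROC plot (Fig. 15)
```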
5. Conclusion

There are various methods, from machine learning to symbolic approaches and deep learning, for the analysis of tweets or reviews, but machine learning techniques are the most common, efficient and simple among them. In this paper, machine learning techniques were used to analyze tweets from a Twitter dataset. The tweets were cleaned in the preprocessing step by removing stopwords, URLs, numbers and various Twitter-specific features with the help of NLTK. To deal with misspellings and non-informative words, feature extraction was performed and a Bag of Words model was created with the most frequent words. The tweets were then classified into positive and negative by various classifiers, namely the LR-SGD classifier, Naïve Bayes, Random Forest, Logistic Regression, SVM, Bidirectional LSTM and Decision Tree. By observing the ROC curve and the accuracy score, it was clear that Bidirectional LSTM is the best classifier with an accuracy of 78.90%. Hence, Bidirectional LSTM proved very useful for sentiment analysis.

The model can be implemented in a website or Android application for classifying the sentiments of people on different subjects. As microblogging sites are booming, sentiment analysis is very important for many organizations in applying social intelligence and social media analytics.

Future work will explore data from a wider range of social networking sites and e-commerce sites where people shop online for things like books, games, etc. Acceptance rates for these products can be found by sentiment analysis. It can also be implemented to build human confidence models.

6. Conflict of interest

There is no conflict of interest.

7. Acknowledgement

I would like to express my heartiest gratitude to all the co-authors, with special thanks to Prof. Mahendra Kumar Gourisaria and Mr. Harshvardhan GM, who have been a constant source of knowledge, inspiration and support. I would equally like to thank my parents and friends, who inspired me to remain focused and helped me to complete this research paper.

8. References

[1] Z. Jianqiang, G. Xiaolin, and Z. Xuejun, "Deep Convolution Neural Networks for Twitter Sentiment Analysis," IEEE Access, vol. 6, pp. 23253–23260, 2018, doi: 10.1109/ACCESS.2017.2776930.
[2] E. Kouloumpis, T. Wilson, and J. Moore, "Twitter sentiment analysis: The good the bad and the omg!," in Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 2011, pp. 538–541.
[3] A. Hassan, A. Abbasi, and D. Zeng, "Twitter Sentiment Analysis: A Bootstrap Ensemble Framework," in 2013 International Conference on Social Computing, Sep. 2013, pp. 357–364, doi: 10.1109/SocialCom.2013.56.
[4] M. Thelwall, K. Buckley, and G. Paltoglou, "Sentiment strength detection for the social web," J. Am. Soc. Inf. Sci. Technol., vol. 63, no. 1, pp. 163–173, 2012, doi: 10.1002/asi.21662.
[5] A. Mittal and A. Goel, "Stock prediction using twitter sentiment analysis," 2012.
[6] Z. Jianqiang and G. Xiaolin, "Comparison research on text pre-processing methods on twitter sentiment analysis," IEEE Access, vol. 5, pp. 2870–2879, 2017, doi: 10.1109/ACCESS.2017.2672677.
[7] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Found. Trends Inf. Retr., vol. 2, no. 1–2, pp. 1–135, 2008, doi: 10.1561/1500000011.
[8] V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives," in Proceedings of the Eighth Conference on European Chapter of the Association for Computational Linguistics, 1997, pp. 174–181, doi: 10.3115/979617.979640.
[9] R. Xia, C. Zong, and S. Li, "Ensemble of feature sets and classification algorithms for sentiment classification," Inf. Sci. (Ny), vol. 181, no. 6, pp. 1138–1152, 2011, doi: 10.1016/j.ins.2010.11.023.
[10] A. Baweja and P. Garg, "Sentimental Analysis of Twitter Data for Job Opportunities," Int. Res. J. Eng. Technol., vol. 6, no. 11, pp. 2344–2350, 2019.
[11] H. Wang, D. Can, A. Kazemzadeh, F. Bar, and S. Narayanan, "A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle," Proc. 50th Annu. Meet. Assoc. Comput. Linguist., pp. 115–120, 2012, doi: 10.1145/1935826.1935854.
[12] M. S. Neethu and R. Rajasree, "Sentiment analysis in twitter using machine learning techniques," in 2013 4th Int. Conf. Comput. Commun. Netw. Technol. (ICCCNT 2013), 2013, doi: 10.1109/ICCCNT.2013.6726818.
[13] V. M. K. Peddinti and P. Chintalapoodi, "Domain adaptation in sentiment analysis of twitter," AAAI Work. - Tech. Rep., vol. WS-11-05, pp. 44–49, 2011.
[14] N. Lokeswari and K. Amaravathi, "Comparative Study of Classification Algorithms in Sentiment Analysis," Int. Res. J. Sci. Eng. Technol., vol. 4, no. 8, pp. 31–39, 2018.
[15] Kaggle.com, "Sentiment140 dataset with 1.6 million tweets," 2015. [Online]. Available: https://www.kaggle.com/kazanova/sentiment140.
[16] S. Das, R. Sharma, M. K. Gourisaria, S. S. Rautaray, and M. Pandey, "Heart disease detection using core machine learning and deep learning techniques: A comparative study," Int. J. Emerg. Technol., vol. 11, no. 3, pp. 531–538, 2020.
[17] Wil, "How many words are in the English language?," English Live, 2018. [Online]. Available: https://wordcounter.io/blog/how-many-words-are-in-the-english-language/.
[18] Z. Jiang, L. Li, D. Huang, and L. Jin, "Training word embeddings for deep learning in biomedical text mining tasks," in Proc. 2015 IEEE Int. Conf. Bioinforma. Biomed. (BIBM 2015), 2015, pp. 625–628, doi: 10.1109/BIBM.2015.7359756.
[19] Z. Huang, W. Xu, and K. Yu, "Bidirectional LSTM-CRF Models for Sequence Tagging," 2015. [Online]. Available: http://arxiv.org/abs/1508.01991.
[20] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5–6, pp. 602–610, 2005, doi: 10.1016/j.neunet.2005.06.042.
[21] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[22] M. Thenuwara and H. R. K. Nagahamulla, "Offline handwritten signature verification system using random forest classifier," in 17th International Conference on Advances in ICT for Emerging Regions (ICTer 2017) - Proceedings, 2017, pp. 191–196, doi: 10.1109/ICTER.2017.8257828.
[23] W. Yu, T. Liu, R. Valdez, M. Gwinn, and M. J. Khoury, "Application of support vector machine modeling for prediction of common diseases: The case of diabetes and pre-diabetes," BMC Med. Inform. Decis. Mak., vol. 10, no. 1, 2010, doi: 10.1186/1472-6947-10-16.
[24] S. Nayak, M. Kumar Gourisaria, M. Pandey, and S. Swarup Rautaray, "Heart Disease Prediction Using Frequent Item Set Mining and Classification Technique," Int. J. Inf. Eng. Electron. Bus., vol. 11, no. 6, pp. 9–15, 2019, doi: 10.5815/ijieeb.2019.06.02.
[25] S. Ghumbre, C. Patil, and A. Ghatol, "Heart Disease Diagnosis using Support Vector Machine," Int. Conf. Comput. Sci. Inf. Technol., pp. 84–88, 2011.
[26] S. Nayak, M. K. Gourisaria, M. Pandey, and S. S. Rautaray, "Comparative Analysis of Heart Disease Classification Algorithms Using Big Data Analytical Tool," 2020, pp. 582–588.
[27] V. S and D. S, "Data Mining Classification Algorithms for Kidney Disease Prediction," Int. J. Cybern. Informatics, vol. 4, no. 4, pp. 13–25, 2015, doi: 10.5121/ijci.2015.4402.
[28] Wikipedia contributors, "Bayes' Theorem," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/Bayes'_theorem, last accessed 2020/8/28.
[29] H. GM, M. K. Gourisaria, M. Pandey, and S. S. Rautaray, "A comprehensive survey and analysis of generative models in machine learning," Comput. Sci. Rev., vol. 38, 100285, Nov. 2020, doi: 10.1016/j.cosrev.2020.100285.
[30] S. Rasoul and L. David, "A Survey of Decision Tree Classifier Methodology," IEEE Trans. Syst. Man. Cybern., vol. 21, no. 3, pp. 660–674, 1991.
[31] G. R. Dattatreya and L. N. Kanal, "Decision Trees in Pattern Recognition," in Progress in Pattern Recognition 2, 1985, pp. 189–239.
[32] S. Nayak, M. K. Gourisaria, M. Pandey, and S. S. Rautaray, "Prediction of Heart Disease by Mining Frequent Items and Classification Techniques," in 2019 International Conference on Intelligent Computing and Control Systems (ICCS), May 2019, pp. 607–611, doi: 10.1109/ICCS45141.2019.9065805.
Adv. Trends S. Rautaray, and M. Pandey, “A deep Comput. Sci. Eng., no. 9, pp. 5788– learning model for malaria disease 5795, 2020. detection and analysis using deep [44] G. Jee, H. GM and M. K. convolutional neural networks,” Int. J. Gourisaria, “Juxtaposing inference Emerg. Technol., vol. 11, no. 2, pp. 699– capabilities of deep neural models over 704, 2020. posteroanterior chest radiographs [42] S. S. Rautaray, S. Dey, M. Pandey, and facilitating COVID-19 detection,” J. of M. K. Gourisaria, “Nuclei segmentation Interdisciplinary Mathematics, pp. 1- in cell images using fully convolutional 27, 2021, neural networks,” Int. J. Emerg. doi: 10.1080/09720502.2020.1838061 Technol., vol. 11, no. 3, pp. 731–737, 2020.