<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Analysis of Sentiments through Web-Mined Twitter Corpus</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Satish Chandra</string-name>
          <email>schandra1.sc@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mahendra Kumar Gourisaria</string-name>
          <email>mkgourisaria2010@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harshvardhan Rautaray</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manjusha Pandey</string-name>
          <email>manjushafcs@kiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sachi Nandan Mohanty</string-name>
          <email>sachinandan09@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Siddharth Swarup</string-name>
          <email>siddharthfcs@kiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>School of Computer Engineering, KIIT Deemed to be University</institution>
          ,
          <addr-line>Bhubaneswar-751024, Odisha, India</addr-line>
          ;
          <institution>Dept of Computer Science &amp; Engineering, ICFAITech, ICFAI Foundation for Higher Education</institution>
          ,
          <addr-line>Hyderabad-500082</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <fpage>122</fpage>
      <lpage>135</lpage>
      <abstract>
        <p>A huge amount of textual data is generated due to the boom of microblogging. Microblogging sites such as Facebook, Twitter and Google+ are used by millions of people to express their views and emotions on different subjects. In this paper, we discuss sentiment analysis on a Twitter dataset having various tweets from different users. Sentiment analysis is useful for gaining the opinion of people using large volumes of text data where texts are highly unstructured and heterogeneous. In this paper, different classification techniques like Support Vector Machine, Logistic Regression, Logistic Regression with Stochastic Gradient Descent optimizer, Decision Tree Classification, Naive Bayes, Bidirectional LSTM and Random Forest Classification have been applied to analyze the sentiment of people, i.e., whether their tweets are positive or negative. The corpus has been analyzed by plotting descriptive insights such as the word cloud and frequency of positive and negative tweets. The best classifier was selected by comparing the results of accuracy, recall, precision, F1 score, AUC score and ROC curve.</p>
      </abstract>
      <kwd-group>
<kwd>Sentiment Analysis</kwd>
        <kwd>Twitter</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Word2Vec</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>Logistic Regression</kwd>
        <kwd>Random Forest</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>With the universality of microblogging and
social networking sites, Twitter, with 319
million monthly users has now become a
valuable resource for several individuals and
organizations for posting blogs and expressing
their views and opinions on different subjects
like politics, sports, movies, etc. [1]. Stimulated
by the growth of social media, many companies
and media organizations are trying to mine
Twitter to observe people’s views to understand
what they feel and think about their products
[2]. As a result, sentiment analysis on Twitter is
an effective way of gauging public opinion.
Sentiment analysis provides the potential of
observing numerous social networking sites in
real-time.</p>
      <p>Twitter has a limitation of 140 characters [3]
in each tweet, which causes individuals to use
phrases in their tweets. Sentiment Analysis
automatically detects whether a text section
contains emotions or opinioned content. It also
determines the polarity of the text. Generally,
the dataset consists of a group of tweets where
each tweet is interpreted with a sentiment label.
Commonly sentiments are labeled positive,
negative or neutral. However, some datasets
also carry mixed or irrelevant tags, with scores
ranging from -5 to 5 depicting negative to
positive polarity [4]. Twitter sentiment analysis is
helpful to understand public temperament about
different social or cultural events and
forecasting the inconsistency within the stock
exchange [5].</p>
      <p>Sentiment analysis on Twitter is
challenging due to the short length of tweets. The
unstructured and heterogeneous data compelled
us to apply the preprocessing step before
feature extraction [6]. The various
preprocessing steps include URLs removal,
replacing negation, stopwords removal,
removing numbers and expanding acronyms.
The preprocessing has been done with the help
of the Natural Language Tool Kit (NLTK).
Then feature extraction is of two phases. First,
the normal text was formed by eliminating the
Twitter-specific features and then feature
extraction was accomplished to extract more
features [1].</p>
      <p>This research paper is organized into
different segments as follows. Section 2 briefs
about related works of sentiment analysis. In
Section 3 we talk about the methodology and
materials which explains the data exploration,
data preprocessing and feature extraction. We
have also described the different classification
algorithms used in the implementation namely
Support Vector Machine, Logistic Regression,
Logistic Regression - Stochastic Gradient
Descent, Decision Tree, Naïve Bayes,
Bidirectional Long Short-Term Memory
(BiLSTM) and Random Forest. In Section 4 we
show the results, analyses and comparison of
models. Section 5 comprises the conclusion and
future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>With the advancement of Natural Language
Processing (NLP), research on Sentiment
Analysis ranges from document-level
classification [7] to words and phrase-level
classification [8]. The method to retrieve
semantic information from a large corpus was
presented by Hatzivassiloglou and McKeown.
This method separates domain-dependent
details and conforms to a novel domain when
the corpus is substituted. Their model focuses
on adjectives, aiming to identify
near-synonyms and antonyms.</p>
      <p>
        To increase the efficiency and accuracy
of their model, [
        <xref ref-type="bibr" rid="ref1">9</xref>
        ] used an ensemble framework
for sentiment analysis. They utilized movie
reviews and multi-domain datasets extracted
from Amazon product reviews, which include
reviews of Books, Electronics, DVDs and
Kitchen products. They framed the
ensemble by combining various classification
techniques and feature sets. They used two
types of feature sets, word relations and
part-of-speech information, and three types of
classifiers, maximum entropy, Support
Vector Machines and Naïve Bayes, to form the
ensemble framework. Weighted combination,
fixed combination and meta-classifier
ensemble techniques were used for sentiment
analysis and better accuracy was attained [
        <xref ref-type="bibr" rid="ref1">9</xref>
        ].
People on social networking sites give their
opinions about anything and everything. It is a
challenge to recognize all these types of data for
training. Therefore, [2] proposed a model to
study sentiment from a hashtagged
(HASH) data set, the iSieve data set and the
emoticon (EMOT) dataset. The authors trained
their model on a variety of feature extraction
techniques like lexicon features, part-of-speech
(POS) features, n-gram features and
microblogging features. They concluded that in
the microblogging domain, the POS feature
may not be useful and the benefits of the
Emoticon dataset are also lessened when
microblogging features are included [2].
      </p>
      <p>The authors of [10] discussed
social network analysis and Twitter being
a rich source for sentiment analysis and
proposed a model to implement Twitter
sentiment analysis by fetching the data from
Twitter APIs. Their analysis is based on
different queries of job opportunities. The
dataset has positive, negative, and neutral
labels. They noted that the neutral sentiments
are high in comparison to positive or negative
which shows that there is a need to improve
Twitter sentiment analysis [10]. Twitter has
become increasingly popular in the field of
politics. A real-time sentiment analyzer
for the incumbent, former President Barack
Obama, and the nine other challengers was
designed by [11]. They used IBM’s
InfoSphere Streams platform (IBM, 2012) for
speed, accuracy and pipelining of real-time
data. Using the Twitter “firehose” they
constructed logical keyword combinations to
retrieve relevant tweets about candidates and
events. They achieved an accuracy of 59% [11].</p>
      <p>Some researchers have tried to determine
the public point of view on different subjects
like politics, movies, news, etc. from the
Twitter posts [12]. The authors of the paper [13]
used IMDB, a popular Internet database
containing movie information and Blippr, a
social networking site where reviews are in the
form of ‘Iblips’. Their analysis gave the F-score
as high as 0.9 using SVM and demonstrated
domain adaptation as a useful technique for
sentiment analysis. They introduced a new
feature reduction technique, Relative
Information Index (RII), which combines with
another popular technique ‘thresholding’ to
form a good feature reduction technique that
not only reduces the features but also improves
the F-score [13]. The importance of sentiment
analysis has increased so much that it has been
in use in various industries, such as hotel
management. In this regard, [14] classified the
public reviews of a hotel into positive and
negative. They collected 800 reviews from
TripAdvisor and performed the preprocessing
step by NLTK in Python. They used various
classifiers like Logistic Regression, Random
Forest, Stochastic Gradient Descent Classifier,
Naïve Bayes and Support Vector Machine.
They found that the Naïve Bayes classifier
was the best among them, though the Stochastic
Gradient classifier also worked well. The analysis was
based on the results of accuracy, recall,
precision and F1-score [14].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and methods</title>
      <p>The study of computer algorithms that
improve automatically through experience
is known as machine learning. The data and
output are fed into the machine learning model
and the machine creates its programming logic
to predict the result. The dataset is split into two
halves i.e., training part, which contains input
feature vectors and their labels, and the testing
part. A classification model with the help of a
specific algorithm is developed using the
training part to observe a pattern. The testing
part is used to obtain the accuracy of the model,
which tells whether a model is a good fit,
underfit or overfit.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Data exploration</title>
      <p>The dataset used in this work was taken
from UCI/Kaggle [15] in CSV (comma-separated
values) format and contains 1.6 million tweets.
The data was preprocessed with
tokenization, stemming and stopword
removal to clean the text. A feature vector was
created using relevant features. Data mining
classification algorithms such as Decision Tree,
Logistic Regression, Random Forest, SVM,
Naive Bayes and LR-SGD classifiers were used
to gather the accuracy by classifying the tweets
into positive or negative tweets. Fig. 1 shows
the algorithm adopted for sentiment analysis.</p>
      <p>Exploring the data has a key role in machine
learning as it helps us to visualize the types and
statistics of data [16]. Here, the dataset consists
of 0.8M positive and 0.8M negative tweets
shown in Fig. 2 (a). As it is text data, the word
cloud can also be visualized, as shown in Fig. 2
(b).</p>
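The label and word counts behind plots like Fig. 2 can be sketched with stdlib counters; the tweets and label strings below are illustrative stand-ins, not rows from the actual corpus.

```python
from collections import Counter

# Toy stand-in for the corpus: (label, tweet) pairs; labels are illustrative.
tweets = [
    ("negative", "this traffic is awful"),
    ("positive", "loving the new album"),
    ("positive", "great game last night"),
    ("negative", "my flight got cancelled"),
]

# positive/negative tweet counts, as plotted in Fig. 2 (a)
label_counts = Counter(label for label, _ in tweets)

# word frequencies, the input a word cloud like Fig. 2 (b) is drawn from
word_counts = Counter(w for _, text in tweets for w in text.split())
```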
    </sec>
    <sec id="sec-5">
      <title>3.2. Data preprocessing</title>
      <p>As the Twitter datasets are composed of
unstructured, heterogeneous, ill-formed words,
irregular grammar and non-dictionary terms,
the tweets were cleaned by various NLTK
methods before feature extraction [1].
Preprocessing steps are [12]
• Eliminating all non-English characters and
non-ASCII from the text.
• Removal of all URL links as they do not
provide any information about the sentiment.
• Numbers are removed as they are not useful
in finding sentiment.
• Stop words are the most frequent words in a
language, such as "as", "an", "about”, “any"
etc. There are many stopwords in English
literature. These stopwords do not play any
role in finding the sentiments so they are
removed from the dataset.
• Stopwords also contain “not”, but are not
removed from the tweets as they are crucial
in analyzing negative reviews.
• Stemming is the process of reducing words
to their root form, e.g., “loved” becomes
“love”.</p>
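The steps above can be sketched with the standard library; the paper uses NLTK's full English stopword list and stemmer, so the tiny stopword set here is only illustrative (note that “not” is deliberately kept, as described).

```python
import re

# Illustrative stopword subset; the paper uses NLTK's full English list.
# "not" is intentionally absent: it matters for negative sentiment.
STOPWORDS = {"as", "an", "about", "any", "the", "is", "at", "a"}

def clean_tweet(text):
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # remove URL links
    text = re.sub(r"[^a-z\s]", " ", text)              # drop numbers/punctuation/non-ASCII
    return [t for t in text.split() if t not in STOPWORDS]

print(clean_tweet("Loved the movie!! 10/10 http://t.co/xyz"))  # → ['loved', 'movie']
```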
    </sec>
    <sec id="sec-6">
      <title>3.3. Feature extraction</title>
      <p>In feature extraction, the vector space model is
used for document representation. A vector is
created whose dimension equals the size of the
English vocabulary, with each element initially
set to 0. If a text contains a given vocabulary
word, a ‘1’ is put in that dimension, as
shown in Eqn. 1, Eqn. 2 and Eqn. 3. Every time
a text featuring that vocabulary word is
encountered, the count is increased,
leaving 0’s for the words that
were not found even once. The 2nd Edition of
the Oxford dictionary contains 171,476 words
[17] in current use. So, if a vector is made with
all these words, the model will have high
variance, and here feature selection comes into
play. For proper weighting and feature
extraction, the count vectorizer method was
used which keeps track of the frequent terms as
well as rare words. The vector space model
improves the accuracy. The feature extraction
method is used for dimensionality reduction by
removing the non-informative words and rare
words. Bag of Words model is created which
contains the most frequent words from the
feature vector to improve the accuracy [1].</p>
      <p>[0, 0, 0, 1, 0, 0, 0, …, 0]   (1)
[0, 0, 0, 0, 2, 0, 0, …, 0]   (2)
[0, 0, 0, 0, 0, 5, 0, …, 0]   (3)</p>
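A minimal pure-Python sketch of this counting scheme, mirroring what a count vectorizer does (the corpus and the `max_features` pruning to the most frequent words are illustrative assumptions):

```python
from collections import Counter

def fit_vocabulary(corpus, max_features):
    # keep only the most frequent words, mirroring the Bag-of-Words pruning
    counts = Counter(w for doc in corpus for w in doc.split())
    return {w: i for i, (w, _) in enumerate(counts.most_common(max_features))}

def count_vector(doc, vocab):
    vec = [0] * len(vocab)      # one dimension per vocabulary word, initially 0
    for w in doc.split():
        if w in vocab:
            vec[vocab[w]] += 1  # increment on every occurrence
    return vec

corpus = ["good good movie", "bad movie", "good day"]
vocab = fit_vocabulary(corpus, max_features=3)
```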
      <p>Besides the Bag-of-Words model,
tokenization was also used for Bidirectional
Long Short-Term Memory, in which raw texts
are broken up into unique units, i.e., tokens.</p>
      <p>Each token has its own unique token id. In
tokenization, a vector is created with a size
equal to the number of unique words in the
corpus. A sequence of tokens is created and
represented as a vector, as shown in
Eqn. 4 and Eqn. 5. As each tweet has a
different length, its token
sequence also has a different length, which
makes it difficult to feed into deep learning
algorithms, as they require sequences of the same
length [18]. To counter this problem, padding
and truncating steps come into account where
the length of the padded sequence is defined. If
a tokenized sequence is longer
than the padded length, the tokens beyond
that length are truncated, i.e.,
removed. If a tokenized
sequence is shorter than the padded length,
it is padded
with “0”s. If the length of the padded sequence
is chosen to be 6 then Eqn. 4 will be truncated
as shown in Eqn. 6 and Eqn. 5 will be padded
as shown in Eqn. 7.</p>
      <p>What consumes your mind controls your life = [32, 13, 21, 122, 781, 45, 23]   (4)

= [53, 321, 32, 48, 44]   (5)

What consumes your mind controls your life = [32, 13, 21, 122, 781, 45]   (6)

= [53, 321, 32, 48, 44, 0]   (7)</p>
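The padding/truncation rule above, applied to the example sequences of Eqns. 4-7, can be sketched as:

```python
def pad_or_truncate(token_ids, length, pad_id=0):
    # truncate sequences longer than `length`; pad shorter ones with 0s
    if len(token_ids) >= length:
        return token_ids[:length]
    return token_ids + [pad_id] * (length - len(token_ids))

print(pad_or_truncate([32, 13, 21, 122, 781, 45, 23], 6))  # Eqn. 6: truncated
print(pad_or_truncate([53, 321, 32, 48, 44], 6))           # Eqn. 7: padded
```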
    </sec>
    <sec id="sec-7">
      <title>3.4. Classification Algorithms</title>
      <p>Classification algorithms are the most
important part of supervised learning in
machine learning. The classification algorithm
is used to indicate the class of the data. In this
paper, classification algorithms play a crucial
role in labeling the tweets as positive or negative.</p>
      <p>3.4.1. Bidirectional Long Short-Term Memory (BiLSTM)</p>
      <p>A traditional neural network cannot remember
previous inputs, yet for predicting the next
word, previous information is a must. A Recurrent
Neural Network (RNN) has the potential of
remembering everything from the past, as it
has loops and hidden layers. The
loops in an RNN allow the network to persist
information. A recurrent neural network
translates independent activations into
dependent activations by furnishing equal
biases and weights to all layers; thus the
complexity of growing parameters is
reduced, and the result of one layer is the input
to the following hidden layers [19]. Long
Short-Term Memory (LSTM) is a special form of
Recurrent Neural Network (RNN) which has
the potential to learn long-term dependencies.
LSTMs are designed to avoid the long-term
dependency problem. In an LSTM, the
hidden layer of the RNN is replaced by the Long
Short-Term Memory cell. The LSTM memory
cell is described by Eqns. 8-12.</p>
      <p>Here σ represents the logistic sigmoid
function, and c, o, i and f represent the cell
vector and the output, input and forget gates. These have the
same dimension as the hidden vector k [19].</p>
      <p>Bidirectional Long Short-Term Memory
(BiLSTM) is an extension of LSTM which can
be designed by combining two independent LSTMs.
The structure permits the neural network to
have both forward and backward information at
every time step. It runs the data in two
directions, one from future to past and one from past
to future, so the model is able
to preserve information from both the
future and the past. Fig. 4 shows the Bidirectional
LSTM [20].</p>
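Since the paper's Eqns. 8-12 are the standard LSTM gate equations, a NumPy sketch of one cell step and the bidirectional pass may help; the stacked gate layout, dimensions and random parameters are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell update (the sigmoid-gated scheme of Eqns. 8-12)."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # stacked gate pre-activations
    i = sigmoid(z[:n])                # input gate
    f = sigmoid(z[n:2 * n])           # forget gate
    o = sigmoid(z[2 * n:3 * n])       # output gate
    g = np.tanh(z[3 * n:])            # candidate cell state
    c = f * c_prev + i * g            # new cell state
    return o * np.tanh(c), c          # new hidden state, new cell state

def bilstm(seq, params_fwd, params_bwd, hidden):
    """Run the sequence past-to-future and future-to-past, then concatenate."""
    def run(sequence, params):
        h, c = np.zeros(hidden), np.zeros(hidden)
        out = []
        for x in sequence:
            h, c = lstm_step(x, h, c, *params)
            out.append(h)
        return out
    fwd = run(seq, params_fwd)
    bwd = run(seq[::-1], params_bwd)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
dim, hidden = 3, 2
make = lambda: (rng.normal(size=(4 * hidden, dim)),
                rng.normal(size=(4 * hidden, hidden)),
                np.zeros(4 * hidden))
seq = [rng.normal(size=dim) for _ in range(5)]
states = bilstm(seq, make(), make(), hidden)  # one (2*hidden)-dim state per step
```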
      <sec id="sec-7-1">
        <title>3.4.2. Logistic regression</title>
        <p>Logistic regression is an example of a linear
classifier used to determine the class of
data. Logistic regression estimates the link
between the independent and dependent
variables by estimating probabilities [16]. It
returns a probability by transforming the
output with the logistic sigmoid
function. Fig. 5 shows the linear regression
graph, whose equation is given by Eqn. 13 as

y = b0 + b1·x   (13)

The equation of the sigmoid function [22] is

p = 1 / (1 + e^(−y))   (14)

Applying Eqn. 14 to Eqn. 13 and solving
for y gives Eqn. 15, i.e., the logistic regression
equation

ln(p / (1 − p)) = b0 + b1·x   (15)

The graph is now converted into the logistic
regression graph shown in Fig. 6.</p>
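Eqns. 13-15 can be verified numerically: the logit (log-odds) of the sigmoid output recovers the linear form. The coefficient values below are arbitrary.

```python
import math

def linear(x, b0, b1):
    return b0 + b1 * x                  # Eqn. 13: linear regression

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))   # Eqn. 14: logistic sigmoid

def logit(p):
    return math.log(p / (1.0 - p))      # Eqn. 15: log-odds

# the logit of the sigmoid output recovers the linear predictor
p = sigmoid(linear(2.0, b0=0.5, b1=1.5))
assert abs(logit(p) - linear(2.0, 0.5, 1.5)) < 1e-9
```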
      </sec>
      <sec id="sec-7-2">
        <title>3.4.3. Logistic Regression-Stochastic Gradient Descent Classifier</title>
        <p>Logistic Regression-Stochastic Gradient
Descent (LR-SGD) is a type of linear model,
also known as Incremental Gradient Descent [14].</p>
        <p>The LR-SGD classifier is an effective
way of selectively learning linear classifiers
under different loss functions and penalties,
such as Logistic Regression and Support Vector
Machines. The ‘log’ loss function is used to
optimize Logistic Regression, while the ‘hinge’
loss function is used to optimize the Support
Vector Machine. The LR-SGD classifier has
recently gained much significance in the field
of large-scale learning, although it has been
around in the machine learning community for
a long time [<xref ref-type="bibr" rid="ref39">21</xref>]. The sparse, large-scale
machine learning problems
encountered in sentiment analysis often make
use of the LR-SGD classifier, and this fact
motivated us to use the LR-SGD classifier on
our problem of 1.6M tweets [22]. One of the
strengths of the LR-SGD classifier is
hyperparameter tuning, which can be used to
minimize the error function, also called the cost
function.</p>
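A minimal sketch of SGD on the ‘log’ loss, i.e., one weight update per example rather than per batch; the toy 2-feature data, learning rate and epoch count are illustrative assumptions (in practice one would use a library SGD classifier over the tweet feature vectors).

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(data, epochs=100, lr=0.5, seed=0):
    """Stochastic gradient descent on the 'log' loss: update per example."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:              # y is 0 or 1
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            g = p - y                  # gradient of log loss w.r.t. the logit
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b

# toy linearly separable data: the class is determined by the first feature
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
w, b = sgd_logistic(list(data))
predict = lambda x: sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5
```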
        <sec id="sec-7-2-11">
          <title>3.4.4. Support Vector Machine</title>
          <p>The Support Vector Machine can be
regarded as a linear model for regression and
classification tasks [23]. The Support Vector
Machine finds the optimal separating
hyperplane to separate the tweets into two parts
[24]. It can be applied to noisy data. The hyperplane
separates the tweets very efficiently, as
shown in Fig. 7. Support vectors are the
points from both classes which are closest
to the hyperplane. The distance between them is
called the margin [25]. The Support Vector
Machine is easy to implement and scales well
to high-dimensional data. It is implemented
with kernels that transform non-separable
problems into separable problems by adding
more dimensions. The most commonly
used kernel is the Radial Basis Function (RBF)
kernel. Mathematically, it can be defined by
Eqn. 16,

K(x, x_i) = exp(−γ · ‖x − x_i‖²)   (16)</p>
          <p>Figure 7: SVM classifier graph showing the hyperplane</p>
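The RBF kernel of Eqn. 16 can be sketched directly; the gamma value and inputs are arbitrary examples.

```python
import math

def rbf_kernel(x, x_i, gamma=1.0):
    # K(x, x_i) = exp(-gamma * ||x - x_i||^2), Eqn. 16
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-gamma * sq_dist)

assert rbf_kernel([1.0, 2.0], [1.0, 2.0]) == 1.0      # identical points
assert 0.0 < rbf_kernel([0.0, 0.0], [3.0, 4.0]) < 1.0  # similarity decays with distance
```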
          <p>3.4.5. Naïve Bayes Classifier</p>
          <p>Naïve Bayes [26] is the most common
supervised machine learning technique for
classification. It is also known as a
probabilistic classification technique, as it is
based on probability [27]. It depends entirely
on the famous probability theorem,
i.e., Bayes’ theorem.</p>
        </sec>
        <sec id="sec-7-2-12">
          <title>Bayes’ theorem</title>
          <p>Bayes’ theorem is
related to conditional probability. It finds the
probability of an event occurring given that
another event has already occurred [27].
Mathematically, it can be stated by Eqn. 17,

P(M|N) = P(M) · P(N|M) / P(N)   (17)

where P(M|N) refers to the posterior, i.e., the
probability of M when N is given, P(N|M)
represents the likelihood, i.e., the probability of
N when M is true, P(M) is the prior, i.e., the
probability of M, and P(N) represents the
marginalization, i.e., the probability of N [28].
After implementing the model in the classifier,
the equation [16], [29] is given by Eqn. 18 as

M̂ = argmax_M P(M) · ∏_i P(N_i | M)   (18)</p>
        </sec>
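A tiny word-count Naïve Bayes sketch over Eqns. 17-18; the Laplace (add-one) smoothing and the toy labeled documents are illustrative assumptions, not the paper's exact setup.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (label, tokens). Returns class priors and word counts."""
    priors = Counter(label for label, _ in docs)
    words = {lab: Counter() for lab in priors}
    for lab, toks in docs:
        words[lab].update(toks)
    return priors, words

def predict_nb(priors, words, tokens):
    total = sum(priors.values())
    vocab = len(set(w for c in words.values() for w in c))
    best, best_lp = None, -math.inf
    for lab in priors:
        n = sum(words[lab].values())
        # log P(label) + sum of log P(word | label), Laplace-smoothed
        lp = math.log(priors[lab] / total)
        for w in tokens:
            lp += math.log((words[lab][w] + 1) / (n + vocab))
        if lp > best_lp:
            best, best_lp = lab, lp
    return best

docs = [("pos", ["love", "great"]), ("pos", ["great", "fun"]),
        ("neg", ["hate", "awful"]), ("neg", ["awful", "boring"])]
priors, words = train_nb(docs)
```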
        <sec id="sec-7-2-13">
          <title>3.4.6. Decision Tree Classifier</title>
          <p>The Decision Tree classifier has been
implemented [32]. It is robust, easy and simple
to implement, and not sensitive to irrelevant
features [33]. Fig. 8 (a) shows how the dataset
was split into different categories in feature
space using the Decision Tree classifier, and
(b) gives an overview of a general Decision Tree.</p>
          <p>3.4.7. Random Forest Classifier</p>
          <p>
The Random Forest classifier is a supervised
ML technique and a very popular classifier. Just
like the Decision Tree, it can also be
implemented on both classification and
regression models. It is an ensemble learning
method of classification that builds a set of
multiple decision trees from the training data
and outputs the mode of the classes [34]. It is used in
applications like search engines, image
classification, etc. It constructs a decision tree
from each sample and gives the output. The best
solution is selected by voting. It is easy to
implement, fast and scalable, but it can
overfit the data [34]. Fig. 9 shows the complete
sketch of the Random Forest classifier.</p>
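The bootstrap-then-vote idea can be sketched with depth-1 "trees" (stumps) over a single feature; the stump learner, toy data and ensemble size are illustrative assumptions, far simpler than a real Random Forest.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # sample with replacement, as each tree in the forest does
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """A depth-1 'tree': threshold midway between the class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        return lambda x: 1 if pos else 0
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= t else 0

def forest_predict(stumps, x):
    # the forest outputs the mode (majority vote) of its trees' classes
    return Counter(s(x) for s in stumps).most_common(1)[0][0]

rng = random.Random(0)
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
stumps = [train_stump(bootstrap(data, rng)) for _ in range(25)]
```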
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4. Implementation and result</title>
      <p>The dataset was collected from Kaggle.
Implementation was done on Python and
NLTK was used for cleaning and training the
model. The various classifiers used are Logistic
Regression, Naïve Bayes, Support Vector
Machine, Random Forest, LR-SGD classifier,
Bidirectional Long Short-Term Memory and
Decision Tree. The dataset consists of 1.6
million tweets, of which 1,280,000 were used for
training and 320,000 for testing [15].</p>
      <p>Evaluating the models is very important for
observing the performance and correctness of
the different models on the test data and finding
the best among them. The performance of a
classifier can be described by the confusion
matrix on a set of data for which true values are
known. With the help of the confusion matrix,
different evaluating metrics such as accuracy,
recall, precision, F1-score and AUC score have
been evaluated to validate and verify the quality
of the results [35], [36]. The confusion matrices
for various classifiers have been shown in Fig.
10, Fig. 11, Fig. 12 and Fig. 13. Table 2
compares the different classification models
based on these evaluating metrics. Fig. 14
graphically depicts the performance of the
different classifiers concerning the accuracy,
recall, precision, F1-score and AUC score.</p>
      <p>Accuracy: the percentage of tweets that
have been classified correctly by the model.
The accuracy of the model can be calculated
using Eqn. 19.

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (19)

Precision: the ratio of actual positive tweets
to predicted positive tweets. The precision of
the model can be calculated using Eqn. 20.

Precision = TP / (TP + FP)   (20)</p>
      <p>Recall: the ratio of predicted positive
tweets to total positive tweets. The recall of the
model can be calculated using Eqn. 21.

Recall = TP / (TP + FN)   (21)</p>
      <p>F1-score: the F1-score can be defined as the
harmonic mean of recall and precision. The
F-measure of the model can be calculated using
Eqn. 22.

F1 = 2 · (P · R) / (P + R)   (22)</p>
      <p>Where, TP is the True Positive, TN refers to
True Negative, FP is the False Positive, FN
means False Negative, P refers to Precision and</p>
      <p>R is the Recall.</p>
      <p>AUC score: the AUC score can be calculated by
finding the area under the ROC curve [11]. The
AUC score of the model can be calculated using
Eqn. 23.

AUC = (SP − PE · (PE + 1) / 2) / (PE · NO)   (23)</p>
      <p>Where, SP is the Sum of positive
observations, PE refers to Positive Examples
and NO is the Negative Observations.</p>
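The rank-based AUC of Eqn. 23 can be sketched as below; tied scores are not handled, and the score/label example is illustrative.

```python
def auc_score(scores, labels):
    """Rank-based AUC (Eqn. 23): SP is the sum of the positive examples'
    ranks, PE the number of positives, NO the number of negatives."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    sp = sum(rank + 1 for rank, i in enumerate(order) if labels[i] == 1)
    pe = sum(labels)
    no = len(labels) - pe
    return (sp - pe * (pe + 1) / 2) / (pe * no)

# a perfect ranker gets AUC = 1.0; a fully reversed one gets 0.0
assert auc_score([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]) == 1.0
```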
      <p>The Receiver Operating Characteristic
(ROC) curve is a tool which predicts the
probabilistic value of a binary outcome [37]. The
relationship between the sensitivity, which is the
true positive rate, and the false positive rate,
i.e., 1 − specificity, is represented graphically by
the ROC curve. It is a significant metric as it
covers the whole spectrum between zero and
one. The true positive rate is exactly equal to
the false positive rate at 0.5, which represents
a random, no-skill classifier [38]. The AUC
score can be calculated by finding the area
under the ROC curve. The ROC curves for
different classifiers have been plotted in Fig.
15.</p>
      <p>With the help of the confusion matrix of the
various classifiers showing the values of a true
negative, true positive, false negative and false
positive, we have calculated precision,
accuracy, F1-score, recall and roc-auc score as
shown in Table 2. In this paper, we have
compared various classifiers like Random
Forest, Logistic regression, Support Vector
Machine, Decision Tree, LR-SGDC and Naïve
Bayes with the state-of-the-art approach
BiLSTM. On observing the results of Table 2, it
was found that the Bidirectional LSTM
was the best classifier with an accuracy of
78.90%, and Logistic Regression came out as
runner-up with an accuracy of 72.49%, followed by
the Support Vector Machine and LR-SGDC
Classifier with accuracies of 72.45% and
72.09% respectively. Random Forest and Naïve
Bayes also predicted well, with accuracies of
71.29% and 71.24%. It was also observed that
the Decision Tree classifier did not come up to
expectations, with an accuracy of just 68.49%.
On examining carefully, it can be observed that
the prediction of the true positive class with
respect to the predicted positive class, i.e., the
precision score, of Bi-LSTM was also the highest
of all, at 78.91%. The LR-SGDC and
SVM classifiers were the runners-up with a
precision score of 72.74% each, followed by
Logistic Regression with a 72.72% precision
score. The Naïve Bayes and Random Forest
classifiers also predicted the positive class well,
with precision scores of 71.33% and 71.31%
respectively. The precision score of the Decision
Tree classifier was the lowest, at 68.5%.
The prediction of the true positive class with
respect to the actual positive class, i.e., the
recall score, of BiLSTM was best at 78.89%, with
Logistic Regression as the runner-up with a score
of 72.49%, followed by SVM, LR-SGDC,
Random Forest and Naïve Bayes with scores of
72.44%, 72.08%, 71.29% and 71.24%
respectively. Even here, the Decision Tree was not
as good, with a recall score of 68.49%. The
F1-score and AUC score of Bi-LSTM were the best
among all the classifiers. All these results of
various classifiers can be visualized graphically
as shown in Fig. 14. Fig. 15 depicts the ROC
curve of all the classifiers implemented in our
experiments, which also shows that Bi-LSTM
is the best classifier. The model can also be very
useful for analyzing the tweets related to
medical data [39], [40],[41], [42], [43], [44].</p>
    </sec>
    <sec id="sec-9">
      <title>5. Conclusion</title>
      <p>There are various machine learning,
symbolic and deep learning methods for the
analysis of tweets and reviews. But machine
learning techniques are the most common,
efficient and simple. In this paper, machine
learning techniques were used for the analysis
of tweets on a Twitter dataset. The tweets were
cleaned in the preprocessing step by removing
the stopwords, URL, numbers and various
Twitter-specific features with the help of
NLTK. To deal with misspellings and
non-informative words, feature extraction was done
and a Bag of Words model was created with the
most frequent words. The tweets were then
classified into positive and negative by various
classifiers like LR-SGD Classifier, Naïve
Bayes, Random Forest, Logistic Regression,
SVM, Bidirectional LSTM and Decision Tree.
By observing the ROC curve and accuracy
score, it was clear that Bidirectional LSTM is
the best classifier with an accuracy of 78.90%.
Hence, it was found that Bidirectional LSTM is
very useful in finding sentiment analysis.</p>
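The pipeline summarized above can be sketched end-to-end. This is a minimal illustration under stated assumptions, not the exact implementation: the four example tweets and the small inline stopword set stand in for the Sentiment140 dataset and NLTK's stopword list, and scikit-learn's CountVectorizer plays the role of the Bag of Words model.

```python
# Minimal sketch of the pipeline described above: clean tweets, build a
# Bag of Words model, and train a classifier. The corpus and the inline
# stopword set are illustrative stand-ins, not the paper's data.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

STOPWORDS = {"the", "is", "a", "so", "i", "this"}  # stand-in for NLTK's list

def clean_tweet(text):
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)   # drop URLs
    text = re.sub(r"[@#]\w+", "", text)        # drop handles and hashtags
    text = re.sub(r"[^a-z\s]", "", text)       # drop numbers and punctuation
    return " ".join(w for w in text.split() if w not in STOPWORDS)

tweets = ["I love this movie! http://t.co/x", "@bob the plot is so boring",
          "great acting, great fun #win", "terrible film, awful ending"]
labels = [1, 0, 1, 0]                          # 1 = positive, 0 = negative

vectorizer = CountVectorizer(max_features=1000)  # keep most frequent words
X = vectorizer.fit_transform(clean_tweet(t) for t in tweets)

clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform([clean_tweet("great movie, love it")])))
```

The same vectorizer must be reused at prediction time so that new tweets are mapped into the vocabulary learned during training.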
      <p>The model can be implemented in a website
or Android application for classifying
people's sentiments on different subjects. As
microblogging sites boom, sentiment analysis
is becoming very important for many
organizations applying social intelligence and
social media analytics.</p>
      <p>Future work on this research will explore
data from a wider genre of social networking
and e-commerce sites where people shop
online for items such as books and games.
Sentiment analysis can then be used to derive
ratings for these products. It can also be
applied to build a human confidence
model.</p>
    </sec>
    <sec id="sec-10">
      <title>6. Conflict of interest</title>
      <p>There is no conflict of interest.</p>
    </sec>
    <sec id="sec-11">
      <title>7. Acknowledgement</title>
      <p>I would like to express my heartiest gratitude to
all the co-authors, and special thanks to Prof.
Mahendra Kumar Gourisaria and Mr.
Harshvardhan GM, who have been a constant
source of knowledge, inspiration and support. I
equally thank my parents and friends who
inspired me to remain focused and helped me
to complete this research paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jianqiang</surname>
          </string-name>
          , G. Xiaolin, and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xuejun</surname>
          </string-name>
          , “
          <article-title>Deep Convolution Neural Networks for Twitter Sentiment Analysis</article-title>
          ,
          <source>” IEEE Access</source>
          , vol.
          <volume>6</volume>
          , pp.
          <fpage>23253</fpage>
          -
          <lpage>23260</lpage>
          ,
          <year>2018</year>
          , doi: 10.1109/ACCESS.2017.2776930.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Moore</surname>
          </string-name>
          , “
          <article-title>Twitter sentiment analysis: The good the bad and the omg!</article-title>
          ,” in
          <source>Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>538</fpage>
          -
          <lpage>541</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abbasi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Zeng</surname>
          </string-name>
          , “
          <article-title>Twitter Sentiment Analysis: A Bootstrap Ensemble Framework</article-title>
          ,” in 2013 International Conference on Social Computing, Sep.
          <year>2013</year>
          , pp.
          <fpage>357</fpage>
          -
          <lpage>364</lpage>
          , doi: 10.1109/SocialCom.2013.56.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>Soc. Inf. Sci. Technol.</source>
          , vol.
          <volume>63</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>173</lpage>
          ,
          <year>2012</year>
          , doi: 10.1002/asi.21662.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Goel</surname>
          </string-name>
          , “
          <article-title>Stock prediction using twitter sentiment analysis</article-title>
          ,”
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jianqiang</surname>
          </string-name>
          and G. Xiaolin, “
          <article-title>Comparison research on text preprocessing methods on twitter sentiment analysis</article-title>
          ,”
          <source>IEEE Access</source>
          , vol.
          <volume>5</volume>
          , pp.
          <fpage>2870</fpage>
          -
          <lpage>2879</lpage>
          ,
          <year>2017</year>
          , doi: 10.1109/ACCESS.2017.2672677.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Pang</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          , “
          <article-title>Opinion mining and sentiment analysis</article-title>
          ,
          <source>” Found. Trends Inf. Retr.</source>
          , vol.
          <volume>2</volume>
          , no.
          <issue>1-2</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>135</lpage>
          ,
          <year>2008</year>
          , doi: 10.1561/1500000011.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>McKeown</surname>
          </string-name>
          , “
          <article-title>Predicting the semantic orientation of adjectives</article-title>
          ,” in
          <source>Proceedings of the Eighth Conference on European Chapter of the Association for Computational Linguistics</source>
          ,
          <year>1997</year>
          , pp.
          <fpage>174</fpage>
          -
          <lpage>181</lpage>
          , doi: 10.3115/979617.979640.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          , “
          <article-title>Ensemble of feature sets and classification algorithms for sentiment classification</article-title>
          ,”
          <source>Inf. Sci. (Ny).</source>
          , vol.
          <volume>181</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>1138</fpage>
          -
          <lpage>1152</lpage>
          ,
          <year>2011</year>
          , doi: 10.1016/j.ins.2010.11.023.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>Technol.</source>
          , vol.
          <volume>6</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>2344</fpage>
          -
          <lpage>2350</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Bar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          , “
          <article-title>A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle</article-title>
          ,”
          <source>Proc. 50th Annu. Meet. Assoc. Comput. Linguist.</source>
          , pp.
          <fpage>115</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2012</year>
          , doi: 10.1145/1935826.1935854.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Neethu</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rajasree</surname>
          </string-name>
          , “
          <article-title>Sentiment analysis in twitter using machine learning techniques</article-title>
          ,” in
          <source>2013 4th Int. Conf. Comput. Commun. Netw. Technol. (ICCCNT)</source>
          ,
          <year>2013</year>
          , doi: 10.1109/ICCCNT.2013.6726818.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>Rep.</source>
          , vol.
          <source>WS-11-05</source>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>49</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>Res. J. Sci. Eng. Technol.</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          Kaggle.com, “
          <article-title>Sentiment140 dataset with 1.6 million tweets</article-title>
          ,”
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[Online]. Available: https://www.kaggle.com/kazanova/sentiment140.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>Technol.</source>
          , vol.
          <volume>11</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>531</fpage>
          -
          <lpage>538</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Wil</surname>
          </string-name>
          , “
          <article-title>How many words are in the English language?</article-title>
          ,”
          <source>English Live</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Jin</surname>
          </string-name>
          , “
          <article-title>Training word embeddings for deep learning in biomedical text mining tasks</article-title>
          ,” in
          <source>Proc. 2015 IEEE Int. Conf. Bioinforma. Biomed. (BIBM)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>625</fpage>
          -
          <lpage>628</lpage>
          , doi: 10.1109/BIBM.2015.7359756.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Yu</surname>
          </string-name>
          , “
          <article-title>Bidirectional LSTM-CRF Models for Sequence Tagging</article-title>
          ,”
          <year>2015</year>
          . [Online]. Available: http://arxiv.org/abs/1508.01991.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          , “
          <article-title>Framewise phoneme classification with bidirectional LSTM and other neural network architectures</article-title>
          ,”
          <source>Neural Networks</source>
          , vol.
          <volume>18</volume>
          , no.
          <issue>5-6</issue>
          , pp.
          <fpage>602</fpage>
          -
          <lpage>610</lpage>
          ,
          <year>2005</year>
          , doi: 10.1016/j.neunet.2005.06.042.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Shalev-Shwartz</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ben-David</surname>
          </string-name>
          ,
          <article-title>Understanding machine learning: From theory to algorithms</article-title>
          . Cambridge University Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Nagahamulla</surname>
          </string-name>
          , “
          <article-title>Offline handwritten signature verification system using random forest classifier</article-title>
          ,” in
          <source>17th International Conference on Advances in ICT for Emerging Regions (ICTer 2017) - Proceedings</source>
          ,
          <year>2017</year>
          , vol.
          <source>2018-January</source>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>196</lpage>
          , doi: 10.1109/ICTER.2017.8257828.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          , T. Liu,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valdez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gwinn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Khoury</surname>
          </string-name>
          , “
          <article-title>Application of support vector machine modeling for prediction of common diseases: The case of diabetes and pre-diabetes</article-title>
          ,”
          <source>BMC Med. Inform. Decis. Mak.</source>
          , vol.
          <volume>10</volume>
          , no.
          <issue>1</issue>
          ,
          <year>2010</year>
          , doi: 10.1186/1472-6947-10-16.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <source>Bus.</source>
          , vol.
          <volume>11</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2019</year>
          , doi: 10.5815/ijieeb.2019.06.02.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <source>Sci. Inf. Technol.</source>
          , pp.
          <fpage>84</fpage>
          -
          <lpage>88</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Gourisaria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pandey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Rautaray</surname>
          </string-name>
          , “
          <article-title>Comparative Analysis of Heart Disease Classification Algorithms Using Big Data Analytical Tool</article-title>
          ,”
          <year>2020</year>
          , pp.
          <fpage>582</fpage>
          -
          <lpage>588</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <source>Informatics</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>25</lpage>
          ,
          <year>2015</year>
          , doi: 10.5121/ijci.2015.4402.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          https://en.wikipedia.org/wiki/Bayes'_theorem,
          <source>Last accessed</source>
          <year>2020</year>
          /8/28.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <source>Sci. Rev.</source>
          , vol.
          <volume>38</volume>
          ,
          <issue>100285</issue>
          ,
          Nov.
          <year>2020</year>
          , doi: 10.1016/j.cosrev.2020.100285.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          21, no.
          <issue>3</issue>
          , pp.
          <fpage>660</fpage>
          -
          <lpage>674</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Dattatreya</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. N.</given-names>
            <surname>Kanal</surname>
          </string-name>
          , “
          <article-title>Decision Trees in Pattern Recognition</article-title>
          ,” in
          <source>Progress in Pattern Recognition 2</source>
          ,
          <year>1985</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>239</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Gourisaria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pandey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Rautaray</surname>
          </string-name>
          , “
          <article-title>Prediction of Heart Disease by Mining Frequent Items and Classification Techniques</article-title>
          ,” in
          <source>2019 International Conference on Intelligent Computing and Control Systems (ICCS)</source>
          ,
          <source>May</source>
          <year>2019</year>
          , pp.
          <fpage>607</fpage>
          -
          <lpage>611</lpage>
          , doi: 10.1109/ICCS45141.2019.9065805.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <article-title>Wikipedia contributors. Decision tree learning</article-title>
          .
          <source>Wikipedia</source>
          , The Free Encyclopedia, https://en.wikipedia.org/wiki/Decision_tree_learning,
          <source>Last accessed</source>
          <year>2020</year>
          /8/30.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <string-name>
            <surname>Sci. Inf. Technol.</surname>
          </string-name>
          , vol.
          <volume>5</volume>
          , no.
          <issue>5</issue>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Giachanou</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , “Like It or Not,
          <source>” ACM Comput. Surv.</source>
          , vol.
          <volume>49</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>41</lpage>
          , Nov.
          <year>2016</year>
          , doi: 10.1145/2938640.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Gautam</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Yadav</surname>
          </string-name>
          , “
          <article-title>Sentiment analysis of twitter data using machine learning approaches and semantic analysis</article-title>
          ,
          <source>” in 2014 7th International Conference on Contemporary Computing, IC3</source>
          <year>2014</year>
          ,
          <year>2014</year>
          , pp.
          <fpage>437</fpage>
          -
          <lpage>442</lpage>
          , doi: 10.1109/IC3.2014.6897213.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Jason</surname>
          </string-name>
          , “
          <article-title>How to Use ROC Curves and Precision-Recall Curves for Classification in Python</article-title>
          ,”
          <source>Machine Learning Mastery</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <string-name>
            <surname>Anal. Min.</surname>
          </string-name>
          , vol.
          <volume>10</volume>
          , no.
          <issue>1</issue>
          ,
          <year>2020</year>
          , doi: 10.1007/s13278-020-00658-3.
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Gourisaria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Rautray</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pandey</surname>
          </string-name>
          , “
          <article-title>Segmentation of Nuclei in Microscopy Images Across Varied Experimental Systems</article-title>
          ,”
          <year>2021</year>
          , pp.
          <fpage>87</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Rautaray</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pandey</surname>
          </string-name>
          , “
          <article-title>A Model for Prediction of Paddy Crop Disease Using CNN</article-title>
          ,”
          <year>2020</year>
          , pp.
          <fpage>533</fpage>
          -
          <lpage>543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <string-name>
            <surname>Emerg. Technol.</surname>
          </string-name>
          , vol.
          <volume>11</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>699</fpage>
          -
          <lpage>704</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name>
            <surname>Technol.</surname>
          </string-name>
          , vol.
          <volume>11</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>731</fpage>
          -
          <lpage>737</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          <string-name>
            <surname>Rautray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pandey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Patra</surname>
          </string-name>
          , “
          <article-title>ECG Classification using Deep Convolutional Neural Networks and Data Analysis</article-title>
          ,”
          <source>Int. J. Adv. Trends Comput. Sci. Eng.</source>
          , no.
          <issue>9</issue>
          , pp.
          <fpage>5788</fpage>
          -
          <lpage>5795</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          <string-name>
            <surname>Gourisaria</surname>
          </string-name>
          , “
          <article-title>Juxtaposing inference capabilities of deep neural models over posteroanterior chest radiographs facilitating COVID-19 detection</article-title>
          ,”
          <source>J. of Interdisciplinary Mathematics</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          ,
          <year>2021</year>
          , doi: 10.1080/09720502.2020.1838061.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>