=Paper= {{Paper |id=Vol-3142/PAPER_04 |storemode=property |title=Review and Analysis of Emotion Detection from Tweets using Twitter Datasets |pdfUrl=https://ceur-ws.org/Vol-3142/PAPER_04.pdf |volume=Vol-3142 |authors=Vikas Maheshkar,Sachin Kumar Sarin |dblpUrl=https://dblp.org/rec/conf/wac3/MaheshkarS22 }} ==Review and Analysis of Emotion Detection from Tweets using Twitter Datasets== https://ceur-ws.org/Vol-3142/PAPER_04.pdf
Review and Analysis of Emotion Detection from Tweets using
Twitter Datasets
Vikas Maheshkar a, Sachin Kumar Sarin a
a
    Netaji Subhas University of Technology, Sec-3 Dwarka, Delhi, India-110078


                 Abstract
                 Communication is a necessary part of our day-to-day lives, but understanding the
                 emotion behind personal communication is not easy. Despite rapid growth in the field of
                 semantic analysis, finding the sentiment in text remains a challenging job for researchers.
                 Emotion detection is one of the most important applications of sentiment analysis and
                 supports efficient computing in digital media. Sentiment analysis, or opinion mining, of
                 Twitter emotion detection datasets has drawn much attention over the past 10 years. This
                 paper presents a comparative study and analysis of emotion detection from tweets using
                 Twitter datasets.

                 Keywords
                 Sentiment analysis, Emotion detection, Natural Language Processing, Emotion lexicon

1. Introduction
   Language is a well-known tool for communicating and conveying information as well as
transmitting emotions. Emotion identification is currently being studied extensively in psychiatry,
psychology, cognitive science, computer science, and computational science. Collaborative online
diaries, journals, and individual blogs have become part of our daily lives and help meet critical
social-interaction needs. Numerous social media sites have enabled the exchange of opinions among
users all over the world, which has promoted the use of popular social network sites such as Twitter
for communication. Users’ tweets are highly unstructured, heterogeneous, and often vulgar, and they
cover a wide range of topics. To cope with this, researchers apply emotion analysis, the process of
analyzing or exploring tweets in order to assist both primary and secondary communities. The
researchers’ aim is to improve sentiment codification techniques using these tweets so that implied
attitudes in written text can be predicted. Most common methods detect a single sentiment or
attitude from a structured input text [7]. This study looks at the issue of detecting multiple emotions
from slang-heavy, unstructured tweet data. The paper uses Twitter as a case study to present a hybrid
method for multiple emotion classification, and binary validity and pattern recognition techniques
are used to examine these emotion classification models. The binary significance technique uses
sentiment analysis methods such as Naive Bayes (NB), Support Vector Machine (SVM), and
K-Nearest Neighbour (KNN).

   Users express their thoughts and feelings in a variety of ways on today’s social networks, including
Twitter, Instagram, Facebook, and many others, where millions of users post reviews to share their
feelings, thoughts, and emotions about specific topics in their daily lives. This provides an
excellent opportunity for researchers to examine the feelings and behaviors of social network users.


WAC-2022: Workshop on Applied Computing, January 27 – 28, 2022, Chennai, India.
EMAIL: vikas.maheshkar@nsut.ac.in (Vikas Maheshkar)
ORCID: 0000-0001-9660-7517 (Vikas Maheshkar)
            © 2022 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)


The massive amounts of data produced by social networks contain people’s daily thoughts, beliefs,
and emotions, and various emotion analysis studies have been conducted on social media platforms
over the years. Because people hold such a diverse range of opinions, determining a unique sentiment
from social data can be difficult. This emphasizes the importance of addressing these issues and
opens up several avenues for future analysis of the hidden sentiment of users in general, or of user
emotions related to a particular subject. In this paper we investigate a Twitter dataset for emotion
analysis and sentiment classification. We analyzed an emotion network built from user-posted texts
by detecting emotions and feelings in tweets and their replies [27]. Using sentiment analysis, we
identified prominent users for both positive and negative sentiments. We then investigated how
influential individuals in an emotion network lead to overall shifts in the network’s emotion. Finally,
we analyzed previous recommendation techniques that compute a trusted network based on emotional
similarity and influence. Since no existing Twitter dataset included both tweets and their replies, we
examined text on specific recent topics, covering both textual and user data, and analyzed it according
to the feelings and emotions expressed. To identify influential users of sentiment and emotion
networks, text-based characteristics were combined with some additional specifications. Users were
grouped into clusters based on their feelings, and the classification model used user influence ratings
to provide users with customized and generic recommendations. For a long time, researchers have used
the Twitter network for various measurements and analyses [4]. Different researchers have
experimented with emotions and feelings in tweets, influential user identification (using retweets,
links, favorites, and other methods), and user influence and recommendation generation (based on
tweets). In this article, we combine a few new concepts with some existing ones. The inclusion of
tweet replies and reply-based criteria is the paper’s main innovation. We surveyed the sentiment and
emotion conveyed in replies, as well as the agreement score (i.e., whether the reply agreed with the
initial tweet or not), the sentiment score (i.e., whether the reply sentiment matched the initial tweet’s
sentiment or not), and the emotion score (i.e., whether the reply emotion matched the initial tweet’s
emotion or not). The researchers combined these with some existing features in the model to measure
user influence scores, which were then propagated to recommendation generation. We review and
analyze previous work in this field, determine the research scope, examine the mechanisms and models
used, and finally analyze the model that helps detect the emotion conveyed in tweets [6]. We work
with the AIT-2018 dataset and some other datasets, and our approach is divided into several stages.

2. Sentiment Analysis
    Sentiment analysis, also called opinion mining, is the area of web mining that seeks to understand
people’s opinions about a particular subject, event, and so on. It covers a very large problem space.
It also goes by numerous names and related tasks, such as opinion extraction, sentiment mining, affect
analysis, emotion analysis, subjectivity analysis, and review mining. Twitter serves as a persistent
repository with a vast amount of data that can be used for sentiment analysis [22]. Given the large
number of texts, which are often freely available, and the ease with which they can be obtained
compared with scraping websites, Twitter is quite useful for research. Data is gathered from Twitter
for analysis using the Twitter API. Machine learning and dictionary-based approaches are the two
commonly used methodologies. To analyze the opinions in documents supplied by multiple users, we
use a dictionary-based methodology. The text is then classified by its polarity. For example, after
analysis, tweets are categorized into three groups: positive, negative, and neutral.
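
   The dictionary-based categorization described above can be illustrated with a minimal sketch. The
word lists below are hypothetical placeholders, not a lexicon used in this paper, and the scoring rule
(positive hits minus negative hits) is an assumed simplification of how such approaches usually work.

```python
import re

# Hypothetical toy lexicon; a real study would use a published sentiment dictionary.
POSITIVE_WORDS = {"good", "great", "happy", "love", "delicious"}
NEGATIVE_WORDS = {"bad", "terrible", "sad", "hate", "awful"}


def classify_polarity(tweet):
    """Label a tweet positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

   A tweet with no lexicon words, or with equally many positive and negative words, falls into the
neutral group under this rule.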

3. Literature Review
   The data is mined using a variety of text mining techniques. Prabhsimran Singh, Ravindra Singh,
and Karanjeet Singh Kalhon [4, 10] investigated the government's demonetization policy from the
perspective of common people, applying sentiment analysis to tweets collected from Twitter with a
specific hashtag (#demonetization). The analysis is geo-location based (tweets are gathered
group-wise). The MeaningCloud sentiment analysis API classified the tweets into six categories:
cheerful, sad, depressing, enthusiastic, impartial, and no information.

    Yuan and Huang [5] addressed the problem of sentiment polarity classification, one of the
fundamental problems in sentiment analysis. Their analysis uses data from a Twitter dataset and online
product reviews, and it considers both sentence-level and review-level categorization. The work uses
Scikit-learn, an open-source Python software library. Three classification techniques were chosen:
Naive Bayes, Random Forest, and SVM. Geetika Gautam and Divakar Yadav [7, 22] contributed to
sentiment analysis for the classification of customer reviews, using Twitter data that had already
been labelled. They used three supervised techniques, Naive Bayes, Maximum Entropy, and SVM,
followed by semantic analysis applied in conjunction with all three approaches. The models were
trained and evaluated using Python and NLTK. The Naive Bayes approach outperforms the Maximum
Entropy approach, and SVM with the unigram model outperforms SVM alone.
Accuracy improves further when WordNet-based semantic analysis is applied after these methods.
Yang [19, 20] used a machine learning approach to analyze Twitter data related to electronic goods.
They created a new feature vector for classifying tweets and determining people’s opinions on
electronic products. The feature vector is made up of eight features: special keyword, emoticon,
number of negative keywords, number of positive keywords, presence of negation, POS tag, number of
negative hashtags, and number of positive hashtags. MATLAB and its built-in functions were used to
implement the Naive Bayes and SVM classifiers [20, 23], and a Maximum Entropy program was used to
implement the Maximum Entropy classifier. The performance of all the classifiers is nearly identical.
Robinson [25] proposed a more accurate model for sentiment analysis of Twitter data about upcoming
Hollywood and Bollywood films [10]. The tweets are classified with the aid of a feature vector and
classifiers such as SVM and Naive Bayes to determine each tweet’s sentiment [20, 21]. The precision
of Naive Bayes is higher than that of SVM, but its accuracy and recall are slightly lower. SVM
outperforms Naive Bayes in terms of accuracy. The feature vector performs better than the chosen
classifier in terms of sentiment analysis. If the number of people using the internet grows, the
accuracy of classification may improve. The authors of [13] built a collection of tweets and
annotated it through a corpus annotation study. Multi-class SVM kernels were employed for the
learning model. The features include unigrams, bigrams, personal pronouns, and adjectives, together
with WordNet-Affect sentiment features, dependency-parsing features, and the WordNet-Affect emotion
lexicon. To build a dataset, the authors of [5] first downloaded tweets from Twitter [15]. They then
derived a model with expanded features based on the target. They trained four supervised classifiers:
Naive Bayes (NB), Support Vector Machine (SVM), Maximum Entropy (MaxEn), and Artificial Neural
Networks (ANN). The highest precision was obtained by combining SVM with Principal Component
Analysis (PCA). The authors of [14] first preprocessed the training dataset and computed data
similarity measures. The whole emotion-labeled corpus was then clustered using semantic similarity.
The authors used the SVM learning algorithm to train an emotion classifier, representing each word
as a feature vector during the training phase. The data is first split, and features are then
extracted using the Porter stemming technique. The authors used unigram, bigram, and trigram
features. The Weighted Log-Likelihood Score technique is used to rank n-grams in relation to each
sentiment, which yields a feature extraction table. As a classifier, the authors employed Multinomial
Naive Bayes, which uses the highest-scoring n-grams and checks accuracy using several feature
vectors. The author of [24] demonstrated a composite model for emotion recognition and analysis. This
model incorporates features such as lexical keyword spotting, CRF-based emotion detection using NB,
MaxEn, and SVM, and more. The authors of [16] employed a Hidden Markov Model to assess the
emotional tone of the text. They viewed each sentence as a sequence of short ideas, with every idea
representing an event that could result in a state transition. The authors of [2] attempted to
identify statements on social media about a particular crisis. They chose anger as an example, noting
that the approach can be used with a variety of emotions. They received 1192 replies to a brief poll
asking participants to share their thoughts on a piece of information via social media. Using this as
a training set, they achieved 90 percent accuracy in classifying anger in their dataset. They selected
features based on logistic regression coefficients and used random forest as their main classifier [8].

4. General Strategy for Sentiment Analysis
a) Preparation of a data Model:
    Select a dataset that includes all the emotion features needed for extraction in the sentiment
analysis.

b) Data Preprocessing:
   Pre-processing a tweet dataset requires removing all superfluous data, including emoticons,
special symbols, and blank spaces.

c) Vectorization:
   Map words or phrases from the vocabulary to corresponding vectors of real numbers, which are used
for word prediction and for measuring word similarity or semantics.
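
   As a rough illustration of vectorization, the sketch below builds term-count (bag-of-words)
vectors for whole tweets and compares them with cosine similarity. This is a swapped-in, simpler
technique than learned word embeddings, and the function names are assumptions made for this example
rather than part of any method described in the paper.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Map every distinct token to a column index (sorted for determinism)."""
    tokens = sorted({token for doc in documents for token in doc.lower().split()})
    return {token: index for index, token in enumerate(tokens)}


def vectorize(document, vocabulary):
    """Return a term-count vector; tokens missing from the vocabulary are ignored."""
    counts = Counter(document.lower().split())
    vector = [0.0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:
            vector[vocabulary[token]] = float(count)
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

   Two tweets that share no vocabulary words get a similarity of 0.0, while tweets with the same
word counts get a similarity close to 1.0.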

d) Model Preparation:

   1.   Select a model type.
   2.   Choose the classification approach you want to use.
   3.   Import the data from your Twitter account.
   4.   Tag the dataset and use it to train your algorithm.
   5.   Train the classifier, then test and validate it.

e) Visualization:
    Visualization is a very important step after the algorithm runs because it presents the results
clearly. In sentiment analysis, many visualization tools help structure the data. Tools such as
Talkwalker and HubSpot, for example, are used to visualize emotion analysis.

5. Dataset
a) AIT-2018 Dataset
   For the AIT-2018 dataset [26], the researchers used the SemEval-2018 Affect in Tweets Distant
Supervision Corpus. The tweets were pulled from Twitter using the Twitter API and contained
emotion-related words such as ‘irate’, ‘pique’, ‘panic’, ‘cheerful’, ‘fondness’, ‘amaze’, and
‘surprised’. The researchers used the following technique to build a dataset of tweets rich in a
specific emotion: for each emotion X, they selected 50 to 100 terms associated with that emotion at
different intensity levels. For example, words like mad, upset, bothered, anger, irritated, unhappy,
rage, animus, and so on were used. This dataset contains four emotion classes: anger, fear, joy, and
sadness. Anger and disgust have been described as frustration, while joy and sorrow have been
represented as joy. The dataset for the challenge is divided into three languages: English, Arabic,
and Spanish. For each language, there are five sub-task datasets. Only the EI-oc data is used, in
which each tweet has an associated emotion as well as an intensity [3]. Users can send real-time
messages known as tweets using Twitter, a microblogging and social networking service. Tweets are
140-character messages. Because of the nature of this microblogging site (rapid and short messages),
people use acronyms, make spelling mistakes, and use emoticons and other characters that express
special meanings. The researchers obtained 11,875 manually annotated tweets from a commercial
source, which has made some of its data public. The data was gathered by archiving the real-time
stream. During the streaming process, there were no restrictions on language, region, or anything
else. In fact, the majority of the tweets in the collection are in other languages. Before the
annotation process, Google Translate is used to convert them to English. Each tweet is labelled by a
human annotator as positive, negative, neutral, or junk. The label “junk” denotes that the tweet is
incomprehensible to a human annotator. According to a careful assessment of a random sample, many of
the tweets labelled as “junk” had not been translated correctly by Google Translate. Tweets with the
junk label were set aside for the experiments. As a result, the researchers were left with an
unbalanced sample of 8,753 tweets across the positive, negative, and neutral classes.

b) Emotic
    In our daily lives, it is important to recognize people’s emotions from their context. This
ability allows us to anticipate people’s future actions, engage with them successfully, and be
sympathetic and sensitive to them. To interact with humans properly, a machine should have a similar
capability of understanding people’s feelings. Current emotion recognition research focuses on the
analysis of facial expressions. Recognizing emotions, however, also requires an understanding of the
context in which a person is situated. Emotion analysis in context has been difficult to study
because of a lack of sufficient data. For this reason the EMOTIC database [28] (from EMOTions in
Context) is used: a collection of images of people in natural, real-world settings annotated with
their apparent emotions. The images are annotated with an extended list of 26 emotion categories,
combined with three continuous dimensions: Valence, Arousal, and Dominance. The Amazon Mechanical
Turk (AMT) platform is used to annotate the images in the dataset. The result is a database of
18,313 photos with 23,788 annotated individuals. This database may also facilitate the creation of
systems capable of recognizing detailed information about people’s apparent feelings and emotions.
The collection is also reported to contain 23,571 photos and 34,320 annotated people. Some of the
images were manually collected from the internet using the Google search engine.

Table 1: Detail Initialisation of Twitter Reviews
     Observation          Raw Tweet
     Actual Tweet         @Satisfying @TheAnimalVines I used my sense of taste to make Energy balls
                          that is made with peanut butter regular basis.
     Filtered Tweets      The peanut butter energy balls were something I used to make. My family had
                          a great time all the time. My kitties, by the way,Continue to adore them.
                          delicious joy joy dishes


6. Comparative Analysis
a.      Data Collection
   Twitter is currently one of the most popular social networking platforms. People share their
thoughts on various social, national, and international topics, as well as their everyday lives. They
express themselves in 140-character bursts and, on occasion, audio and video files. Tweets are public
postings. Posts can be liked, commented on, and retweeted by other users. On Twitter, users can
follow or friend one another. Unlike other social networking sites, Twitter [21, 23] permits one-way
connections, meaning that a user can follow someone else without the latter reciprocating. These
interactions form a communication network. The dataset we examined in our study is made up of tweets,
replies, and retweets, as well as the associated user information. Several text datasets for
sentiment and emotion analysis were used in related works, including the Emotion in Text dataset
[17], ISEAR [1], SemEval [19], EmoBank [20], TREC [11], and so on. However, since most existing
datasets contain only friend/follower links or tweets, they could not be used for this research. For
our analysis, we examined an emotion network based on the content posted by users, not on who follows
whom. In addition, we examined the replies to those tweets, as well as information about the repliers
and retweeters. In our emotion network, users are connected according to their interests and emotions
on a particular issue. To gather tweets with different emotions, a few current events and issues were
investigated; #Australia, #obama, #movie, #Diwali2017, #SummerBreak, #WinterBreak, #RoseDay2018,
#intimidation, #WorldCup2018, #MensDay, and #Awards2018 were the top search terms. The dataset was
generated in a few simple steps: (i) gathering random tweets on a given keyword, (ii) gathering user
data (user ID, location, sex, count of posts, count of followers, count of followees, count of
likes), (iii) collecting the replies to each post, (iv) collecting the repliers’ user details, (v)
collecting details on the tweet, and (vi) collecting the retweeters’ user details. All of these steps
were repeated for each keyword [18]. During data collection from Twitter, a few issues were observed:
(i) Some tweets had photographs and videos but little text. (ii) Even though tweets were written in
English, many people left comments in other languages. (iii) Many comments were devoid of text and
shared only images or videos. (iv) In some situations, a tweeter responded to commenters with a large
number of comments. Some people responded to each comment on their post, resulting in their tweets
receiving twice the number of responses. (v) Some tweets received replies from accounts belonging to
news organizations or businesses rather than from individuals. Those were essentially advertisements.
For illustration, in the #WomensDay tweets, there were a few advertisements from news organizations
working for gender equality, a few advertisements from business accounts promoting their cosmetics,
and so forth.

   (vi) The majority of users do not disclose their location. (vii) Some users posted thousands of
comments, but none of them were noteworthy; they simply reply to other people’s messages. (viii) Some
replies contained few details other than a quick mention of a few accounts. (ix) A few replies simply
repeated a random term from the original tweet. (x) A few tweets and replies merely stated facts
without expressing any emotion or sentiment.

    (xi) A few replies consisted solely of smileys with no other information. (xii) A number of
replies asked non-emotional questions. (xiii) Some of the replies were utterly unexpected and out of
context. Data was collected in two ways: through the API and through direct data extraction. Table 1
lists the attributes of the tweet data, while Table 2 lists the user attributes that can be extracted
along with the tweet data. Some profile and background images are also available but are not included
in the tables. Because of the Twitter API rate limit, only 15 API calls are permitted every
quarter-hour, which limits the amount of data that can be collected. From February 25 to March 8,
2018, 7246 tweets and replies were investigated, along with the information for 3607 users based on
those tweets and replies. The dataset had limited data since each tweet and reply was analyzed
individually for emotion, sentiment, and agreement score. The text was tagged with agreement values
of “Agreed,” “Disagreed,” and “Random,” depending on whether the reply text agreed with the initial
tweet or not. The text was also tagged with the appropriate sentiment, “Positive,” “Negative,” or
“Neutral,” as well as an emotion, “Anger,” “Disgust,” “Fear,” “Joy,” “Sadness,” “Surprise,” or
“Neutral.” The lack of a balanced data distribution across sentiments and emotions was caused by the
absence of some reply data and by the fact that sentiment, emotion, and agreement were annotated
manually by the authors. By analyzing a different approach to the Twitter tweet and emotion network,
this paper takes an early step toward a personalized social network recommender [21].
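
   The effect of the rate limit on collection time can be worked out with simple arithmetic. The
sketch below uses the limit stated above (15 calls per 15-minute window); the assumption that each
collected item costs exactly one API call is hypothetical and is made only to give a sense of scale.

```python
import math

CALLS_PER_WINDOW = 15   # from the text: 15 API calls are permitted ...
WINDOW_MINUTES = 15     # ... every quarter-hour


def minimum_collection_minutes(total_calls):
    """Lower bound on the waiting time, in minutes, needed to issue total_calls requests."""
    if total_calls <= 0:
        return 0
    windows = math.ceil(total_calls / CALLS_PER_WINDOW)
    # Calls in the final window can be issued as soon as it opens,
    # so only the windows before it contribute waiting time.
    return (windows - 1) * WINDOW_MINUTES
```

   Under the one-call-per-item assumption, `minimum_collection_minutes(7246)` gives 7245 minutes,
or roughly five days, which is consistent with a collection period spanning about two weeks.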


b.    Pre-processing
    The process of constructing the dataset is depicted in Table 2. To begin pre-processing, a Web
search was used to locate original tweets on a particular subject. The tweets were filtered on
particular attributes, and only original tweets (not retweets) on the subject were investigated. The
first phase yielded a tweet ID and tweet text, which a second phase then used to determine the count
of likes, the count of retweets, and the tweet time. There are no direct functions for collecting
tweet replies. As a result, replies were collected using a different procedure, which examined the
tweets addressed to the user after the time of the original tweet, using the tweet ID and user ID. It
saved only texts that satisfied the criterion “in reply to status ID.” Since no direct function was
available, collecting replies took the majority of the time. To speed up the process, the researchers
used a Web page scraper that scraped the tweet pages for replies. The researchers also needed user
data for the experiments, so they used a Web page scraper to retrieve data from users’ Twitter
accounts, including original tweet authors, retweeters, and repliers. The counts of tweets, likes,
followers, and followees, and the location (when available), were collected as user attributes.
Cleaning the collected data and annotating it with sentiments and emotions were also part of
pre-processing. The tweets and replies contained many unnecessary symbols and noise. The phases of
pre-processing are depicted in Table 1. The following steps were taken during data cleaning: (i) all
user mentions (e.g., @alice) were removed from the text and replaced with a connection between users
in the network; (ii) all hashtag symbols (only the # symbol) were removed; (iii) all emoticons (e.g.,
:-), :-( etc.) were removed; and (iv) all URLs (e.g., http://a.com) were removed. After that, the
cleaned tweets and replies were annotated. Positive, negative, and neutral sentiments were assigned
to all tweets and replies by the researchers. The researchers used the AIT sentiment emotion model
[9], which identifies six human emotions: anger, disgust, fear, joy, sadness, and surprise. These six
emotions were used to annotate all tweets and replies. Replies were also annotated as agreeing or
disagreeing with the original tweet. Natural Language Processing (NLP) is then required to process
the text after annotation. The sentences were tokenized into words. Typical English stop words (such
as am, as, the, and so on) were then removed, and the remaining words were tagged with their Parts of
Speech (POS) using a POS tagger. Only Noun, Adjective, Verb, and Adverb (NAVA) words were selected,
since they contribute most to the meaning of a sentence. Both the NAVA words and the cleaned full
text were kept for later comparison in the classification process.
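
    The cleaning steps above can be sketched as follows. The regular expressions and the stop-word
list are illustrative assumptions, since the exact patterns and word list are not specified in the
text, and POS tagging for NAVA selection is left out because it needs an external tagger.

```python
import re

# Small illustrative stop-word list; the list actually used is not specified.
STOP_WORDS = {"a", "am", "an", "and", "as", "i", "is", "of", "the", "to"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Assumed, non-exhaustive pattern for ASCII emoticons such as :-) :( :D ;)
EMOTICON_RE = re.compile(r"[:;=][-']?[()DPp]")


def clean_tweet(text):
    """Remove URLs, user mentions, '#' symbols, and emoticons from a tweet."""
    text = URL_RE.sub(" ", text)       # URLs go first so "://" is never read as an emoticon
    text = MENTION_RE.sub(" ", text)   # drop @user mentions
    text = text.replace("#", "")       # keep the hashtag word, drop only the symbol
    text = EMOTICON_RE.sub(" ", text)
    return " ".join(text.split())      # collapse repeated whitespace


def tokenize_without_stop_words(text):
    """Clean a tweet, lowercase it, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", clean_tweet(text).lower())
    return [token for token in tokens if token not in STOP_WORDS]
```

    In a fuller pipeline the removed mentions would be recorded as user-to-user links before being
discarded, as step (i) describes.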

c.    Sentiment and emotion detection
    In prior investigations of sentiment identification from text, researchers have utilized various
methodologies. Recent studies have used machine learning with both supervised [12,13] and
unsupervised [4,24] classifiers. Machine learning is a sound, low-risk option for larger datasets, and
training a classifier takes comparatively less effort than constructing sentiment word dictionary
definitions. In the literature, Naive Bayes outperformed several other ML algorithms at determining
sentiment and emotion from text. We therefore examined the Naive Bayes algorithm, which is used to sort
tweets into their respective sentiment and emotion categories. The pre-processed text (tweets and replies),
together with its emotion and sentiment labels, is divided into training and test sets. Both of these
characteristics (sentiment and emotion), as well as the number of reviews, retweets, followers, and
followers’ followees, were used to compute the final effect score. The recommendation system used by
the researchers relies on this score to count the individuals who share the same feelings and thoughts on a
specific topic. We analyzed the Naive Bayes method on the pre-processed text (tweets and replies) under
3-fold, 5-fold, and 10-fold cross-validation. The categorization was applied to both the NAVA words and
the clean text. The feature set used by Naive Bayes consists of the words of each tweet and reply. On the
same dataset, classification was performed twice: once to classify the texts by sentiment, and again to
classify them by their related emotion.
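
To make the classification step concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier over word features, evaluated with k-fold cross-validation. It is written under stated assumptions: the add-one smoothing, the interleaved fold assignment, the function names and the six-document toy corpus are all illustrative and are not taken from the surveyed work.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(docs, labels, k):
    """Mean accuracy over k folds; fold i tests items i, i+k, i+2k, ...

    Assumes k is not larger than the number of documents.
    """
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_nb(train_docs, train_labels)
        hits = sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Toy corpus (hypothetical): each document is a list of pre-processed words.
docs = [
    ["love", "great", "phone"], ["hate", "terrible", "battery"],
    ["great", "movie", "love"], ["terrible", "movie", "hate"],
    ["love", "new", "phone"], ["hate", "new", "battery"],
]
labels = ["joy", "anger", "joy", "anger", "joy", "anger"]

model = train_nb(docs, labels)
print(predict_nb(model, ["love", "great"]))
print(k_fold_accuracy(docs, labels, 3))
```

The same functions can be run once with sentiment labels and once with emotion labels, and once on the NAVA words and once on the clean text, which mirrors the comparison described above.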




Table 2: Emotion Code Word
             Emoji                                   Emotion code-word

             :), : ), :-), (:, ( :, (-:, :’)         Twinkle

             :D, : D, :-D, xD, x-D, XD               Giggle

             :-(, : (, :(, ):, )-:                   Depressed

             :’(, :’(, :”(                           weep
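
A mapping such as the one in Table 2 can be applied during pre-processing by replacing each emoticon with its code-word. The sketch below is an assumption about how this could be done, not the surveyed implementation: it uses plain ASCII quotes in place of the typographic quotes printed in the table, and the function name is hypothetical.

```python
import re

# Emoticon variants grouped by the code-word listed in Table 2.
# ASCII ' and " are assumed in place of the typographic quotes in the table.
EMOTICON_CODES = {
    "Twinkle": [":)", ": )", ":-)", "(:", "( :", "(-:", ":')"],
    "Giggle": [":D", ": D", ":-D", "xD", "x-D", "XD"],
    "Depressed": [":-(", ": (", ":(", "):", ")-:"],
    "weep": [":'(", ":\"("],
}

# Reverse lookup from each emoticon to its code-word.
LOOKUP = {emo: code for code, emos in EMOTICON_CODES.items() for emo in emos}

# Longer emoticons come first so that they are tried before shorter ones.
PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(LOOKUP, key=len, reverse=True))
)


def replace_emoticons(text):
    """Replace every known emoticon in the text with its code-word."""
    replaced = PATTERN.sub(lambda m: " " + LOOKUP[m.group(0)] + " ", text)
    # Collapse the extra spaces introduced by the substitution.
    return " ".join(replaced.split())


print(replace_emoticons("great day :)"))
print(replace_emoticons("so sad :-( really"))
```

Replacing emoticons before tokenization keeps their emotional signal available as ordinary word features; a simple pattern like this one can also match punctuation that is not meant as an emoticon, so a real pipeline would need additional checks.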


7. Conclusion and Future Scope
   The analysis of various emotions and sentiments revealed some fascinating human characteristics.
Determining sentiment from words is challenging, and most reported approaches suffer from a variety of
difficulties, including ambiguous language, texts bearing multiple emotions, and text devoid of emotional
terminology, to name a few. We have analyzed different datasets that contain customer reviews. These
datasets give clear information about the customers’ reviews, so that the analysis can be performed more
effectively. We have also analyzed a variety of methods for detecting emotion in tweets. Our analysis suggests
that the EmoSenticNet lexicon performs better than WordNet. Even better outcomes are still achievable. An
algorithm that can automatically classify tweets would be a fascinating direction for future work.

8. References
[1] R. Hirat and N. Mittal, “A Survey on Emotion Detection Techniques using Text in Blogposts,”
      International Bulletin of Mathematical Research, vol. 2, no. 1, pp. 180–187, 2015.
[2] R. Sawyer and G.-M. Chen, “The Impact of Social Media on Intercultural Adaptation,” 2012. S.
      Mohammad, F. Bravo-Marquez, M. Salameh, and S. Kiritchenko, “Semeval-2018 Task 1: Affect
      in Tweets,” in Proceedings of the 12th International Workshop on Semantic Evaluation, 2018, pp.
      1–17. R. C. Balabantaray, M. Mohammad, and N. Sharma, “Multi-class Twitter Emotion
      Classification: A New Approach,” International Journal of Applied Information Systems, vol. 4,
      no. 1, pp. 48–53, 2012.
[3]   M. Anjaria and R. M. R. Guddeti, “Influence factor-based opinion mining of Twitter data using
      supervised learning,” in 2014 Sixth International Conference on Communication Systems and
      Networks (COMSNETS), 2014, pp. 1–8.
[4]   S. Yuan, H. Huang, and L. Wu, “Use of Word Clustering to Improve Emotion Recognition from
      Short Text,” Journal of Computing Science and Engineering, vol. 10, no. 4, pp. 103–110, 2016.
[5]   B. Thomas, P. Vinod, and K. A. Dhanya, “Multiclass Emotion Extraction from Sentences,”
      International Journal of Scientific & Engineering Research, vol. 5, no. 2, 2014.
[6]   H. Yang, A. Willis, A. D. Roeck, and B. Nuseibeh, “A Hybrid Model for Automatic Emotion
      Recognition in Suicide Notes,” Biomedical informatics insights, vol. 5, p. 8948, 2012.
[7]   D. T. Ho and T. H. Cao, “A High-order Hidden Markov Model for Emotion Detection from Textual
      Data,” in Pacific Rim Knowledge Acquisition Workshop, 2012, pp. 94–105.
[8]   M. Hasan, E. Rundensteiner, and E. Agu, “Automatic emotion detection in text streams by
      analyzing Twitter data,” International Journal of Data Science and Analytics, vol. 7, no. 1,
      pp. 35–51, 2019.
[9]   M. Hasan, E. Rundensteiner, and E. Agu, “Emotex: Detecting Emotions in Twitter Messages,”
      2014.
[10] A. Seyeditabari, S. Levens, C. D. Maestas, S. Shaikh, J. I. Walsh, W. Zadrozny, C. Danis, and
    O. P. Thompson, “Cross Corpus Emotion Classification Using Survey Data,” presented at AISB,
    2017.
[11] H. Binali, C. Wu, and V. Potdar, “Computational approaches for emotion detection in text,” in 4th
    IEEE International Conference on Digital Ecosystems and Technologies, 2010, pp. 172–177.
[12] C. Strapparava, A. Valitutti et al., “WordNet-Affect: an Affective Extension of WordNet,” in
    LREC, vol. 4, 2004, p. 40.
[13] S. Poria, A. Gelbukh, A. Hussain, N. Howard, D. Das, and S. Bandyopadhyay, “Enhanced
     SenticNet with Affective Labels for Concept-Based Opinion Mining,” IEEE Intelligent Systems,
     vol. 28, no. 2, pp. 31–38, 2013.
[14] E. Cambria, D. Olsher, and D. Rajagopal, “SenticNet 3: A Common and Common-Sense
     Knowledge Base for Cognition-Driven Sentiment Analysis,” in Twenty-eighth AAAI conference
     on artificial intelligence, 2014.
[15] D. Effrosynidis, S. Symeonidis, and A. Arampatzis, “A Comparison of Pre-processing Techniques
     for Twitter Sentiment Analysis,” in International Conference on Theory and Practice of Digital
     Libraries, 2017, pp. 394–406.
[16] G. Jivani et al., “A Comparative Study of Stemming Algorithms,” Int. J. Comp. Tech. Appl, vol.
     2, no. 6, pp. 1930–1938, 2011.
[17] “Slang Dict,” https://floatcode.files.wordpress.com/2015/11/slang dict. doc, accessed: 2019-05-17.
[18] “Github - wolfgarbe/SymSpell,” https://github.com/wolfgarbe/SymSpell, accessed: 2019-05-17.
[19] Y. Yang, X. Liu et al., “A re-examination of text categorization methods,” in Sigir, vol. 99, no. 8,
     1999, p. 99.
[20] H. Zhang, “The optimality of naive bayes,” AA, vol. 1, no. 2, p. 3, 2004. [22] G. Forman and I.
     Cohen, “Learning from little: Comparison of classifiers given little training,” in European
     Conference on Principles of Data Mining and Knowledge Discovery. Springer, 2004, pp. 161–172.
[21] A. Y. Ng and M. I. Jordan, “On discriminative vs. generative classifiers: A comparison of logistic
     regression and naive bayes,” in Advances in neural information processing systems, 2002, pp. 841–
     848.
[22] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python, 1st ed. O’Reilly Media,
     Inc., 2009.
[23] D. Robinson, “Text analysis of Trump’s tweets confirms he writes only the (angrier) Android
     half,” Variance Explained, 2016.
[24] F. M. Shah, A. S. Reyadh, A. I. Shaafi, S. Ahmed and F. T. Sithil, “Emotion Detection from
     Tweets using AIT-2018 Dataset,” 2019 5th International Conference on Advances in Electrical
     Engineering (ICAEE), 2019, pp. 575-580, doi: 10.1109/ICAEE48663.2019.8975433.
[25] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar and I. Matthews, “The Extended
     Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression,”
     2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition -
     Workshops, 2010, pp. 94-101, doi: 10.1109/CVPRW.2010.5543262.
[26] R. Kosti, J. M. Alvarez, A. Recasens and A. Lapedriza, “EMOTIC: Emotions in Context Dataset,”
     2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017,
     pp. 2309-2317, doi: 10.1109/CVPRW.2017.285.
[27] R. Subramanian, J. Wache, M. K. Abadi, R. L. Vieriu, S. Winkler and N. Sebe, “ASCERTAIN:
     Emotion and Personality Recognition Using Commercial Sensors,” in IEEE Transactions on
     Affective Computing, vol. 9, no. 2, pp. 147-160, 1 April-June 2018, doi:
     10.1109/TAFFC.2016.2625250.



