=Paper=
{{Paper
|id=Vol-2441/paper16
|storemode=property
|title=Feature Selection for Emotion Classification
|pdfUrl=https://ceur-ws.org/Vol-2441/paper6.pdf
|volume=Vol-2441
|dblpUrl=https://dblp.org/rec/conf/iir/PurpuraMSS19
}}
==Feature Selection for Emotion Classification==
Feature Selection for Emotion Classification∗

Alberto Purpura (University of Padua, Padua, Italy), purpuraa@dei.unipd.it
Chiara Masiero (Statwolf Data Science, Padua, Italy), chiara.masiero@statwolf.com
Gianmaria Silvello (University of Padua, Padua, Italy), silvello@dei.unipd.it
Gian Antonio Susto (University of Padua, Padua, Italy), sustogia@dei.unipd.it

∗ Extended abstract of the original paper published in [8]. This work was supported by the CDC-STARS project and co-funded by UNIPD.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
IIR 2019, September 16–18, 2019, Padova, Italy
ABSTRACT
In this paper, we describe a novel supervised approach to extract a set of features for document representation in the context of Emotion Classification (EC). Our approach employs the coefficients of a logistic regression model to extract the most discriminative word unigrams and bigrams to perform EC. In particular, we employ this set of features to represent the documents, while we perform the classification using a Support Vector Machine. The proposed method is evaluated on two publicly available and widely-used collections. We also evaluate the robustness of the extracted set of features on different domains, using the first collection to perform feature extraction and the second one to perform EC. We compare the obtained results to similar supervised approaches for document classification (i.e. FastText), EC (i.e. #Emotional Tweets, SNBC and UMM) and to a Word2Vec-based pipeline.

CCS CONCEPTS
• Information systems → Content analysis and feature selection; Sentiment analysis; • Computing methodologies → Supervised learning by classification;

KEYWORDS
Supervised Learning, Feature Selection, Emotion Classification, Document Classification

1 INTRODUCTION
The goal of Emotion Classification (EC) is to detect and categorize the emotion(s) expressed by a human. We can find numerous examples in the literature presenting ways to perform EC on different types of data sources, such as audio [10] or microblogs [8]. Emotions have a large influence on our decision making. For this reason, being able to identify them can be useful not only to improve the interaction between humans and machines (i.e. with chatbots or robots), but also to extract useful insights for marketing goals [7]. Indeed, EC is employed in a wide variety of contexts which include – but are not limited to – social media [8] and online stores – where it is closely related to Sentiment Analysis [9] – with the goal of interpreting emerging trends or better understanding the opinions of customers. In this work, we focus on EC approaches which can be applied to textual data. The task is most frequently tackled as a multi-class classification problem. Given a document d and a set of candidate emotion labels, the goal is to assign one label to d – sometimes more than one label can be assigned, changing the task to multi-label classification. The most widely used set of emotions in computer science is the set of the six Ekman emotions [3] (i.e. anger, fear, disgust, joy, sadness, surprise). Traditionally, EC has been performed using dictionary-based approaches, i.e. lists of terms which are known to be related to certain emotions, as in ANEW [2]. However, there are two main issues which limit their application on a large scale: (i) they cannot adapt to the context or domain where a word is used; (ii) they cannot infer an emotion label for portions of text which do not contain any of the terms available in the dictionary. A possible alternative to dictionary-based approaches are machine learning and deep learning models based on an embedded representation of words, such as Word2Vec [5] or FastText [4]. These approaches, however, need large amounts of data to train an accurate model and cannot easily adapt to low-resource domains. For this reason, we present a novel approach for feature selection and a pipeline for emotion classification which outperform state-of-the-art approaches without requiring large amounts of data. Additionally, we show how the proposed approach generalizes well to different domains. We evaluate our approach on two popular and publicly available data sets – i.e. the Twitter Emotion Corpus (TEC) [6] and the SemEval 2007 Affective Text Corpus (1,250 Headlines) [12] – and compare it to state-of-the-art approaches for document representation – such as Word2Vec and FastText – and classification – i.e. #Emotional Tweets [6], SNBC [11] and UMM [1].

2 PROPOSED APPROACH
The proposed approach exploits the coefficients of a multinomial logistic regression model to extract an emotion lexicon from a collection of short textual documents. First, we extract all word unigrams and bigrams in the target collection after performing stopword removal.1 Second, we represent the documents using the vector space model (TF-IDF). Then, we train a logistic regression model with elastic-net regularization to perform EC. This model is characterized by the following loss function:

    \ell(\{\beta_{0k}, \beta_k\}_1^K) =
        -\left[ \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{k=1}^{K} y_{ik} (\beta_{0k} + x_i^T \beta_k)
        - \log \sum_{k=1}^{K} e^{\beta_{0k} + x_i^T \beta_k} \right) \right]
        + \lambda \left[ (1-\alpha) \|\beta\|_F^2 / 2 + \alpha \sum_{j=1}^{p} \|\beta_j\|_1 \right],   (1)

where β is a (p+1)×K matrix of coefficients, β_k refers to its k-th column (for outcome category k), and β_j to its j-th row (the coefficients of feature j across categories). For the last penalty term, ||β_j||_1, we employ a lasso penalty on the coefficients in order to induce a sparse solution.

1 We employ a list of 170 English terms, see nltk v.3.2.5, https://www.nltk.org.
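As an illustration of these first steps, the sketch below builds the unigram/bigram TF-IDF representation and fits a multinomial logistic regression with elastic-net regularization. It is a minimal sketch, not the authors' implementation: they rely on the glmnet library (see footnote 2 below), while here scikit-learn's saga solver stands in, and the corpus, labels and parameter values are illustrative assumptions.

    # Illustrative sketch (not the authors' implementation): TF-IDF over
    # word unigrams and bigrams with stopword removal, then a multinomial
    # logistic regression with elastic-net regularization, as in Eq. (1).
    from nltk.corpus import stopwords  # requires nltk.download("stopwords")
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    docs = ["i am thrilled by this news", "this endless delay makes me furious"]
    labels = ["joy", "anger"]  # toy stand-ins for the six Ekman emotions

    vectorizer = TfidfVectorizer(ngram_range=(1, 2),
                                 stop_words=stopwords.words("english"))
    X = vectorizer.fit_transform(docs)  # sparse TF-IDF document-term matrix

    # l1_ratio plays the role of alpha in Eq. (1); C is an inverse of lambda.
    clf = LogisticRegression(penalty="elasticnet", solver="saga",
                             l1_ratio=0.5, C=1.0, max_iter=5000)
    clf.fit(X, labels)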
To solve this optimization problem, we use the partial Newton algorithm, making a partial quadratic approximation of the log-likelihood and allowing only (β_0k, β_k) to vary for a single class at a time. For each value of λ, we first cycle over all classes indexed by k, computing each time a partial quadratic approximation about the parameters of the current class.2 Finally, we examine the β-coefficients for each class of the trained model and keep the features (i.e. word unigrams and bigrams) associated with non-zero weights in any of the classes. To evaluate the quality of the extracted features, we perform EC using a Support Vector Machine (SVM). We consider a vector representation of documents based on the set of features extracted as described above, weighting them according to their TF-IDF score.

2 A Python implementation which optimizes the parameters of the model is available at: https://github.com/bbalasub1/glmnet_python/blob/master/docs/glmnet_vignette.ipynb.
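Continuing the sketch above, the lexicon extraction and the SVM classification step could look as follows; the coefficient thresholding and the choice of LinearSVC are assumptions for illustration, not the authors' exact procedure.

    import numpy as np
    from sklearn.svm import LinearSVC

    # Keep every unigram/bigram whose coefficient is non-zero in at least
    # one emotion class: these features form the extracted lexicon.
    selected = np.flatnonzero(np.abs(clf.coef_).max(axis=0) > 0)
    lexicon = vectorizer.get_feature_names_out()[selected]

    # Restrict the TF-IDF representation to the selected features and
    # train an SVM on it to perform the final emotion classification.
    X_sel = X[:, selected]
    svm = LinearSVC().fit(X_sel, labels)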
3 RESULTS
For the evaluation of the proposed approach we consider the TEC and 1,250 Headlines collections. TEC is composed of 21,051 tweets which were labeled automatically – according to the set of six Ekman emotions – using the hashtags they contained, which were removed afterwards. We split the collection into a training and a test set of equal size to train the logistic regression model for feature selection. Then, we perform a 5-fold cross-validation to train an SVM for EC using the previously extracted features, and report in Table 1 the average of the results over all six classes, obtained in the five folds. We also report in Table 1 the performance of FastText – which we computed as in the previous case – and that of SNBC as described in [11].
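A minimal sketch of this evaluation protocol, reusing the names from the snippets in Section 2, is shown below; it assumes that the reported per-class averages correspond to macro-averaged scores and that X_sel and labels cover the full TEC collection rather than a toy corpus.

    from sklearn.model_selection import cross_validate
    from sklearn.svm import LinearSVC

    # 5-fold cross-validation of the SVM on the selected features.
    scores = cross_validate(LinearSVC(), X_sel, labels, cv=5,
                            scoring=("precision_macro", "recall_macro",
                                     "f1_macro"))
    for name in ("test_precision_macro", "test_recall_macro", "test_f1_macro"):
        print(name, scores[name].mean())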
Method               Mean Precision   Mean Recall   Mean F1 Score
Proposed Approach    0.509            0.477         0.490
#Emotional Tweets    0.474            0.360         0.406
FastText             0.504            0.453         0.461
SNBC                 0.488            0.499         0.476

Table 1: Comparison with #Emotional Tweets, FastText and SNBC on the TEC data set.
From the results in Table 1, we observe that the proposed classification pipeline outperforms almost all of the selected baselines on the TEC data set. The only exception is SNBC, where we achieve a slightly lower Recall (-0.022). The 1,250 Headlines data set is a collection of 1,250 newspaper headlines divided into a training (1,000 headlines) and a test (250 headlines) set. We employ this data set to evaluate the robustness of the features that we extracted from a randomly sampled subset of tweets equal to 70% of the total size of the TEC data set.3 The results of this experiment are reported in Table 2. We report the performance of (i) a FastText model trained on the training subset of 1,000 headlines, (ii) an EC classification pipeline based on Word2Vec and a Gaussian Naive Bayes classifier (GNB) trained on the same training subset of 1,000 headlines, (iii) #Emotional Tweets, described in [6], and (iv) UMM, reported in [1]. From the results reported in Table 2, we see that our approach again outperforms all the selected baselines in almost all of the evaluation measures. The approach presented in [6] is the only one to have a slightly higher precision than our method (+0.002).

3 We restricted the training set for the multinomial logistic regressor because of the limitations of the glmnet library we used for its implementation.

Method                    Mean Precision   Mean Recall   Mean F1 Score
Proposed Approach         0.377            0.790         0.479
FastText                  0.442            0.509         0.378
Word2Vec + GNB            0.309            0.423         0.346
#Emotional Tweets         0.444            0.353         0.393
UMM (ngrams + POS + CF)   -                -             0.410

Table 2: Comparison with #Emotional Tweets, UMM (best pipeline on the data set), FastText and Word2Vec+GNB on the 1,250 Headlines data set.

4 DISCUSSION AND FUTURE WORK
We presented and evaluated a supervised approach to perform feature selection for Emotion Classification (EC). Our pipeline relies on a multinomial logistic regression model to perform feature selection, and on a Support Vector Machine (SVM) to perform EC. We evaluated it on two publicly available and widely-used experimental collections, i.e. the Twitter Emotion Corpus (TEC) [6] and SemEval 2007 (1,250 Headlines) [12]. We also compared it to similar techniques such as the one described in #Emotional Tweets [6], FastText [4], SNBC [11], UMM [1] and a Word2Vec-based [5] classification pipeline. We first evaluated our pipeline for EC on documents from the same domain from which the features were extracted (i.e. the TEC data set). Then, we employed it to perform EC on the 1,250 Headlines data set using the features extracted from TEC. In both experiments, our approach outperformed the selected baselines in almost all the performance measures. More information to reproduce our experiments is provided in [8]. We also make our code publicly available.4 We highlight that our approach might be applied to other document classification tasks, such as topic labeling or sentiment analysis. Indeed, we are using a general approach adaptable to any task or applicative domain in the document classification field.

4 https://bitbucket.org/albpurpura/supervisedlexiconextractionforec/src/master/
REFERENCES
[1] A. Bandhakavi, N. Wiratunga, D. Padmanabhan, and S. Massie. 2017. Lexicon based feature extraction for emotion text classification. Pattern Recognition Letters 93 (2017), 133–142.
[2] M. M. Bradley and P. J. Lang. 1999. Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report. Citeseer.
[3] P. Ekman. 1993. Facial expression and emotion. American Psychologist 48, 4 (1993), 384.
[4] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. 2016. Bag of Tricks for Efficient Text Classification. (2016). arXiv:1607.01759 http://arxiv.org/abs/1607.01759
[5] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS 2013. 3111–3119.
[6] S. M. Mohammad. 2012. #Emotional tweets. In Proc. of the First Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics, 246–255.
[7] B. Pang and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2, 1–2 (2008), 1–135.
[8] A. Purpura, C. Masiero, G. Silvello, and G. A. Susto. 2019. Supervised Lexicon Extraction for Emotion Classification. In Companion Proc. of WWW 2019. ACM, 1071–1078.
[9] A. Purpura, C. Masiero, and G. A. Susto. 2018. WS4ABSA: An NMF-Based Weakly-Supervised Approach for Aspect-Based Sentiment Analysis with Application to Online Reviews. In Discovery Science (Lecture Notes in Computer Science), Vol. 11198. Springer International Publishing, Cham, 386–401.
[10] F. H. Rachman, R. Sarno, and C. Fatichah. 2018. Music emotion classification based on lyrics-audio using corpus based emotion. International Journal of Electrical and Computer Engineering 8, 3 (2018), 1720.
[11] A. G. Shahraki and O. R. Zaiane. 2017. Lexical and learning-based emotion mining from text. In Proc. of CICLing 2017.
[12] C. Strapparava and R. Mihalcea. 2007. SemEval-2007 Task 14: Affective Text. ACL, 70–74.