<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Feature Selection for Emotion Classification∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Purpura</string-name>
          <email>purpuraa@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiara Masiero</string-name>
          <email>chiara.masiero@statwolf.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianmaria Silvello</string-name>
          <email>silvello@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gian Antonio Susto</string-name>
          <email>sustogia@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Padua</institution>
          ,
          <addr-line>Padua</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Statwolf Data Science</institution>
          ,
          <addr-line>Padua</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe a novel supervised approach to extract a set of features for document representation in the context of Emotion Classification (EC). Our approach employs the coefficients of a logistic regression model to extract the most discriminative word unigrams and bigrams to perform EC. We use this set of features to represent the documents, and we perform the classification with a Support Vector Machine. The proposed method is evaluated on two publicly available and widely used collections. We also evaluate the robustness of the extracted set of features across different domains, using the first collection to perform feature extraction and the second one to perform EC. We compare the obtained results to similar supervised approaches for document classification (i.e., FastText) and EC (i.e., #Emotional Tweets, SNBC and UMM), and to a Word2Vec-based pipeline.</p>
      </abstract>
      <kwd-group>
        <kwd>Supervised Learning</kwd>
        <kwd>Feature Selection</kwd>
        <kwd>Emotion Classification</kwd>
        <kwd>Document Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Content analysis and feature
selection; Sentiment analysis; • Computing methodologies →
Supervised learning by classification;</p>
    </sec>
    <sec id="sec-2">
      <label>1</label>
      <title>INTRODUCTION</title>
      <p>
        The goal of Emotion Classification (EC) is to detect and categorize
the emotion(s) expressed by a human. The literature offers numerous
examples of EC performed on different types of data sources, such as audio [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or microblogs [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Emotions have a large influence on our decision making. For this reason,
being able to identify them can be useful not only to improve the
interaction between humans and machines (e.g., with chatbots or robots),
but also to extract useful insights for marketing purposes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Indeed, EC is employed in a wide variety of contexts, including
social media [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and online stores, where it is closely related to Sentiment
Analysis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], with the goal of interpreting emerging trends or better
understanding the opinions of customers. In this work, we focus on EC
approaches that can be applied to textual data. The task is most
frequently tackled as a multi-class classification problem: given a
document d and a set of candidate emotion labels, the goal is to assign
one label to d. When more than one label can be assigned to d, the task
becomes a multi-label classification problem. The most widely used set
of emotions in computer science is the set of the six Ekman emotions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (i.e., anger, fear, disgust, joy, sadness, and surprise).
      </p>
      <p>
Traditionally, EC has been performed using dictionary-based approaches,
i.e., lists of terms that are known to be related to certain emotions, as
in ANEW [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, two main issues limit their application on a large
scale: (i) they cannot adapt to the context or domain in which a word is
used, and (ii) they cannot infer an emotion label for portions of text
that do not contain any of the terms available in the dictionary.
Machine learning and deep learning models based on an embedded
representation of words, such as Word2Vec [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or FastText [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], are a possible alternative to dictionary-based approaches.
However, these approaches need large amounts of data to train an
accurate model, and they cannot easily adapt to low-resource domains.
For this reason, we present a novel approach for feature selection and a
pipeline for emotion classification that outperform state-of-the-art
approaches without requiring large amounts of data. Additionally, we show
that the proposed approach generalizes well to different domains. We
evaluate our approach on two popular and publicly available data sets,
i.e., the Twitter Emotion Corpus (TEC) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and the SemEval 2007 Affective Text Corpus (1,250 Headlines) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and we compare it to state-of-the-art approaches for document
representation, such as Word2Vec and FastText, and for classification,
i.e., #Emotional Tweets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], SNBC [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and UMM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p content-type="footnote"><sup>∗</sup> Extended abstract of the original paper published in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].</p>
      <p content-type="footnote">This work was supported by the CDC-STARS project and co-funded by UNIPD.</p>
    </sec>
    <sec id="sec-3">
      <label>2</label>
      <title>PROPOSED APPROACH</title>
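      <p>The proposed approach exploits the coefficients of a multinomial
logistic regression model to extract an emotion lexicon from a
collection of short textual documents. First, we extract all word
unigrams and bigrams in the target collection after stopword
removal.<fn id="fn1"><label>1</label><p>We employ a list of 170 English stopwords; see NLTK v3.2.5, <ext-link ext-link-type="uri" xlink:href="https://www.nltk.org">https://www.nltk.org</ext-link>.</p></fn> Second, we represent the documents in the vector space
model, weighting each feature by its TF-IDF score. Then, we train a
multinomial logistic regression model with elastic-net regularization
to perform EC. The model is fitted by minimizing the following loss
function:</p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math>\ell\left(\{\beta_{0k},\beta_k\}_1^K\right) = -\left[\frac{1}{N}\sum_{i=1}^{N}\left(\sum_{k=1}^{K} y_{ik}\left(\beta_{0k}+x_i^{T}\beta_k\right) - \log\left(\sum_{k=1}^{K} e^{\beta_{0k}+x_i^{T}\beta_k}\right)\right)\right] + \lambda\left[\frac{(1-\alpha)\lVert\beta\rVert_F^2}{2} + \alpha\sum_{j=1}^{p}\lVert\beta_j\rVert_1\right]</tex-math>
      </disp-formula>
      <p>where N is the number of training documents, x<sub>i</sub> is the
TF-IDF vector of the i-th document, y<sub>ik</sub> equals 1 if the i-th
document is labeled with the k-th emotion and 0 otherwise, β is a
(p+1)×K matrix of coefficients for p features and K emotion classes
(the additional row holding the intercepts β<sub>0k</sub>), β<sub>k</sub>
refers to its k-th column (for outcome category k), and β<sub>j</sub>
refers to the coefficients of the j-th feature across all classes. The
parameter λ ≥ 0 controls the overall strength of the regularization,
while α ∈ [0, 1] balances the ridge (squared Frobenius norm) and lasso
penalties. In the last penalty term, α Σ<sub>j</sub>
||β<sub>j</sub>||<sub>1</sub>, we employ a lasso penalty on the
coefficients in order to induce sparse solutions. To solve this
optimization problem, we use the partial Newton algorithm, making a
partial quadratic approximation of the log-likelihood that allows only
(β<sub>0k</sub>, β<sub>k</sub>) to vary for a single class at a time.
For each value of λ, we cycle over all classes indexed by k, computing
each time a partial quadratic approximation about the parameters of the
current class.<sup>2</sup> Finally, we examine the β coefficients of each
class in the trained model and keep the features (i.e., word unigrams
and bigrams) associated with non-zero weights in any of the classes. To
evaluate the quality of the extracted features, we perform EC using a
Support Vector Machine (SVM), representing each document as a vector
over the set of extracted features, each weighted by its TF-IDF score.</p>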
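      <p>The feature-selection stage described above can be illustrated with the following minimal Python sketch. It is not the authors' implementation, which relies on glmnet, and it rests on several assumptions: the regular-expression tokenizer and the small <monospace>STOPWORDS</monospace> set are hypothetical stand-ins for the NLTK stopword list; TF-IDF is taken to be raw term frequency multiplied by log(N/df), a variant the text does not specify; <monospace>elastic_net_multinomial_loss</monospace> only evaluates the objective of Eq. (1) for given coefficients, leaving the intercepts unpenalized as glmnet does, and does not implement the partial Newton solver; and the coefficient rows passed to <monospace>select_features</monospace> are assumed to come from an already fitted model. All function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the 170-term NLTK English stopword list.
STOPWORDS = {"a", "an", "and", "the", "i", "am", "is", "so", "of", "to", "my"}


def extract_ngrams(text):
    """Lowercase and tokenize `text`, drop stopwords, then return the
    remaining unigrams followed by the bigrams of the filtered sequence."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf_vectors(docs):
    """Return the sorted vocabulary and one sparse {term: weight} dict per
    document, with weight = term frequency * log(N / document frequency)."""
    bags = [Counter(extract_ngrams(d)) for d in docs]
    df = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    vectors = [{t: c * math.log(n_docs / df[t]) for t, c in bag.items()} for bag in bags]
    return sorted(df), vectors


def elastic_net_multinomial_loss(X, Y, b0, B, lam, alpha):
    """Evaluate Eq. (1). X: N rows of p feature values; Y: N one-hot rows of
    length K; b0: K intercepts (not penalized); B: p rows of K weights."""
    n, p, k = len(X), len(B), len(b0)
    log_lik = 0.0
    for x, y in zip(X, Y):
        scores = [b0[c] + sum(x[j] * B[j][c] for j in range(p)) for c in range(k)]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        log_lik += sum(y[c] * scores[c] for c in range(k)) - log_norm
    frob_sq = sum(w * w for row in B for w in row)  # squared Frobenius norm of B
    l1 = sum(abs(w) for row in B for w in row)      # sum over features of L1 norms
    return -log_lik / n + lam * ((1 - alpha) * frob_sq / 2 + alpha * l1)


def select_features(vocab, B):
    """Keep the terms whose coefficient is non-zero for at least one class."""
    return [term for term, row in zip(vocab, B) if any(w != 0 for w in row)]


def restrict(vectors, selected):
    """Represent each document only through the selected features."""
    keep = set(selected)
    return [{t: w for t, w in v.items() if t in keep} for v in vectors]
```
      </preformat>
      <p>For example, for the three documents "I am so happy today", "so sad and lonely today" and "happy happy joy", <monospace>tfidf_vectors</monospace> returns a ten-term vocabulary that includes the bigram "sad lonely", since bigrams are formed after the stopword "and" has been removed. The dictionaries returned by <monospace>restrict</monospace> would then be turned into the feature vectors given to the SVM.</p>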
    </sec>
    <sec id="sec-4">
      <label>3</label>
      <title>RESULTS</title>
      <p>
        To evaluate the proposed approach, we consider the TEC and 1,250
Headlines collections. TEC is composed of 21,051 tweets that were
labeled automatically, according to the set of six Ekman emotions, using
the hashtags they contained; the hashtags were removed afterwards. We
split the collection into a training and a test set of equal size, and
we use the former to train the logistic regression model for feature
selection. Then, we perform a 5-fold cross validation to train an SVM for
EC using the previously extracted features, and we report in Table 1 the
results averaged over all six classes and over the five folds. Table 1
also reports the performance of FastText, computed with the same
protocol, and that of SNBC as described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. From the results in Table 1, we observe that the proposed
classification pipeline outperforms almost all of the selected baselines
on the TEC data set. The only exception is SNBC, which achieves a
slightly higher Mean Recall than our approach (0.499 vs. 0.477, i.e.,
+0.022).
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Comparison with #Emotional Tweets, FastText and SNBC on the TEC data set.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Method</th>
              <th>Mean Precision</th>
              <th>Mean Recall</th>
              <th>Mean F1 Score</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Proposed Approach</td>
              <td>0.509</td>
              <td>0.477</td>
              <td>0.490</td>
            </tr>
            <tr>
              <td>#Emotional Tweets</td>
              <td>0.474</td>
              <td>0.360</td>
              <td>0.406</td>
            </tr>
            <tr>
              <td>FastText</td>
              <td>0.504</td>
              <td>0.453</td>
              <td>0.461</td>
            </tr>
            <tr>
              <td>SNBC</td>
              <td>0.488</td>
              <td>0.499</td>
              <td>0.476</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The 1,250
Headlines data set is a collection of 1,250 newspaper headlines divided
into a training set (1,000 headlines) and a test set (250 headlines). We
employ this data set to evaluate the robustness of the features that we
extracted from a randomly sampled subset of tweets equal to 70% of the
total size of the TEC data set.<fn id="fn3"><label>3</label><p>We restricted the training set for the multinomial logistic regressor because of the limitations of the glmnet library we used for its implementation.</p></fn>
The results of this experiment are reported in Table 2. We report the
performance of (i) a FastText model trained on the training subset of
1,000 headlines, (ii) an EC pipeline based on Word2Vec and a Gaussian
Naive Bayes (GNB) classifier trained on the same training subset of
1,000 headlines, (iii) #Emotional Tweets, described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and (iv) UMM, reported in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. From the results reported in Table 2, we see that our approach
again outperforms all the selected baselines in terms of Mean Recall and
Mean F1 Score. In terms of Mean Precision, however, #Emotional Tweets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] (0.444) and FastText (0.442) score higher than our method (0.377).
      </p>
      <p content-type="footnote"><sup>2</sup> A Python implementation that optimizes the parameters of the model is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/bbalasub1/glmnet_python/blob/master/docs/glmnet_vignette.ipynb">https://github.com/bbalasub1/glmnet_python/blob/master/docs/glmnet_vignette.ipynb</ext-link>.</p>
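      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Comparison with #Emotional Tweets, UMM (best pipeline on the data set), FastText and Word2Vec + GNB on the 1,250 Headlines data set.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Method</th>
              <th>Mean Precision</th>
              <th>Mean Recall</th>
              <th>Mean F1 Score</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Proposed Approach</td>
              <td>0.377</td>
              <td>0.790</td>
              <td>0.479</td>
            </tr>
            <tr>
              <td>FastText</td>
              <td>0.442</td>
              <td>0.509</td>
              <td>0.378</td>
            </tr>
            <tr>
              <td>Word2Vec + GNB</td>
              <td>0.309</td>
              <td>0.423</td>
              <td>0.346</td>
            </tr>
            <tr>
              <td>#Emotional Tweets</td>
              <td>0.444</td>
              <td>0.353</td>
              <td>0.393</td>
            </tr>
            <tr>
              <td>UMM (ngrams + POS + CF)</td>
              <td>n/a</td>
              <td>n/a</td>
              <td>0.410</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>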
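      <p>The mean scores in Tables 1 and 2 are averages over the emotion classes and, for TEC, over the five cross-validation folds. The sketch below shows one way to compute such averages in plain Python. It assumes macro-averaging (each class weighted equally), contiguous unshuffled folds, and a score of 0 for a class that is never predicted; the text does not specify these details, and the function names are hypothetical. One consequence of averaging per-class F1 scores is that the Mean F1 Score column is generally not the harmonic mean of the Mean Precision and Mean Recall columns.</p>
      <preformat preformat-type="code">
```python
def macro_scores(y_true, y_pred, labels):
    """Per-class precision, recall and F1, then their unweighted means."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Assumption: a class that is never predicted (or never present) scores 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds over range(n);
    the first n % k folds receive one extra example."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if n % k > fold else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def mean_over_folds(fold_scores):
    """Average the (precision, recall, f1) tuples obtained on each fold."""
    return tuple(sum(s[i] for s in fold_scores) / len(fold_scores) for i in range(3))
```
      </preformat>
      <p>For instance, with gold labels joy, joy, fear, anger and predictions joy, fear, fear, joy over the classes anger, fear and joy, <monospace>macro_scores</monospace> returns a mean precision of 1/3, a mean recall of 0.5 and a mean F1 score of 7/18 (about 0.389), whereas the harmonic mean of 1/3 and 0.5 is 0.4.</p>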
    </sec>
    <sec id="sec-5">
      <label>4</label>
      <title>DISCUSSION AND FUTURE WORK</title>
      <p>
        We presented and evaluated a supervised approach to perform
feature selection for Emotion Classification (EC). Our pipeline relies
on a multinomial logistic regression model to perform feature
selection, and on a Support Vector Machine (SVM) to perform EC.
We evaluated it on two publicly available and widely used
experimental collections, i.e., the Twitter Emotion Corpus (TEC) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and SemEval 2007 (1,250 Headlines) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We also compared it to similar techniques, namely #Emotional
Tweets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], FastText [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], SNBC [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], UMM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and a Word2Vec-based [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] classification pipeline. We first evaluated our pipeline for EC
on documents from the same domain from which the features were
extracted (i.e., the TEC data set). Then, we employed it to perform EC
on the 1,250 Headlines data set using the features extracted from TEC.
In both experiments, our approach outperformed the selected baselines
in almost all the performance measures. More information to reproduce
our experiments is provided in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We also make our code publicly available.<sup>4</sup> We note
that our approach might be applied to other document classification
tasks, such as topic labeling or sentiment analysis. Indeed, the
approach is general: it only requires a labeled collection of documents,
so it can be adapted to any task or application domain in the document
classification field.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandhakavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wiratunga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Padmanabhan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Massie</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Lexicon based feature extraction for emotion text classification</article-title>
          .
          <source>Pattern Recognition Letters</source>
          <volume>93</volume>
          (
          <year>2017</year>
          ),
          <fpage>133</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Bradley</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Lang</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Affective norms for English words (ANEW): Instruction manual and affective ratings</article-title>
          .
          <source>Technical Report</source>
          . Citeseer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ekman</surname>
          </string-name>
          .
          <year>1993</year>
          .
          <article-title>Facial expression and emotion</article-title>
          .
          <source>American Psychologist</source>
          <volume>48</volume>
          ,
          <issue>4</issue>
          (
          <year>1993</year>
          ),
          <fpage>384</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Bag of Tricks for Efficient Text Classification</article-title>
          . (
          <year>2016</year>
          ). arXiv:1607.01759.
          <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1607.01759">http://arxiv.org/abs/1607.01759</ext-link>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
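          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,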
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>In NIPS 2013</source>
          .
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>#Emotional Tweets</article-title>
          .
          <source>In Proc. of the First Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics</source>
          ,
          <fpage>246</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Pang</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>2</volume>
          ,
          <issue>1-2</issue>
          (
          <year>2008</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Purpura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Masiero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Susto</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Supervised Lexicon Extraction for Emotion Classification</article-title>
          .
          <source>In Companion Proc. of WWW 2019. ACM</source>
          ,
          <fpage>1071</fpage>
          -
          <lpage>1078</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Purpura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Masiero</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Susto</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>WS4ABSA: An NMF-Based Weakly-Supervised Approach for Aspect-Based Sentiment Analysis with Application to Online Reviews</article-title>
          .
          <source>In Discovery Science (Lecture Notes in Computer Science)</source>
          , Vol.
          <volume>11198</volume>
          . Springer International Publishing, Cham,
          <fpage>386</fpage>
          -
          <lpage>401</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F. H.</given-names>
            <surname>Rachman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sarno</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Fatichah</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Music emotion classification based on lyrics-audio using corpus based emotion</article-title>
          .
          <source>International Journal of Electrical and Computer Engineering</source>
          <volume>8</volume>
          ,
          <issue>3</issue>
          (
          <year>2018</year>
          ),
          <fpage>1720</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Shahraki</surname>
          </string-name>
          and
          <string-name>
            <given-names>O. R.</given-names>
            <surname>Zaiane</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Lexical and learning-based emotion mining from text</article-title>
          .
          <source>In Proc. of CICLing</source>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Strapparava</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>SemEval-2007 Task 14: Affective Text</article-title>
          .
          <source>ACL</source>
          ,
          <fpage>70</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>