Emo2Val: Inferring Valence Scores from fine-grained Emotion Values
                Alessandro Bondielli, Lucia C. Passaro and Alessandro Lenci
                CoLing Lab, Dipartimento di Filologia, Letteratura e Linguistica
                                  University of Pisa (Italy)
                         alessandro.bondielli@gmail.com
                            lucia.passaro@for.unipi.it
                             alessandro.lenci@unipi.it


Abstract

English. This paper studies the relationship between valence, one of the psycholinguistic variables in the Italian version of ANEW (Montefinese et al., 2014), and emotive scores calculated by exploiting distributional methods (Passaro et al., 2015). We show two methods to infer valence from fine-grained emotions and discuss their evaluation.

Italiano. Questo lavoro studia la relazione tra la valenza, una delle variabili psicolinguistiche presenti nella versione italiana di ANEW (Montefinese et al., 2014) e degli score emotivi calcolati distribuzionalmente (Passaro et al., 2015). Mostriamo due metodi per inferire la valenza a partire da tali valori e ne discutiamo la valutazione.

1   Introduction

Recent years have seen a surge in studies concerning emotional ratings, both in psycholinguistics and in affective computing. Traditionally, the three main behavioral dimensions used to measure the emotional value of a word are valence, arousal and dominance. Warriner et al. (2013) define valence as the “pleasantness of the stimulus”, usually ranging from 1 (very unpleasant) to 9 (very pleasant). The word dead has a low valence rating, whereas holiday has a higher one. Arousal is the intensity of the feeling evoked, on a scale from “stimulated” to “unaroused”. A highly stimulating word is passion. On the contrary, sleep is not arousing. Finally, dominance is identified with the degree to which the stimulus makes the reader feel “in control” (Louwerse and Recchia, 2014). Victory is a word with high dominance.

In the domain of Affective Computing, the goal moves from the identification of such variables to the annotation of texts with the emotions they express and, for Sentiment Analysis, with their degree of positivity and/or negativity.

The aim of this work is to study the relationship between the most important psycholinguistic variables and emotive scores calculated by exploiting distributional methods. In particular, we will focus on valence ratings, assuming that, among these three dimensions, valence is the one most closely related to a positive, negative or neutral emotional content. In fact, it can be defined as “the polarity of emotional activation” (Lang et al., 1999).

A possible approach to infer the valence of words from co-occurrence statistics is the one adopted by Louwerse and Recchia (2014), who followed a bootstrapping method to extend the ANEW lexicon (Bradley and Lang, 1999). Another approach would be to exploit a resource such as SenticNet (Cambria et al., 2016) to infer valence based on values of polarity for words or conceptual primitives. An alternative strategy is to infer the valence from an emotive lexicon such as ItEM (Passaro et al., 2015; Passaro and Lenci, 2016), a distributional lexicon for Italian, in which words are associated with an emotive score for 8 different emotions. In our opinion, this solution has several advantages: first of all, ItEM has been proven to be quite robust, and guarantees high coverage over Italian words; secondly, it is not only a static resource, but it can be easily expanded with new words, allowing for a quick adaptation to different contexts. Finally, associating words with fine-grained emotional values allows for a wide range of analyses, such as hate and violence detection in texts.

Experimental results showed, in an indirect way, that distributional emotive ratings can be very useful in the implementation of systems for polarity classification (Passaro and Lenci, 2016;
Bondielli, 2016). However, what is the real relation between emotive scores and valence? Our hypothesis is that emotions can be seen as a representation of valence on a more granular scale. Plutchik’s emotion taxonomy (Plutchik, 1994; Plutchik, 2001) is partitioned into positive and negative emotions. However, borderline emotions such as SURPRISE are harder to include in a positive or negative class, and therefore harder to assign a direct valence rating. Words like party and gun will have widely differing valence ratings, but both strongly elicit the emotion of SURPRISE. Hence it is interesting to ask the following question: given ItEM, are we able to predict the valence (i.e., positivity and/or negativity) of its words? In order to address this point, we fitted a simple regression model to predict the valence ratings of words in ANEW (Montefinese et al., 2014) given the respective emotive values in ItEM (Passaro et al., 2015; Passaro and Lenci, 2016).

This paper is organized as follows: in Section 2 we describe the resources used for the creation of the model. Section 3 shows our method and the results obtained. Finally, in Section 4 we evaluate the results and discuss our findings.

2     Resources

The main resources we used for our experiments are the Italian version of the Affective Norms for English Words (Montefinese et al., 2014) and the Italian EMotive lexicon (Passaro et al., 2015).

2.1    Italian ANEW

ANEW (Affective Norms for English Words) (Bradley and Lang, 1999) is a database created from a rating of 1034 English words with values for valence, arousal and dominance. Montefinese et al. (2014) provided an Italian version of ANEW, developed by translating the English ANEW words, and by adding the words taken from the Italian semantic norms (Montefinese et al., 2012), for a total of 1121 words. Ratings have been obtained via an experiment where participants had to rate words for the target variables. The reported ratings are the average of the ratings for all participants.

2.2    ItEM

ItEM (Passaro et al., 2015; Passaro and Lenci, 2016) is an emotive lexicon for Italian, in which each target term is associated with a score quantifying its association with each emotion in Plutchik’s taxonomy (Plutchik, 1994): JOY, SADNESS, ANGER, FEAR, TRUST, DISGUST, SURPRISE and ANTICIPATION. The resource has been created as follows: in a first phase, feature elicitation was used to create a small set of seed lemmas highly associated with one or more of the emotions in the taxonomy. Then, these lemmas have been distributionally expanded with the most frequent words in two Italian corpora (Baroni et al., 2004; Baroni et al., 2009). Finally, the emotive scores for each word were calculated by measuring the cosine similarity between the lemma and eight emotive centroids built from the collected seeds.

3     From fine-grained Emotion Values to Polarity

We used 2 main regression models to predict the valence from the distributional emotive scores. The first experiment, described in section 3.1, uses a polynomial regression model, and the second one (section 3.2) uses a logistic model in which the valence scores in ANEW have been discretized into two classes representing the positiveness and negativeness of the word.

A simple preprocessing phase has been applied to align the two resources. ANEW has 1121 words, but 65 of them have multiple POS (e.g. aereo (plane) can be both a noun and an adjective). We duplicated each such word, extending the dataset to 1189 elements, and extracted distinct emotive scores for each <lemma,PoS> pair. In addition, we replaced word forms like “scorie” (waste) with their most frequent word type (scoria) in ItaWaC (Baroni et al., 2009) and La Repubblica (Baroni et al., 2004). In the end, 57 ANEW words were left out of the analysis because they were not in ItEM. Overall, the resulting size of the aligned dataset is 1129 elements. Finally, to cope with the different distribution of data among the various emotions in ItEM, we normalized the scores with their z-score.

3.1    Polynomial regression

Due to the bimodal distribution of the data in ANEW, we decided to use a polynomial regression model to predict the valence of the words in ANEW by exploiting their normalized emotive scores in ItEM. Preliminary tests had in fact shown that a simple multiple linear regression model was not able to properly fit the data. The histogram
in Figure 1 shows such data distribution, in which most of the ANEW words have a valence score in the ranges 2-3 and 6-8, with a slight bias towards higher values.

      Figure 1: Valence ratings distribution

To define the best-performing degree (Deg) of the polynomial function, we performed 10-fold cross validation for degrees in the range {1...5}. The results, presented in Table 1, clearly show overfitting for degrees equal to or higher than 3. This is due to the fact that, given the number of parameters (#P), the estimated minimum number of observations (Min. Obs.), computed as #P × 15, must be at most around the total number of observations. This is true only for polynomials of degree 1 and 2. This finding is in line with Schmidt (1971) and Harrell (2001), who demonstrated that, to guarantee the reliability of the prediction, each parameter in the regression model should have a minimum number of observations between 10 and 20.

      Deg    #P    Min. Obs.     R2      MSE
       1       9    ∼ 135       0.46     2.24
       2      45    ∼ 675       0.53     1.82
       3     165    ∼ 2475      0.31     1.50
       4     495    ∼ 7425     −81.29    0.96
       5    1287   ∼ 19305     −11 B     0.00

Table 1: Experiments performed to define the best-performing Deg for the polynomial

Given this result, we performed a polynomial expansion of our parameters with a polynomial of degree 2. Then, we applied a simple multiple linear regression over the new data for predicting the valence. Figure 2 shows the result of the regression fitting. For this model, we obtained an R-squared (R2) of 0.58, a mean absolute error (MeanAE) of 1.08, a mean squared error (MSE) of 1.81, and a median absolute error (MedianAE) of 0.95.

      Figure 2: Fitting of predictions

For this experiment, we also provide two additional evaluations (the corresponding results are shown in Table 2):

 A) the results of prediction by means of a 10-fold cross validation;

 B) the results of prediction by means of a split of the data between training (66%) and test (33%).

      Method     R2    MeanAE     MSE    MedianAE
        A       0.53    1.13      1.99     0.98
        B       0.54    1.13      2.00     0.93

Table 2: Results of the evaluations

We would like to point out that our prediction performs better for words with a very high arousal. In fact, emotionally arousing words were more likely to be produced as an emotive prototypical word in the elicitation phase of ItEM. As a consequence, since ItEM’s emotive centroids have been constructed using the vectors of these words (namely the seeds), their nearest neighbors (i.e., the most emotive words) are also assumed to have a high level of arousal. Moreover, the distribution of the data in Figure 3 clearly shows how, in ANEW, high arousal corresponds to very high (or very low) valence ratings, suggesting that highly arousing words tend to be very positive or very negative (i.e. polarized). Building on this evidence, we performed an additional experiment in which we used the portion of the data (573 words) with an arousal
rating higher than its median (5.64) for prediction. For this model, in fact, R2 reaches ∼ 0.64.

      Figure 3: Valence-Arousal distribution

Given the distribution of the data shown in Figure 2, it is clear that a polynomial regression might not be a perfect fit for valence ratings. Nevertheless, it is very important to focus on MeanAE and MSE values. These errors are relatively low with respect to the scale of the human-rated valences.

This means that, on average, the difference between human-rated valence and predicted valence is between 1 and 2. To prove this point, we also compared the obtained scores with the original human annotations, by exploiting the standard deviation for each valence rating. We found that 73.5% of our predictions fall into the correct range around the average valence. If we consider a word having (in ANEW) a valence score of around 8 (e.g. pace (peace)), the system will predict a score between 6 and 9, leaving the word around the same (positive) area of the distribution. The same (and opposite) goes for low-valenced words, such as drogato (drug addicted) and feccia (scum). Problems arise in the case of words with a medium valence. Examples are corridoio (corridor) and insipido (bland). In this case, the word will have the same chance of being assigned a high valence score (5-6) or a low one (3-4). If we discretize valence ratings into two classes, a positive and a negative one, with a cut on the median, predictions will fall in the right class for most of the high (or low) valenced words, and (possibly) in the wrong one for the words of medium valence. In fact, by constructing a shallow mapping of the valence into a positive (with valence >= 5.5) and a negative class, we found a correlation of 0.73 between predicted and actual data.

3.2     Logistic regression

Building on the last experiment, and supposing a discretization of the valence into a positive and a negative class, we also used a logistic regression model to predict this binary valence. The results of this experiment are very promising. We performed 10-fold cross validation to evaluate the effectiveness of the logistic regression over the transformed valence ratings, and obtained an average accuracy of 0.80. Detailed results for this evaluation are shown in Table 3.

                   Precision   Recall    F1
      MicroAVG     0.806      0.803    0.802
      MacroAVG     0.803      0.803    0.803

Table 3: Logistic regression (Cross Validation)

4     Results and discussion

The results provided in previous experiments showed both pros and cons of this approach.

The main advantage of exploiting distributional emotive scores to predict a word’s valence is that such scores can be easily obtained in an unsupervised way by means of co-occurrence statistics.

Moreover, predicted data showed a rather good accuracy with respect to the actual distribution, especially considering the logistic regression experiment. In fact, our models reach peak performance by focusing the analysis on the sign of the valence with logistic regression instead of working with continuous values.

On the other hand, the main drawback of our approach derives from the size of the ANEW dataset, and in particular from the lack of examples around the medium valence score ratings. It is clear that the ratings distribution in this resource prevented us from obtaining reliable results for continuous values. This might also provide an explanation for the errors concerning the logistic regression experiment. We are confident that having access to a new resource covering the full spectrum of the valence more evenly would have a positive impact on our model.

5     Conclusions and ongoing research

In this work we studied the relationship between valence and distributional emotive scores. We modeled our data with regression in order to predict both a continuous score for valence and its corresponding binarized version (i.e., polarity).
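As a rough illustration of the continuous-valence set-up, the sketch below chains z-score normalization, a degree-2 polynomial expansion and an ordinary least-squares fit, and also shows the 5.5 cut used to derive the binary target. It is not the authors' code: all function names are hypothetical, the data are a synthetic toy grid rather than ANEW or ItEM scores, the least-squares solver is a plain normal-equations routine chosen only to keep the example dependency-free, and the logistic fit itself is not shown.

```python
from itertools import combinations_with_replacement
from math import comb, prod
from statistics import mean, pstdev


def n_params(n_features, degree):
    """Terms of a full polynomial of `degree`, intercept included."""
    return comb(n_features + degree, degree)


def zscore_columns(rows):
    """Rescale every column to zero mean and unit (population) std."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def poly_expand(row, degree=2):
    """All monomials of the inputs up to `degree`, intercept first."""
    return [prod(combo)
            for d in range(degree + 1)
            for combo in combinations_with_replacement(row, d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_least_squares(X, y):
    """Ordinary least squares through the normal equations."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def binarize(valence, cut=5.5):
    """1 = positive class, 0 = negative class (cut taken from the text)."""
    return 1 if valence >= cut else 0


# #P and Min. Obs. (= #P x 15) for 8 emotion scores, degrees 1..5.
table = [(d, n_params(8, d), 15 * n_params(8, d)) for d in range(1, 6)]

# Synthetic toy check with 2 features: the fit should recover true_w.
grid = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
X = [poly_expand(row) for row in zscore_columns(grid)]
true_w = [5.0, 1.5, -1.0, 0.3, 0.2, -0.4]  # arbitrary illustrative weights
y = [sum(w * t for w, t in zip(true_w, row)) for row in X]
weights = fit_least_squares(X, y)
```

With eight emotion scores per word, `n_params` yields the same #P and Min. Obs. figures listed in Table 1, which is a quick way to see why degrees above 2 call for far more observations than the aligned dataset provides.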
Despite the difficulties of modeling an accurate representation of a continuous valence rating from a small and unbalanced dataset like the Italian ANEW, we can identify a clear relationship between distributional emotional scores and a discrete valence obtained by categorizing the ratings into a positive and a negative class.

In the near future, we plan to improve our regression models, with the aim of reducing the impact of the distribution of the data in ANEW, possibly implementing new strategies able to cope with non-linear data. ANEW is a highly renowned psycholinguistic dataset, but we plan to extend the present work to predict sentiment polarity scores taken from SentiWordNet (Esuli and Sebastiani, 2006a; Esuli and Sebastiani, 2006b), thereby exploiting the larger coverage of this resource.

Moreover, we plan to follow the approach employed in ItEM to create a polarity lexicon for Italian, using ANEW words as seeds to build positive and negative polarity centroids. This would also be beneficial for comparing the performance of an emotion-based approach and a polarity-based one.

Finally, we aim at testing the effectiveness of our system for Sentiment Polarity Classification.

References

Marco Baroni, Silvia Bernardini, Federica Comastri, Lorenzo Piccioni, Alessandra Volpi, Guy Aston, and Marco Mazzoleni. 2004. Introducing the La Repubblica corpus: A large, annotated, TEI (XML)-compliant corpus of newspaper Italian. issues, 2:5–163.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226.

Alessandro Bondielli. 2016. Da Facebook a Twitter: Creazione e utilizzo di una risorsa lessicale emotiva per la sentiment analysis di tweet. Master’s thesis, University of Pisa, Italy.

Margaret M Bradley and Peter J Lang. 1999. Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report C-1, The Center for Research in Psychophysiology, University of Florida.

Erik Cambria, Soujanya Poria, Rajiv Bajpai, and Björn W Schuller. 2016. SenticNet 4: A semantic resource for sentiment analysis based on conceptual primitives. In COLING, pages 2666–2677.

A. Esuli and F. Sebastiani. 2006a. Determining term subjectivity and term orientation for opinion mining. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL06), Trento (Italy). Association for Computational Linguistics.

A. Esuli and F. Sebastiani. 2006b. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th International Conference on Language Resources and Evaluation, pages 417–422, Genoa (Italy). European Language Resource Association (ELRA).

F.E. Harrell. 2001. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Graduate Texts in Mathematics. Springer.

Peter J Lang, Margaret M Bradley, and Bruce N Cuthbert. 1999. International affective picture system (IAPS): Technical manual and affective ratings. Gainesville, FL: The Center for Research in Psychophysiology, University of Florida, 2.

MM Louwerse and G Recchia. 2014. Reproducing affective norms with lexical co-occurrence statistics: Predicting valence, arousal, and dominance. The Quarterly Journal of Experimental Psychology, 68(12):1–15.

Maria Montefinese, Ettore Ambrosini, Beth Fairfield, and Nicola Mammarella. 2012. Semantic memory: A feature-based analysis and new norms for Italian. Behavior Research Methods, pages 1–22, oct.

Maria Montefinese, Ettore Ambrosini, Beth Fairfield, and Nicola Mammarella. 2014. The adaptation of the affective norms for English words (ANEW) for Italian. Behavior Research Methods, 46(3):887–903.

Lucia C. Passaro and Alessandro Lenci. 2016. Evaluating context selection strategies to build emotive vector space models. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož (Slovenia).

Lucia C Passaro, Laura Pollacci, and Alessandro Lenci. 2015. ItEM: A vector space model to bootstrap an Italian emotive lexicon. CLiC it, 60(15):215.

Robert Plutchik. 1994. The psychology and biology of emotion. HarperCollins College Publishers.

R. Plutchik. 2001. The nature of emotions. American Scientist, 89:344–350.

Frank L Schmidt. 1971. The relative efficiency of regression and simple unit predictor weights in applied differential psychology. Educational and Psychological Measurement, 31(3):699–714.

Amy Beth Warriner, Victor Kuperman, and Marc Brysbaert. 2013. Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4):1191–1207.