            Polarity Imbalance in Lexicon-based Sentiment Analysis

            Marco Vassallo1 , Giuliano Gabrieli1 , Valerio Basile2 , Cristina Bosco2
           1. CREA Research Centre for Agricultural Policies and Bio-economy, Italy
              2. Dipartimento di Informatica, Università degli Studi di Torino, Italy
{marco.vassallo|giuliano.gabrieli}@crea.gov.it, {valerio.basile|cristina.bosco}@unito.it




                     Abstract

    Polarity imbalance is an asymmetric situation that occurs when using parametric threshold values in lexicon-based Sentiment Analysis (SA). The variation across the thresholds may have an opposite impact on the prediction of negative and positive polarity. We hypothesize that this may be due to asymmetries in the data or in the lexicon, or both. We therefore carry out experiments to evaluate the effect of the lexicon and of the topics addressed in the data. Our experiments are based on a weighted version of the Italian linguistic resource MAL (Morphologically-inflected Affective Lexicon), using as weighting corpus TWITA, a large-scale corpus of messages from Twitter in Italian. The novel Weighted-MAL (W-MAL), presented for the first time in this paper, achieves better polarity classification results, especially for negative tweets, along with alleviating the aforementioned polarity imbalance.

    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1   Introduction and Motivation

Sentiment Analysis (SA) is the task of Natural Language Processing that aims at extracting opinions from natural language expressions, e.g., reviews or social media posts. The basic approaches to SA typically fall into one of two categories: dictionary-based and supervised machine learning. Methods based on a dictionary make use of affective lexicons, language resources where each word or lemma is associated with a score indicating its affective valence (e.g., polarity). In SA they are faster than supervised statistical approaches and, unless the resource is domain-specific, require minimal adaptation overhead even when applied to multiple environments. However, they only achieve good performance for identifying coarse opinion tendencies in large datasets, since they cannot take into account the impact of the context on the polarity value associated with a word.
   Supervised statistical methods, on the other hand, tend to provide better quality predictions across benchmarks, due to their better ability to generalize over individual words and expressions and to learn higher-level features. These models also show a better ability to adapt to specific domains,
provided the availability of data suitable for training.
   In order to access the lexical entries in an affective dictionary, lemmatization must be performed on each single word. Unfortunately, lemmatization is an error-prone process, with a potentially negative impact on the performance of downstream tasks such as SA. Vassallo et al. (2019) introduced a novel computational linguistic resource, namely the Morphologically-inflected Affective Lexicon (henceforth MAL), in order to address this issue by avoiding the lemmatization step in favor of a morphologically rich affective resource.
   In the experiments we carried out on a specific text genre, namely social media, we observed that using a threshold to assign polarity classes is beneficial, and that using the MAL instead of a lemmatization step improves the SA performance overall, in particular due to a better prediction of the negative polarity. However, the variation in threshold has an opposite impact on the prediction of negative and positive tweets.
   In this paper, we investigate the motivation behind this polarity imbalance. In particular, we speculate that this may be due to asymmetries in the data (e.g., different internal topics), in the lexicon (e.g., different amounts of negative and positive terms), or both, and we provide experiments to better understand this result and validate these hypotheses. We can therefore summarize our research questions as follows:

   • Is the polarity imbalance due to the topic addressed?

   • Is the polarity imbalance due to the lexicon (i.e., the resources we used, Sentix and MAL)?

   • Is the polarity imbalance due to both?

A further contribution of the paper consists in providing a statistical method for finding the threshold for using the lexicon in SA tasks.
   The paper is organized as follows. In the next section, affective lexicons and the resource MAL are discussed. In Section 3, we describe the issues related to polarity imbalance in lexicon-based approaches for SA. The fourth section is instead devoted to discussing the impact of the lexicon on SA and to introducing W-MAL. Section 5 discusses how the topics addressed in the text may impact on SA. The final section provides conclusive remarks and some hints about future work.

2   Affective Lexicons

SA is typically cast as a text classification task, very often approached by supervised statistical models among the NLP research community (Barbieri et al., 2016). However, there are several scenarios where dictionary-based methods are preferred, including large-scale industry-ready systems and domain-specific applications. While generally less accurate than supervised classification, dictionary-based methods tend to be robust in the classification of sentiment across different domains, faster, and more scalable.
   For the Italian language, several sentiment dictionaries, or, using a more general term, affective lexicons, have been published with different levels of granularity of the annotation and availability to the public, as summarized on the website of the Italian Association of Computational Linguistics (http://www.ai-lc.it/en/affective-lexica-and-other-resources-for-italian/).
   Sentix (Basile and Nissim, 2013) is one of the first affective lexicons created for the Italian language, with a first release described in (Basile and Nissim, 2013) and a second release called Sentix 2.0 (https://github.com/valeriobasile/sentixR). It provides an automatic alignment between SentiWordNet, an automatically-built polarity lexicon for English by Baccianella et al. (2010), and the Italian portion of MultiWordNet (Pianta et al., 2002). While the first version of Sentix associated two independent positive and negative polarity scores to each word, in Sentix 2.0 all the senses of each lemma have been collapsed into one entry by means of a weighted average, where the weights are proportional to sense frequencies computed on the sense-annotated corpus SemCor (Langone et al., 2004). Moreover, the positive and negative polarity scores have been combined to form a single polarity score ranging from -1 (totally negative) to 1 (totally positive). Sentix 2.0 includes 41,800 different lemmas.
   In order to use a lemma-based affective lexicon such as Sentix, lemmatization is a necessary step to undertake. In our previous work, we found that such an intermediate step causes a considerable amount of noise, in the form of lemmatization er-
Table 1: A tweet with the output of the three lemmatization models where the lemmas are alphabetically
ordered and the errors marked in bold.
      Original   @ANBI Nazionale Allarme idrico. Dopo il Po anche l’Adige è in crisi
                 d’acqua https://t.co/GLTlMNqzEv di @AgriculturaIT
      ISDT       acqua adigire allarme crisi d dopo idrico po - Sentix score: 0.080
      POSTWITA   acqua adigere allarme crisi di dopo idrico po - Sentix score: 0.080
      PARTUT     acquare adigere allarme crisi d dopo idrico po - Sentix score: -0.078


rors such as the ones shown in Table 1 (Vassallo et al., 2019). We therefore built a new resource on top of Sentix, described in the next section.

2.1   MAL

We proposed the Morphologically-inflected Affective Lexicon (MAL) in Vassallo et al. (2019). It is an extension of Sentix where the entries associated with polarity scores are the inflected forms related to each lemma, rather than the lemmas themselves, and the polarity score associated with each form is drawn from the original lemma in Sentix. The approach consists in linking the lexical items found in tweets with the entries of Sentix 2.0, without the application of an explicit lemmatization step. The lexicon is indeed expanded by considering all the acceptable forms of its lemmas, extracted from the Morph-it! collection of Italian forms (Zanchetta and Baroni, 2005). Each form takes the same polarity score as the original lemma, but when different lemmas can assume the same form, the arithmetic mean of their polarity scores is assigned. The MAL comprises 148,867 forms, all linked to the lemmas of Sentix 2.0.
   Using the MAL, we performed a series of experiments on the impact of lemmatization on dictionary-based SA, which showed how the reduction in lemmatization errors leads to a better polarity classification performance.

3   Polarity Imbalance in Lexicon-based Sentiment Analysis

When using an affective lexicon to predict the polarity of natural language sentences, a threshold must be fixed to translate the numerical scores into discrete classes, e.g., positive, neutral, and negative. In Vassallo et al. (2019), we showed how the variation of such a threshold has different, opposite impacts on the accuracy of the classification, using as a benchmark the corpus annotated with sentiment polarity made available by the SENTIment POLarity Classification (SENTIPOLC) shared task at EVALITA 2016. More precisely, the red dotted lines with label ALL in Figure 1 show that the F1 score of the classification of positive polarity instances increases with stricter thresholds, while the F1 score of negative polarity instances decreases.
   We postulate two non-mutually exclusive hypotheses on the origin of the polarity imbalance, namely the effect of lexicon and topic. The affective scores in the lexicon may be biased towards one end of the polarity spectrum due to a number of causes, resulting in skewed classification results. On the other hand, some topics tend to attract opinions more polarized towards one end of the spectrum than the other (e.g., "war" is an inherently negative topic), therefore the classification might be influenced by this intrinsic polarization.

4   The Effect of Lexicon on SA

In order to shed some light on the polarity imbalance due to the lexicon, we applied a weighted approach to MAL by developing the Weighted Morphologically-inflected Affective Lexicon (W-MAL). It originates from the intuition that less frequent terms should have a higher impact on the computation of the polarity of the sentence where they occur. This principle stems from the observation that more sought-after terms are often used to convey stronger opinions and feelings.
   We therefore computed the relative frequency of every item in MAL by using TWITA, a large-scale corpus of messages from Twitter in the Italian language (Basile et al., 2018). TWITA is indeed large (covering over 500 million tweets from 2012 to 2018, and the collection is currently ongoing) and domain-agnostic enough to provide a sufficiently representative sample of the distribution of Italian words, although specific to one social media platform.
   Despite its size, not all the terms from the MAL occur in TWITA: 57.9% of the 148,867 terms occurring in MAL were found in TWITA, due to the sparseness of particular inflected forms, and to the presence of multi-word expressions in the lexicon (18,661, about 12%) that were not considered for
Figure 1: Results of the polarity classification on SENTIPOLC. The threshold value on the X-axis is
applied to transform the sum of the scores from the lexicon into a positive or negative label.


matching the resources. For comparison, 73.36% of Sentix lemmas were found in TWITA.
   Accordingly, the scores of MAL were recalculated by weighting them with the associated word frequency in TWITA, using the Zipf scale measure (van Heuven et al., 2014). We chose this measure because it is easy to interpret and fast to compute. The Zipf scale is a logarithmic scale based on the well-known Zipf law of word frequency distribution (Zipf, 1949). The computation of the Zipf values of term frequencies from TWITA is straightforward and essentially equals the logarithm of the absolute frequency scaled down by a multiplicative factor:

   Zipf(i) = log10( f(i) / ( (Σ_{i=1}^{N} f(i) + N) / 10^6 ) ) + 3

where N is the number of distinct word forms (types) in TWITA (6,644,867), f(i) is the absolute frequency of the i-th token in TWITA, and the sum of the token frequencies is Σ_{i=1}^{N} f(i) = 6,906,070,053; therefore:

   Zipf(i) = log10( f(i) / (6,906.07 + 6.644) ) + 3

   The original Zipf scale is continuous and ranges from 1 (very low frequency) to 6 (very high frequency), or even 7 (e.g., for very frequent words like auxiliary verbs). By computing the Zipf score of the MAL terms on TWITA, we found some terms with very low frequencies, resulting in negative values because of the logarithmic function. These were re-coded with the minimum Zipf value. The resulting weights in the W-MAL range from a minimum of -5.16 to a maximum of 5.95 (the original MAL ranged from -1 to 1). Eventually, we decided to keep the terms that were not found in TWITA in the W-MAL with their original MAL score.
   We initially applied the Zipf scale to MAL polarity scores by simply multiplying the two scores, thus giving more weight to highly frequent terms. However, using the affective lexicon with such a weighting scheme resulted in a decrease in its polarity classification performance. We therefore reversed the Zipf scale, weighting the original scores inversely with respect to their word frequency. By doing so, we tested our speculation that more weight should be given to low-frequency terms. We replicated the polarity detection experiment on SENTIPOLC. The results, shown in the green solid lines in Figure 1 labeled ALL, indicate a better performance overall, and a reduced imbalance between the positive (F1-score standard deviation across the thresholds of 0.035 with W-MAL vs 0.054 with MAL) and (especially) the negative polarity class (F1-score standard deviation across the thresholds of 0.008 with W-MAL vs 0.042 with MAL).
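The Zipf computation and the inverse weighting can be illustrated in a few lines of Python. This is a minimal sketch: the lexicon fragment, the word frequencies, and the reversal scheme (multiplying each MAL score by the distance of its Zipf value from a hypothetical ceiling ZIPF_MAX) are invented for illustration, since the exact reversal formula is not spelled out here.

```python
import math

# Corpus statistics reported in the text: total token count and number
# of distinct word forms (types) in TWITA.
TOTAL_TOKENS = 6_906_070_053
N_TYPES = 6_644_867

def zipf(freq: float) -> float:
    """Zipf scale (van Heuven et al., 2014): log10 of the frequency per
    million tokens (with the type count added to the denominator for
    smoothing), shifted by 3 so typical values fall roughly in 1-7."""
    return math.log10(freq / ((TOTAL_TOKENS + N_TYPES) / 1e6)) + 3

# Toy MAL fragment (form -> polarity score in [-1, 1]; invented values).
mal = {"allarme": -0.4, "crisi": -0.6, "buono": 0.5}
# Toy absolute frequencies in the weighting corpus (invented values).
freq = {"allarme": 120_000, "crisi": 900_000, "buono": 14_000_000}

# Reversed Zipf weighting: rarer forms receive larger weights.
# ZIPF_MAX is a hypothetical ceiling, not a value from the text.
ZIPF_MAX = 7.0
w_mal = {form: score * (ZIPF_MAX - zipf(freq[form]))
         for form, score in mal.items()}
```

With these toy numbers, the rare negative form "allarme" ends up with a larger absolute weight than the frequent positive form "buono", mirroring the intuition that infrequent terms should drive the polarity of a message more strongly.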

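The threshold-based decision rule behind these F1 comparisons can be sketched as follows. This is a minimal sketch: the lexicon fragment and the default threshold are invented for illustration; in the experiments the scores come from MAL or W-MAL and the threshold is swept over a range of values.

```python
def classify(tokens, lexicon, threshold=0.1):
    """Dictionary-based polarity classification: sum the lexicon scores
    of the tokens found in the message, then map the total to a discrete
    label via a symmetric threshold around zero. Tokens missing from the
    lexicon contribute nothing to the sum."""
    total = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if total > threshold:
        return "positive"
    if total < -threshold:
        return "negative"
    return "neutral"

# Hypothetical lexicon fragment (invented scores, not actual MAL entries).
lex = {"allarme": -0.4, "crisi": -0.6, "buono": 0.5}

print(classify(["allarme", "idrico", "crisi"], lex))  # prints "negative"
```

Sweeping the threshold reproduces the X-axis of Figure 1: raising it widens the neutral band, which affects the positive and negative F1 scores in opposite directions.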
   To further clarify the effect found on the polarity scores, we show two example tweets in Figure 2 (translations in footnote 4). In the figure, the MAL and W-MAL scores are included for the highlighted words, along with the total polarity scores computed with both dictionaries, showing how the final judgment can change from neutral to polarized (bottom example) or switch polarity entirely (top example). In particular, in the top example the scores are associated with "confondesse" (to confuse, in the subjunctive mood) and with "diritto" (right), while in the bottom example the scores are associated with "Istituto" (school) and with the periphrastic verbal form "viene taciuto" (is silenced). This result confirms our speculation that negative polarity is expressed with more specific words than positive polarity. Psychology studies also show that more complex forms of language are used for expressing criticism rather than positive evaluations (Stewart, 2015).
   We also notice that the F1-score on the negative polarity class is generally higher than the one on the positive polarity class. This means that, by means of the weighting process with the inverse coding, the negative polarity of tweets is better predicted than the positive polarity. This outcome is substantially supported also by the directly proportional version of W-MAL, which performed worse than the inverse version in terms of prediction. This trend was also observed across most of the results of the SENTIPOLC shared task, mostly based on supervised models with lexical features, further indicating that the vocabulary of negative sentiment is richer than that of positive sentiment.

   4 The translation of the examples is as follows. For the top example: "They would be #thegoodschool if meritocracy were not confused with 'doormatcracy': the one whereby even a right becomes a concession." For the bottom example: "@steGiannini #thegoodschool In the rankings of the School there are also TFA qualified teachers with 48 months of service. Why is it silenced?", where steGiannini refers to the Italian minister for school.

Figure 2: A comparison between the scores calculated for polarized words according to MAL and W-MAL in two tweets from the test set.

5   The Effect of Topic on Sentiment Analysis

In order to investigate the interaction between the imbalance of dictionary-based polarity classification and a possible asymmetry in the data (i.e., different internal topics), we performed such classification with MAL and with W-MAL with the reversed Zipf scale on a benchmark with explicitly stated topics. As a matter of fact, the test set of SENTIPOLC is composed of 1,982 Italian tweets, organized in 496 general (i.e., domain-independent) tweets and 1,486 political tweets, obtained by filtering data with specific keywords related to Italian political figures. The results of our experiment are also included in Figure 1 with the GENERAL and POLITICAL labels.
   The first observation we draw from this experiment is that the polarity imbalance is a phenomenon restricted to the topic-specific section of the dataset. This confirms the hypothesis that dictionary-based polarity classification is affected by the imbalance issue to the extent to which its topic is specific. In particular, we hypothesize that some topics (such as politics) tend to attract opinions more polarized towards one end of the spectrum (the negative one in this case), therefore inducing the observed imbalance.
   The second observation is that weighting the polarity scores in the dictionary based on word frequency (W-MAL) provides better overall results.
In particular, the F1 scores are better in the topic-specific case, specifically due to a better prediction of the negative polarity. This result reinforces the idea that a polarized topic induces polarity imbalance, and that, therefore, a method to alleviate such imbalance (i.e., a weighting scheme) leads to better performance. In our view, a reason for this effect is that topic-specific messages make use of less frequent words on average.

6   Conclusion and Future Work

The weighting scheme proposed in this work is a promising solution to the polarity imbalance in dictionary-based SA. The experiments show that weighting the polarity scores with word frequencies yielded a more precise prediction of the polarized tweets, with lessened bias in the thresholds for neutral scores. The novel resource presented here, W-MAL, is an attempt to better characterize the most sought-after words, which have an impact on the interaction between sentiment and topic. We believe it also represents a promising attempt to control for context-dependency while using lexicon-based methods for SA.
   In particular, with this resource we try to give voice to the linguistic intuition that the choice of a specific form within a message might meaningfully impact the sentiment expressed in the message. For instance, referring to the top example in Figure 2, by using the subjunctive mood "confondesse" of the verb "confondere" (to confuse), the author adds to the meaning of the verb a sense of doubtfulness and of unreality. This is reinforced by the fact that this form introduces a clause which is coordinated with the clause headed by a verb in the conditional mood, i.e., "sarebbe" (a form of to be). This form of the verb "confondere" seems especially adequate for contexts where a negative polarity is expressed, and less appropriate for other cases. The use of this specific mood for the verb has therefore a meaningful impact on the sentiment expressed. The MAL properly encodes this information, which may be lost when a lemmatization step is applied to the text and all forms are subsequently considered as bearing the same meaning without further nuances. But the W-MAL does even better: it encodes probabilistic information about how suitable a form is for expressing a particular sentiment with respect to other available forms in a given context.
   For all the aforementioned reasons, this work has drawn our attention to the necessity of weighting dictionary-based affective lexicons for SA with corpus-based word frequencies. The resource is freely available at https://github.com/valeriobasile/sentixR/blob/master/sentix/inst/extdata/W-MAL.tsv
   In future work, we plan to work on more refined weighting strategies, e.g., leveraging the frequency information of word forms in addition to lemmas, and taking the topic distribution into consideration. Reducing the computational load is a challenging goal as well (see Prakash et al. (2015)). On the other hand, modern transformer-based models have reached state-of-the-art results on the task of polarity detection (Polignano et al., 2019), although they are far more expensive and time-consuming to run. We therefore plan to compare the predictions of these systems, and to study ways to integrate their respective strengths (i.e., the speed and transparency of the dictionary-based approach vs. the superior prediction capability of deep neural models) in order to boost the overall performance.
   The present work was originally conceived in the framework of the AGRItrend project led by the CREA Research Centre for Agricultural Policies and Bio-economy, aiming at collecting and analyzing social media data for opinions in the domain of public policies and agriculture. As such, we plan to study the impact of the techniques presented in this paper on that particular domain, and to observe whether the same, or different, patterns emerge. On a similar line, so far we have conducted experiments on data from Twitter, which facilitates access to large quantities of data but restricts the range of text styles and genres found in them.

References

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10). European Language Resources Association (ELRA).

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the Evalita 2016 SENTIment POLarity Classification Task. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). CEUR-WS.org.

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

Valerio Basile, Mirko Lai, and Manuela Sanguinetti. 2018. Long-term Social Media Data Collection at the University of Turin. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). CEUR-WS.org.

Helen Langone, Benjamin R. Haskell, and George A. Miller. 2004. Annotating WordNet. In Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004, pages 63–69. Association for Computational Linguistics (ACL).

Emanuele Pianta, Luisa Bentivogli, and Christian Girardi. 2002. MultiWordNet: developing an aligned multilingual database. In Proceedings of the First International Conference on Global WordNet, pages 293–302.

Marco Polignano, Pierpaolo Basile, Marco de Gemmis, Giovanni Semeraro, and Valerio Basile. 2019. ALBERTO: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets. In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019). CEUR-WS.org.

Saurabh Prakash, T. Chakravarthy, and E. Kaveri. 2015. Statistically weighted reviews to enhance sentiment classification. Karbala International Journal of Modern Science, 1:26–31.

Martyn Stewart. 2015. The language of praise and criticism in a student evaluation survey. Studies in Educational Evaluation, 45:1–9.

Walter J. B. van Heuven, Pawel Mandera, Emmanuel Keuleers, and Marc Brysbaert. 2014. SUBTLEX-UK: a new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology, 67(6):1176–1190.

Marco Vassallo, Giuliano Gabrieli, Valerio Basile, and Cristina Bosco. 2019. The tenuousness of lemmatization in lexicon-based sentiment analysis. In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019). Academia University Press.

Eros Zanchetta and Marco Baroni. 2005. Morph-it! A free corpus-based morphological resource for the Italian language. Corpus Linguistics 2005, 1(1).

George Kingsley Zipf. 1949. Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley.