<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Polarity Imbalance in Lexicon-based Sentiment Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Vassallo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuliano Gabrieli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Bosco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica, Università degli Studi di Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CREA Research Centre for Agricultural Policies and Bio-economy</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Polarity imbalance is an asymmetric situation that occurs when using parametric threshold values in lexicon-based Sentiment Analysis (SA). The variation across the thresholds may have an opposite impact on the prediction of negative and positive polarity. We hypothesize that this may be due to asymmetries in the data or in the lexicon, or both. We therefore carry out experiments to evaluate the effect of the lexicon and of the topics addressed in the data. Our experiments are based on a weighted version of the Italian linguistic resource MAL (Morphologically-inflected Affective Lexicon), using as weighting corpus TWITA, a large-scale corpus of messages from Twitter in Italian. The novel Weighted-MAL (W-MAL), presented for the first time in this paper, achieved better polarity classification results, especially for negative tweets, along with alleviating the aforementioned polarity imbalance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Italiano</title>
      <p>Lo sbilanciamento della polarità
è una situazione di asimmetria che si viene
a creare quando si impiegano valori soglia
parametrici nella Sentiment Analysis (SA)
basata su dizionario. La variazione dei
valori soglia può avere un impatto opposto
rispetto alla predizione di polarità
negativa e positiva. Si ipotizza che questo
effetto sia dovuto ad asimmetrie nei dati
o nel dizionario, o in entrambi.
Abbiamo condotto esperimenti per misurare
l’effetto del lessico e degli argomenti
trattati nel nostro dataset. I nostri
esperimenti sono basati su una versione
ponderata della risorsa per l’italiano MAL
(Morphologically-inflected Affective
Lexicon), usando come corpus per la
ponderazione TWITA, un corpus di larga scala di
messaggi da Twitter in italiano. La nuova
risorsa Weighted-MAL (W-MAL),
presentata per la prima volta in questo
articolo, ottiene migliori risultati nella
classificazione della polarità, specialmente per
i messaggi negativi, oltre ad alleviare il
problema sopracitato di sbilanciamento
della polarità.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction and Motivation</title>
      <p>Sentiment Analysis (SA) is the task of Natural
Language Processing that aims at extracting
opinions from natural language expressions, e.g.,
reviews or social media posts. The basic approaches
to SA typically fall into one of two categories:
dictionary-based and supervised machine
learning. Methods based on a dictionary make use of
affective lexicons, language resources where each
word or lemma is associated to a score indicating
its affective valence (e.g., polarity). In SA they are
faster than supervised statistical approaches and,
unless the resource is domain-specific, can be
applied to multiple environments with minimal
adaptation overhead. However, they only achieve good
performance for identifying coarse opinion tendencies in
large datasets, since they cannot take into account
the impact of the context on the polarity value
associated to a word.</p>
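      <p>The lexicon-based scoring described above can be sketched as follows; the miniature lexicon and its scores are invented for illustration and are not taken from any actual affective resource:
```python
# Minimal sketch of dictionary-based sentiment scoring.
# The lexicon below is a toy example, not a real affective resource.
AFFECTIVE_LEXICON = {
    "good": 0.7,
    "nice": 0.5,
    "bad": -0.6,
    "terrible": -0.9,
}

def polarity_score(text):
    """Sum the lexicon scores of the tokens found in the text."""
    return sum(AFFECTIVE_LEXICON.get(tok, 0.0) for tok in text.lower().split())

print(polarity_score("a good and nice day"))
```
Note how context is ignored: each token contributes its fixed score regardless of its neighbors, which is why such methods capture only coarse opinion tendencies.
      </p>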
      <p>Supervised statistical methods, on the other hand,
tend to provide better quality predictions across
benchmarks, due to their better ability to
generalize over individual words and expressions, and
learning higher level features. These models also
show a better ability to adapt to specific domains,
provided the availability of data suitable for
training.</p>
      <p>Copyright © 2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>In order to access the lexical entries in an
affective dictionary, lemmatization must be performed
on each single word. Unfortunately,
lemmatization is an error-prone process, with potentially
negative impact on the performance of
downstream tasks such as SA. Vassallo et al. (2019)
introduced a novel computational linguistic
resource, namely the Morphologically-inflected
Affective Lexicon (henceforth MAL) in order to
address this issue by avoiding the lemmatization step
in favor of a morphologically rich affective
resource.</p>
      <p>In the experiments we carried out on a specific text
genre, namely social media, we have observed that
using a threshold to assign polarity classes is
beneficial, and using the MAL instead of a
lemmatization step improves the SA performance overall,
in particular due to a better prediction of the
negative polarity. However, the variation in threshold
has opposite impact on the prediction of negative
and positive tweets.</p>
      <p>In this paper, we investigate the motivation behind
this polarity imbalance. In particular, we speculate
that this may be due to asymmetries in the data
(e.g., different internal topics), in the lexicon (e.g.,
different amounts of negative and positive terms),
or both, and we provide experiments to better
understand this result and validate these hypotheses.
We can therefore summarize as follows our
research questions:</p>
      <p>Is the polarity imbalance due to the topic
addressed? Is the polarity imbalance due to the
lexicon (i.e., the resources we used, Sentix and
MAL)? Is the polarity imbalance due to both?</p>
      <p>A further contribution of the paper consists in
providing a statistical method for finding the
threshold for using the lexicon in SA tasks.</p>
      <p>The paper is organized as follows. In the next
section, affective lexicons and the resource MAL
are discussed. In section 3, we describe the issues
related to polarity imbalance in lexicon-based
approaches for SA. The fourth section is instead
devoted to discuss the impact on SA of lexicon and
to introduce W-MAL. Section 5 discusses how the
topics addressed in the text may impact on SA.
The final section provides conclusive remarks and
some hints about future work.</p>
    </sec>
    <sec id="sec-3">
      <title>2 Affective Lexicons</title>
      <p>
        SA is typically cast as a text classification task,
very often approached by supervised
statistical models among the NLP research community
        <xref ref-type="bibr" rid="ref2">(Barbieri et al., 2016)</xref>
        . However, there are several
scenarios where dictionary-based methods are
preferred, including large-scale industry-ready
systems, and domain-specific applications. While
generally less accurate than supervised
classification, dictionary-based methods tend to be robust to
the classification of sentiment across different
domains, faster and with a higher level of scalability.
      </p>
      <p>For the Italian language, several sentiment
dictionaries, or, using a more general term, affective
lexicons, were published with different levels of
granularity of the annotation and availability to the
public, as summarized on the website of the Italian
Association of Computational Linguistics
(http://www.ai-lc.it/en/affectivelexica-and-other-resources-for-italian/).</p>
      <p>
        Sentix
        <xref ref-type="bibr" rid="ref3">(Basile and Nissim, 2013)</xref>
        is one of the
first affective lexicons created for the Italian language,
with a first release described in
        <xref ref-type="bibr" rid="ref3">(Basile and
Nissim, 2013)</xref>
        , and a second release called Sentix 2.0
(https://github.com/valeriobasile/sentixR). It provides an automatic alignment between
SentiWordNet, an automatically-built polarity
lexicon for English by Baccianella et al. (2010), and
the Italian portion of MultiWordNet
        <xref ref-type="bibr" rid="ref6">(Pianta et al.,
2002)</xref>
        . While the first version of Sentix associated
two independent positive and negative polarity
scores to each word, in Sentix 2.0 all the senses of
each lemma have been collapsed into one entry by
means of a weighted average, where the weights
are proportional to sense frequencies computed on
the sense-annotated corpus SemCor
        <xref ref-type="bibr" rid="ref5">(Langone et
al., 2004)</xref>
        . Moreover, the positive and negative
polarity scores have been combined to form a single
polarity score ranging from -1 (totally negative) to
1 (totally positive). Sentix 2.0 includes 41,800
different lemmas.
      </p>
      <p>In order to use a lemma-based affective
lexicon such as Sentix, lemmatization is a necessary
step to undertake. In our previous work, we found
that such an intermediate step causes a considerable
amount of noise, in the form of lemmatization
errors such as the ones shown in Table 1
        <xref ref-type="bibr" rid="ref11">(Vassallo et
al., 2019)</xref>
        . We therefore built a new resource on top
of Sentix, described in the next section.
We proposed the Morphologically-inflected
Affective Lexicon (MAL) in Vassallo et al. (2019).
It is an extension of Sentix where polarity scores
are associated to inflected forms rather than to
lemmas, and the polarity score of each form is
drawn from its original lemma in Sentix. The
approach consists in linking the lexical items found
in tweets with the entries of Sentix 2.0, without
the application of an explicit lemmatization step.
The lexicon is indeed expanded by considering all
the acceptable forms of its lemmas extracted from
the Morph-It collection of Italian forms
        <xref ref-type="bibr" rid="ref12">(Zanchetta
and Baroni, 2005)</xref>
        . Each form takes the same
polarity score of the original lemma, but when
different lemmas can assume the same form, the
arithmetic mean of their polarity scores is assigned.
The MAL comprises 148,867 forms, all linked to
the lemmas of Sentix 2.0.
      </p>
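      <p>The expansion step can be sketched as follows, with toy inputs standing in for Sentix 2.0 (lemma scores) and Morph-It (form-to-lemma mapping); the miniature entries are invented for illustration:
```python
from statistics import mean

# Toy inputs standing in for Sentix 2.0 (lemma scores) and
# Morph-It (inflected form -> lemmas); the real data is much larger.
lemma_polarity = {"buono": 0.8, "bene": 0.3}
form_to_lemmas = {
    "buona": ["buono"],
    "buone": ["buono"],
    "bene": ["bene", "buono"],  # one form shared by two lemmas
}

def build_form_lexicon(lemma_polarity, form_to_lemmas):
    """Give each inflected form the score of its lemma; when several
    lemmas share the same form, assign the arithmetic mean of their
    scores, as described for the MAL."""
    form_lexicon = {}
    for form, lemmas in form_to_lemmas.items():
        scores = [lemma_polarity[l] for l in lemmas if l in lemma_polarity]
        if scores:
            form_lexicon[form] = mean(scores)
    return form_lexicon
```
With this scheme, looking up a tweet token needs no lemmatizer at all: the inflected form itself is the lexicon key.
      </p>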
      <p>Using the MAL we performed a series of
experiments on the impact of lemmatization on
dictionary-based SA, which showed how the
reduction in lemmatization errors leads to a better
polarity classification performance.</p>
    </sec>
    <sec id="sec-4">
      <title>3 Polarity Imbalance in Lexicon-based Sentiment Analysis</title>
      <p>When using an affective lexicon to predict the
polarity of natural language sentences, a threshold
must be fixed to translate the numerical scores
into discrete classes, e.g., positive, neutral, and
negative. In Vassallo et al. (2019), we showed
how the variation of such threshold has
different, opposite impacts on the accuracy of the
classification, using as a benchmark the corpus
annotated with sentiment polarity made available
by the SENTIment POLarity Classification
(SENTIPOLC) shared task at EVALITA 2016. More
precisely, the red dotted lines with label ALL in
Figure 1 show that the F1 score of the
classification of positive polarity instances increases with
stricter thresholds, while the F1 score of negative
polarity instances decreases.</p>
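      <p>The threshold mechanism described above can be sketched as a simple symmetric rule around zero (the function below is our illustration, not code from the paper):
```python
def polarity_class(score, threshold):
    """Map a continuous polarity score to a discrete class using a
    symmetric, non-negative threshold around zero."""
    if score > threshold:
        return "positive"
    elif score >= -threshold:
        return "neutral"
    return "negative"
```
Raising the threshold widens the neutral band on both sides, so the same change can move borderline positive and borderline negative instances in opposite ways, which is exactly the imbalance under study.
      </p>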
      <p>We postulate two non-mutually exclusive
hypotheses on the origin of the polarity imbalance,
namely the effect of lexicon and topic. The
affective scores in the lexicon may be biased towards
one end of the polarity spectrum due to a
number of causes, resulting in skewed classification
results. On the other hand, some topics tend to
attract opinions more polarized towards one end
of the spectrum than the other (e.g., “war” is an
inherently negative topic), therefore the
classification might be influenced by this intrinsic
polarization.</p>
    </sec>
    <sec id="sec-6">
      <title>4 The Effect of Lexicon on SA</title>
      <p>In order to shed some light on the polarity
imbalance due to lexicon we applied a weighted
approach to MAL by developing the Weighted
Morphologically-inflected Affective Lexicon
(W-MAL). It originates from the intuition that less
frequent terms should have a higher impact on the
computation of the polarity of the sentence where
they occur. This principle stems from the
observation that more sought-after terms are often used to
convey stronger opinions and feelings.</p>
      <p>
        We therefore computed the relative frequency
of every item in MAL by using TWITA, a
large-scale corpus of messages from Twitter in the
Italian language
        <xref ref-type="bibr" rid="ref4">(Basile et al., 2018)</xref>
        . TWITA is
indeed large (covering over 500 million tweets from
2012 to 2018, and the collection is currently
ongoing) and domain-agnostic enough to provide a
sufficiently representative sample of the
distribution of the Italian language words, although
specific to one social media platform.
      </p>
      <p>Despite its size, not all the terms from the MAL
occur in TWITA: 57.9% of the 148,867 terms
occurring in MAL were found in TWITA, due to the
sparseness of particular inflected forms, and to the
presence of multi-word expressions in the lexicon
(18,661, about 12%) that were not considered for
matching the resources. For comparison, 73.36%
of Sentix lemmas were found in TWITA.</p>
      <p>
        Accordingly, the scores of MAL were
recalculated by weighting them with the associated words
frequency in TWITA, using the Zipf scale
measure
        <xref ref-type="bibr" rid="ref10">(van Heuven et al., 2014)</xref>
        . We chose
this measure because it is easy to interpret and
fast to compute. The Zipf
scale is a logarithmic scale based on the
well-known Zipf law of word frequency
distribution
        <xref ref-type="bibr" rid="ref13">(Zipf, 1949)</xref>
        . The computation of the Zipf values
of term frequencies from TWITA is
straightforward and essentially equals the logarithm of the
absolute frequency scaled down by a
multiplicative factor:
      </p>
      <p>Zipf(i) = log10( (f(i) + 1) / ((Σ_{i=1..N} f(i) + N) / 10^6) ) + 3</p>
      <p>where N is the number of distinct word types in TWITA
(6,644,867), f(i) is the absolute frequency of the
i-th type in TWITA, and the sum of the type
frequencies Σ_{i=1..N} f(i) = 6,906,070,053,
therefore:</p>
      <p>Zipf(i) = log10( (f(i) + 1) / 6912.71 ) + 3</p>
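      <p>Under our reading of the formula (the van Heuven et al. (2014) Zipf scale with add-one smoothing and the TWITA statistics reported in the text), the computation can be sketched as:
```python
import math

# Corpus statistics reported for TWITA in the text.
N_TYPES = 6_644_867            # distinct word types
TOTAL_TOKENS = 6_906_070_053   # sum of all type frequencies

def zipf(freq):
    """Zipf scale value for a term with absolute frequency `freq`:
    log10 of the smoothed frequency per million tokens, plus 3."""
    return math.log10((freq + 1) / ((TOTAL_TOKENS + N_TYPES) / 1e6)) + 3
```
A term occurring about 6,900 times gets a Zipf value close to 3, while unseen or extremely rare terms get negative values, consistent with the re-coding of negative values discussed in the text.
      </p>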
      <p>The original Zipf scale is a continuous scale and
it ranges from 1 (very low frequency) to 6 (very
high frequency) or even 7 (e.g., for very frequent
words like auxiliary verbs). By computing the Zipf
score of the MAL terms on TWITA, we found
some terms with very low frequencies, resulting
in negative values because of the logarithmic
function. These were re-coded with the minimun Zipf
value. The resulting weights in the W-MAL range
from a minimum of -5.16 to a maximum of 5.95
(the original MAL ranged from -1 to 1).
Finally, we decided to keep in the W-MAL the
terms that were not found in TWITA, with their
original MAL score.</p>
      <p>We initially applied the Zipf scale to MAL
polarity scores by simply multiplying the two
scores, thus giving more weight to highly
frequent terms. However, using the affective
lexicon with such a weighting scheme resulted in a
decrease in its polarity classification performance.
We therefore reversed the Zipf scale by
weighting the original scores inversely with
respect to their word frequency. By doing so, we
tested our hypothesis that more weight should go
to less frequent terms. We replicated the
polarity detection experiment on SENTIPOLC. The
results, shown in the green solid lines in Figure 1
labeled ALL, indicate a better performance
overall, and a reduced imbalance between the positive
(F1-scores standard deviation across the
thresholds of 0.035 with W-MAL vs 0.054 with MAL)
and (especially) the negative polarity class
(F1-scores standard deviation across the thresholds of
0.008 with W-MAL vs 0.042 with MAL).</p>
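      <p>The paper does not spell out the exact reversal formula, so the following is only one plausible reading of the two weighting schemes: the MAL score is multiplied either by the Zipf value directly, or by the Zipf value reversed against an assumed top of the scale, so that rarer words receive larger weights. The constant ZIPF_MAX and both function names are our assumptions:
```python
ZIPF_MAX = 7.0  # assumed top of the Zipf scale (very frequent words)

def direct_weight(mal_score, zipf_value):
    """Directly proportional weighting: frequent words count more.
    This is the variant the paper reports as performing worse."""
    return mal_score * zipf_value

def inverse_weight(mal_score, zipf_value):
    """Reversed weighting: rare (low-Zipf) words count more."""
    return mal_score * (ZIPF_MAX - zipf_value)
```
Under this sketch, a rare term with Zipf 2 keeps most of its polarity mass, while a very common term with Zipf 6 is strongly attenuated.
      </p>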
      <p>
        To further clarify the effect found on the
polarity scores, we show two example tweets in Figure
24. In the figure, the MAL and W-MAL scores
are included for the highlighted words, along with
the total polarity scores computed with both
dictionaries, showing how the final judgment can
change from neutral to polarized (bottom
example) or switch polarity entirely (top example). In
particular in the top example the scores are
associated with ”confondesse” (to confuse in subjunctive
mood) and to ”diritto” (right), while in the
bottom example the scores are associated with
”Istituto” (school) and to the periphrastic verbal form
”viene taciuto” (is silenced). This result confirms
our speculation that negative polarity is expressed
with more specific words than positive polarity.
Psychology studies also show that more complex
forms of language were used for expressing
criticisms rather than positive evaluations
        <xref ref-type="bibr" rid="ref9">(Stewart,
2015)</xref>
        .
      </p>
      <p>We also notice how the F1-score on the negative
polarity is generally higher than the one on the
positive polarity class. This means that the
negative polarity of tweets is better predicted than
the positive polarity by means of the weighted
process with the inverse coding. This outcome
is also supported by the fact that the
directly proportional version of W-MAL
performed worse than the inverse version in terms of
prediction. This trend was also observed across
most of the results of the SENTIPOLC shared
task, mostly based on supervised models with
lexical features, further indicating that the vocabulary
of negative sentiments is richer than that of
positive sentiment.</p>
      <p>The translation of the examples in Figure 2 is as follows. For the top
example: They would be #thegoodschool if meritocracy were
not confused with “doormatcracy”: the one whereby even a
right becomes a concession. For the bottom example:
@steGiannini #thegoodschool In the rankings of the School there
are also TFA qualified teachers with 48 months of service.
Why is it silenced?, where steGiannini refers to the Italian
minister for school.</p>
    </sec>
    <sec id="sec-7">
      <title>5 The Effect of Topic on Sentiment Analysis</title>
      <p>In order to investigate the interaction between the
imbalance of dictionary-based polarity
classification and a possible asymmetry in the data (i.e.
different internal topics), we performed such
classification with MAL and W-MAL with the reversed
Zipf scale on a benchmark with explicitly stated
topics. As a matter of fact, the test set of
SENTIPOLC is composed of 1,982 Italian tweets,
organized in 496 general, i.e., domain-independent,
tweets, and 1,486 political tweets, obtained by
filtering data with specific keywords related to
political Italian figures. The results of our experiment
are also included in Figure 1 with the GENERAL
and POLITICAL labels.</p>
      <p>The first observation we draw from this
experiment is that the polarity imbalance is a
phenomenon restricted to the topic-specific section
of the dataset. This confirms the hypothesis that
dictionary-based polarity classification is affected
by the imbalance issue to the extent to which
its topic is specific. In particular, we hypothesize
that some topics (such as politics) tend to attract
opinions more polarized towards one end of the
spectrum (the negative one in this case), therefore
inducing the observed imbalance.</p>
      <p>The second observation is that weighting the
polarity scores in the dictionary based on word
frequency (W-MAL) provides better overall results.
In particular, the F1 scores are better in the
topic-specific case, specifically due to a better prediction
of the negative polarity. This result reinforces the
idea that a polarized topic induces polarity
imbalance, and therefore a method to alleviate such
imbalance (i.e., a weighting scheme) leads to better
performance. In our view, a reason for this effect is
that topic-specific messages make use of less
frequent words on average.</p>
    </sec>
    <sec id="sec-9">
      <title>6 Conclusion and Future Work</title>
      <p>The weighting scheme proposed in this work is
a promising solution to the polarity imbalance in
dictionary-based SA. The experiments show that
weighting the polarity scores with word
frequencies yielded a more precise prediction of the
polarized tweets, with lessened bias in the
thresholds for neutral scores. The novel resource here
presented, W-MAL, is an attempt to better
characterize the most sought-after words, which have an
impact on the interaction between sentiment and
topic. We believe it also represents a promising
attempt to control for context-dependency while
using lexicon-based methods for SA.</p>
      <p>In particular, with this resource we try to give
voice to the linguistic intuition that the
exploitation of a specific form within a message might
meaningfully impact on the sentiment expressed
in the message. For instance, referring to the top
example in figure 2, by exploiting the subjunctive
mood ”confondesse” of the verb ”confondere” (to
confuse), the author adds to the
meaning of the verb a sense of doubtfulness and of
unreality. This effect is reinforced by the fact that
this form introduces a clause which is coordinated
with the clause headed by a verb in conditional
mood, i.e. ”sarebbe” (a form of to be). This form
of the verb ”confondere” seems especially
adequate for contexts where a negative polarity is
expressed and less appropriate for other cases. The
use of this specific mood for the verb has
therefore a meaningful impact on the sentiment
expressed. The MAL properly encodes this
information, which may be lost when a lemmatization step
is applied on text and all forms are subsequently
considered as bearing the same meaning without
further nuances. But the W-MAL does also
better: it encodes the probabilistic information about
how suitable a form is for expressing a particular
sentiment with respect to other available forms in
a given context.</p>
      <p>For all the aforementioned reasons, this
work has drawn our attention to the necessity
of weighting the affective lexicons used in
dictionary-based SA with corpus-based word
frequencies. The resource is freely available at
https://github.com/valeriobasile/
sentixR/blob/master/sentix/inst/
extdata/W-MAL.tsv</p>
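      <p>A minimal loader for the released TSV file might look as follows; the two-column layout (inflected form, weighted score) is an assumption on our part and should be checked against the actual file:
```python
import csv

def load_wmal(path):
    """Read a tab-separated lexicon file into a form -> score dict.
    Assumes column 0 is the inflected form and column 1 its score;
    this layout is a guess, verify it against the released file."""
    lexicon = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2:
                lexicon[row[0]] = float(row[1])
    return lexicon
```
The resulting dictionary can then be used directly for form-level lookup, with no lemmatization step.
      </p>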
      <p>
        In future work, we plan on working on more
refined weighting strategies, e.g., leveraging the
frequency information of word forms in addition
to lemmas, and taking the topic distribution into
consideration. Reducing the computation load
is a challenging goal as well (see Prakash et al.
(2015)). On the other hand, modern
transformer-based models have reached state-of-the-art results
on the task of polarity detection
        <xref ref-type="bibr" rid="ref7">(Polignano et al.,
2019)</xref>
        , although they are far more expensive and
time-consuming to run. We therefore plan to
compare the predictions of these systems, and study
ways to integrate their respective strengths (i.e.,
speed and transparency of the dictionary-based
approach vs. the superior prediction capability of the
deep neural models) in order to boost the overall
performance.
      </p>
      <p>The present work was originally conceived
in the framework of the AGRItrend project led
by the CREA Research Centre for Agricultural
Policies and Bio-economy, aiming at collecting
and analyzing social media data for opinions in
the domain of public policies and agriculture.
As such, we plan on studying the impact of
the techniques presented in this paper on that
particular domain, and observe if the same, or
different, patterns emerge. On a similar line,
so far we conducted experiments on data from
Twitter, which facilitates access to large quantities
of data but restricts the range of text styles and
genres found in them.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Baccianella</surname>
          </string-name>
          , Andrea Esuli, and
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining</article-title>
          .
          <source>In Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)</source>
          .
          <source>European Languages Resources Association (ELRA).</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the Evalita 2016 SENTIment POLarity Classification Task</article-title>
          .
          <source>In Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ). CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Sentiment analysis on Italian tweets</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          , pages
          <fpage>100</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Mirko Lai, and
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Long-term Social Media Data Collection at the University of Turin</article-title>
          .
          <source>In Proceedings of the Fifth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2018</year>
          ). CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Helen</given-names>
            <surname>Langone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Benjamin R.</given-names>
            <surname>Haskell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>George A.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Annotating WordNet</article-title>
          .
          <source>In Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL</source>
          <year>2004</year>
          , pages
          <fpage>63</fpage>
          -
          <lpage>69</lpage>
          .
          <article-title>Association for Computational Linguistics (ACL).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Emanuele</given-names>
            <surname>Pianta</surname>
          </string-name>
          , Luisa Bentivogli, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Girardi</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>MultiWordNet: developing an aligned multilingual database</article-title>
          .
          <source>In Proceedings of the First International Conference on Global WordNet</source>
          , pages
          <fpage>293</fpage>
          -
          <lpage>302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Polignano</surname>
          </string-name>
          , Pierpaolo Basile, Marco de Gemmis, Giovanni Semeraro, and
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>ALBERTO: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets</article-title>
          .
          <source>In Proceedings of the Sixth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2019</year>
          ). CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Saurabh</given-names>
            <surname>Prakash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakravarthy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Kaveri</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Statistically weighted reviews to enhance sentiment classification</article-title>
          .
          <source>Karbala International Journal of Modern Science</source>
          ,
          <volume>1</volume>
          :
          <fpage>26</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Martyn</given-names>
            <surname>Stewart</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>The language of praise and criticism in a student evaluation survey</article-title>
          .
          <source>Studies In Educational Evaluation</source>
          ,
          <volume>45</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Walter J. B.</given-names>
            <surname>van Heuven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pawel</given-names>
            <surname>Mandera</surname>
          </string-name>
          , Emmanuel Keuleers, and
          <string-name>
            <given-names>Marc</given-names>
            <surname>Brysbaert</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>SUBTLEXUK: a new and improved word frequency database for British English</article-title>
          .
          <source>The Quarterly Journal of Experimental Psychology</source>
          ,
          <volume>67</volume>
          :6:
          <fpage>1176</fpage>
          -
          <lpage>1190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Vassallo</surname>
          </string-name>
          , Giuliano Gabrieli, Valerio Basile, and
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The tenuousness of lemmatization in lexicon-based sentiment analysis</article-title>
          .
          <source>In Proceedings of the Sixth Italian Conference on Computational</source>
          Linguistics - CLiC-it
          <year>2019</year>
          . Academia University Press.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Eros</given-names>
            <surname>Zanchetta</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Morph-it! a free corpus-based morphological resource for the Italian language</article-title>
          .
          <source>Corpus Linguistics</source>
          <year>2005</year>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>George Kingsley</given-names>
            <surname>Zipf</surname>
          </string-name>
          .
          <year>1949</year>
          .
          <article-title>Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology</article-title>
          . Addison-Wesley.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>