=Paper=
{{Paper
|id=Vol-2006/paper072
|storemode=property
|title=Towards an Italian Lexicon for Polarity Classification (polarITA): a Comparative Analysis of Lexical Resources for Sentiment Analysis
|pdfUrl=https://ceur-ws.org/Vol-2006/paper072.pdf
|volume=Vol-2006
|authors=Delia Irazú Hernández Farías,Irene Laganà,Viviana Patti,Cristina Bosco
|dblpUrl=https://dblp.org/rec/conf/clic-it/FariasLPB17
}}
==Towards an Italian Lexicon for Polarity Classification (polarITA): a Comparative Analysis of Lexical Resources for Sentiment Analysis==
Delia Irazú Hernández Farías, PRHLT Research Center, Universitat Politècnica de València, dhernandez1@dsic.upv.es
Irene Laganà, Dipartimento di Studi Umanistici, Università di Pavia, irene.lagana01@universitadipavia.it
Viviana Patti and Cristina Bosco, Dipartimento di Informatica, Università di Torino, {patti,bosco}@di.unito.it
Abstract

English. The paper describes a preliminary study for the development of a novel lexicon for Italian sentiment analysis, i.e. one where words are associated with polarity values. Given the influence of sentiment lexica on the performance of sentiment analysis systems, a methodology based on the detection and classification of errors in existing lexical resources is proposed, and an extrinsic evaluation of the impact of such errors is applied. The final aim is to build a novel resource by filtering the existing lexical resources and integrating them with missing lexical entries and more reliable associations of polarity with entries.

Italiano. L’articolo descrive uno studio preliminare per lo sviluppo di una nuova risorsa lessicale per la sentiment analysis in italiano, i.e. dove alle parole sono associati valori di polarità. Data l’influenza dei lessici di sentiment sulle performance dei sistemi di sentiment analysis, viene proposta una metodologia basata sulla rilevazione e classificazione degli errori presenti nei lessici attualmente disponibili ed una valutazione estrinseca dell’impatto di tali errori sui sistemi. L’obiettivo finale è ottenere un nuovo lessico grazie ad un filtraggio applicato alle risorse lessicali disponibili, e a un’integrazione con le voci lessicali mancanti, ottenendo una maggiore affidabilità nell’associazione delle polarità alle voci.

1 Introduction

Sentiment Analysis (SA), described as the task of automatically determining the polarity of a given piece of text (Mohammad, 2016), is currently among the most widely investigated topics within NLP. Overall, the approaches to this task are mainly based on techniques ranging from traditional machine learning to novel deep learning ones, as can also be seen in the shared tasks on sentiment polarity classification in Twitter recently proposed for English (Nakov et al., 2016) and Italian (Barbieri et al., 2016) within the SemEval and Evalita periodical evaluation campaigns, respectively. Moreover, the detection of specific words associated with polarity values or emotions has been considered a powerful source of information for identifying the sentiment behind a text. Sentiment lexica, i.e. lists of words with associated polarity values or emotions, are therefore among the resources most commonly exploited by SA systems.

Several techniques have been applied for the development of lexical resources for SA: they can be built from scratch, manually or automatically, or extracted from corpora (Nissim and Patti, 2017). Nevertheless, the vast majority of these resources are written in English, and several other languages currently lack resources. One of the most commonly applied alternatives for obtaining resources in a language other than English is to automatically translate an available English lexicon via tools such as Google Translate (https://translate.google.com/). This kind of process, however, involves many difficulties, such as handling synonyms, polysemous words and multi-word expressions, and dealing with cultural differences between source and target language. Apart from this, possible variations of polarity across different contexts and languages should be carefully taken into account, whereas such approaches rely on the assumption that affective norms related to sentiment are stable across languages.
In this paper we are interested in evaluating the reliability of the lexical resources currently available for Italian SA and, given that most of them are obtained by translation, we mainly focus on the reliability of automatically translating English resources into Italian. To do so, we applied a methodology involving several facets. Our final aim is to develop a new SA resource for Italian, which comprises pre-existing translated lexical entries enriched with the manual correction of the assigned polarity, as resulting from our analysis, but which also includes entries that carry a polarity but are missing from the available lexica.

The paper is organized as follows. In the next section, we describe our methodology, which mainly consists of three steps: the selection of a sample of tweets from an Italian sentiment corpus exploited as part of the gold standard in the Sentipolc@Evalita2016 shared task (Stranisci et al., 2016; Barbieri et al., 2016); the automatic extraction of the lexical entries polarized according to a set of benchmark sentiment lexica for Italian; the analysis of these entries and the comparison with those expected by a human judge. Section 3 presents instead an extrinsic evaluation of the impact of the detected errors on the results of SA systems. Some hints about future development of this research are given in the conclusion.

2 Our Methodology

Given the relevance of affective lexica in SA and related tasks, our major aims in the current research are to detect the limits of the currently available lexical resources for Italian and to explore the possibility of developing a novel resource by correcting and extending them. In this paper we focus in particular on the detection of the deficiencies of existing resources and on their causes. Our methodology therefore consists in: (i) selecting a sample of tweets from an Italian sentiment corpus with political content (Stranisci et al., 2016), exploited as part of the gold standard in the Sentipolc@Evalita2016 shared task (Barbieri et al., 2016), with sentiment polarity annotation at the tweet level; (ii) automatically extracting the lexical entries polarized according to a set of benchmark sentiment lexica for Italian; and (iii) manually checking the results for each expected lexical entry in the context of the whole tweet (i.e. whether the polarity of the entry is the one expected by a human annotator, and whether there are other entries in the tweet that should appear as polarized but are not in the lexica).

We take as a starting point the SA lexica exploited by Hernández Farías et al. (2014) in the IRADABE system at Evalita 2014's SENTIPOLC (Basile et al., 2014). The same resources were also used in the upgraded system that participated in the same task at Evalita 2016 (Buscaldi and Hernández Farías, 2016).

In those works the lexicon AFINN (Nielsen, 2011), the one developed by Hu and Liu (henceforth HaL) (Hu and Liu, 2004), and SentiWordNet (SWN) (Baccianella et al., 2010) were automatically translated into Italian in order to exploit the obtained information as features in a supervised system, but no specific evaluation or refinement of them was performed. In the present paper we extend our selection by considering, beyond these three, a further resource, i.e. Sentix (Basile and Nissim, 2013), which has been developed following a semantics-oriented strategy (see Sec. 2.1). Henceforth, we will use the expression benchmark lexica to refer to the four resources. As reference corpus we considered, instead, TwBuonaScuola (Stranisci et al., 2016), an Italian dataset manually annotated for sentiment polarity and irony, focused on the on-line debate regarding a controversial Italian political reform, which is part of the gold standard provided for the Sentipolc shared task (Barbieri et al., 2016) at Evalita 2016 (Basile et al., 2017).

Our methodology, whose results are shown in Sec. 2.2, includes the steps described below.

Given a random selection of 500 tweets from TwBuonaScuola (henceforth ItalianTweets) including 2,706 different words, we manually evaluated the coverage of the benchmark lexica for the words included in these tweets. In particular, for each tweet we automatically extracted all the words which are included in each of the benchmark lexica, together with their associated polarity.

Then, for each tweet belonging to ItalianTweets, we manually checked the obtained lists of words, considered in the context of the tweet, with a twofold objective:

(i) To determine which words in the benchmark lexica have a wrong polarity associated;

(ii) To identify those words that express a certain polarity in the corpus but are not included in the benchmark lexica.
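
The extraction and checking steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the toy lexicon, the example tweet, the expected polarities and the function names are all hypothetical and do not reproduce the resources or tools used in the study. The manual check is simulated here by a dictionary of polarities expected by a human judge, and a word the judge did not list is treated as neutral ("none").

```python
from typing import Dict, List, Set, Tuple


def extract_polarized(tokens: List[str], lexicon: Dict[str, str]) -> Dict[str, str]:
    """Return the tweet tokens found in the lexicon with the lexicon's polarity."""
    return {tok: lexicon[tok] for tok in tokens if tok in lexicon}


def compare_with_judge(
    found: Dict[str, str], expected: Dict[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Split disagreements into (wrong polarity, missing from lexicon).

    Words not listed by the judge are assumed neutral ("none").
    """
    wrong = {w for w, pol in found.items() if expected.get(w, "none") != pol}
    missing = {w for w, pol in expected.items() if pol != "none" and w not in found}
    return wrong, missing


# Hypothetical toy data, not taken from the benchmark lexica or the corpus.
toy_lexicon = {"istituto": "negative", "bene": "positive"}
tweet_tokens = ["l", "istituto", "funziona", "bene", "ma", "la", "riforma", "è", "dannosa"]
judge_polarity = {"bene": "positive", "dannosa": "negative"}

found = extract_polarized(tweet_tokens, toy_lexicon)
wrong, missing = compare_with_judge(found, judge_polarity)

print(found)    # words covered by the lexicon
print(wrong)    # objective (i): wrong polarity in the lexicon
print(missing)  # objective (ii): polarized words absent from the lexicon
```

In this toy run the lexicon covers two words of the tweet, one of them with a polarity the judge does not confirm, while one word the judge considers polarized is absent from the lexicon. In the study itself the comparison was carried out manually, in the context of each tweet.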
2.1 Sentiment Analysis Resources

In this section we describe the benchmark lexica.

AFINN (Nielsen, 2011) is an English lexicon composed of 2,477 words and 15 multi-word expressions. Each entry is associated with a score ranging from -5 to +5, denoting negative and positive polarity respectively. The starting point for the development of this resource was a list of obscene words and some positive words; the lexicon was then extended with words from a corpus of tweets and other lists of words from Urban Dictionary (http://www.urbandictionary.com) representing entries typical of Internet language (e.g. “WTF” and “LOL”). After the manual annotation of the entries, the lexicon was evaluated on a corpus of tweets manually annotated for SA.

HaL (Hu and Liu, 2004) was built within a project for developing methods to deal with opinions expressed in reviews about various kinds of goods. A group of 30 manually annotated adjectives with a single and stable polarity was expanded by including the words which in WordNet's synsets are synonyms or antonyms of these seeds, assuming that synonyms have the same polarity and antonyms the opposite one. The lexicon currently includes 6,800 entries classified as positive or negative.

SentiWordNet 3.0 (Baccianella et al., 2010) is among the largest and most used resources exploited for SA. The main goal of the SentiWordNet project is the fully automated annotation of the polarity of WordNet's synsets, assigning to each synset three scores ranging from 0.0 to 1.0, one for each of the three basic polarity values (positive, negative, neutral), which sum to 1. In contrast with the other resources, SentiWordNet takes into account the different possible senses of each word.

As far as Italian is concerned, only a few resources exist, such as Sentix (Basile and Nissim, 2013) and SABRINA (Borzì et al., 2015). Sentix is the result of the alignment of four semantic databases, namely WordNet (Fellbaum, 1998), SentiWordNet, MultiWordNet (Pianta et al., 2002) and BabelNet (Navigli and Ponzetto, 2012). The methodology consists in transferring to the Italian section of WordNet the information about polarity encoded in the English SentiWordNet synsets, thus aligning Italian and English synsets.

The development of SABRINA is instead based on the application of a prior polarity method to two sets of Italian words, the first composed of 277,000 entries with associated inflection. However, the lexicon is not publicly available.

Finally, let us mention ItEM (Passaro et al., 2015), an Italian emotive lexicon which aims at offering information about the affect expressed in text at a finer level of granularity, i.e. referring not simply to positive or negative sentiment polarity but to emotional categories. In ItEM each word is tagged with an emotional label from the eight basic emotions of Plutchik's psychological model (Plutchik, 1980).

Several scholars are devoting their efforts to the development of resources for other languages, by applying translation or other methodologies. One example is FEEL (Abdaoui et al., 2017), a French lexicon where words are associated with polarity and emotions, obtained by applying translation tools to NRC-EmoLex (http://www.saifmohammad.com/WebPages/lexicons.html) and manually validating the results.

2.2 Qualitative Analysis of Benchmark Lexica

In order to assess the coverage and correctness of each benchmark lexicon, we selected from our reference sample corpus the list of words that, according to a human judge, carry some affective value in the context of the tweet where they appear. Then, for each entry of this list and for each benchmark lexicon, we checked whether the word is represented in the resource and assigned the same polarity.

Given the preliminary nature of this investigation, only a couple of researchers have been involved in the task. Moreover, a further limit of our current research approach depends on the reference to a given context (the one determined by our sample corpus); issues related to the context will be accounted for in future investigations.

We observed different coverages of the benchmark lexica on our Twitter corpus, first of all in terms of the number of affective words occurring in the tweets for each lexicon. The full vocabulary of the tweets is composed of 2,706 different words. Only some of these words carry some affective value, and focusing on them only we observed the following occurrences: 160 words in AFINN, 190 words in HaL, 302 words in SWN and 551 in Sentix. These word sets partially overlap, since 69 words are included in all the
lexica.

Resource   Error (i)   Error (ii)   Error (iii)   Error (iv)
AFINN      1.2         2.5          16.8          8.7
HaL        1.5         1.0          12.6          12.6
SWN        5.9         1.6          15.5          13.2
Sentix     5.9         2.1          15.2          16.6

Table 1: Distribution of different errors in the benchmark lexica (percentage with respect to the coverage of the lexicon).

The total number of words missing from the benchmark lexica or assigned an erroneous polarity is 388. As far as erroneous polarization is concerned, as summarized in Table 1, these words show four different kinds of errors: (i) a positive word is annotated as negative; (ii) a negative word is annotated as positive; (iii) a neutral word is annotated as positive; and (iv) a neutral word is annotated as negative. (We considered neutral a word whose polarity may vary across contexts, indicated by None in Table 2.) The values are expressed as percentages with respect to the coverage of the lexica. For all lexica, the errors fall mainly in the last two classes, i.e. (iii) and (iv), which supports the hypothesis that in the automatic transfer from English to Italian several Italian words that are not (clearly) polarized were instead assigned a polarity.

Nevertheless, Table 1 also shows that all the lexica contain very similar amounts of errors, regardless of the methodology applied for their development (i.e. translation or extraction from semantic databases). Several errors, in particular concerning the polarity associated with specific words, can be generated during translation, and a portion of them is therefore caused by the application of translation tools, mainly because these do not consider the context where each word occurs. But observing the results extracted from Sentix, which is not obtained simply by translation, and taking into account the larger coverage of this resource, we can see that errors occur in a percentage comparable to that of the other resources. In this case the problem probably depends on the misalignment of synsets across languages. For example, the Italian word “istituto”, whose meaning can be “school” or “institution”, is aligned with “prison” and “house/prison”, with a negative polarity which is not appropriate for the Italian word.

Several errors could probably be avoided in the transfer between languages by applying a pre-processing step that includes Part-of-Speech tagging and considers the grammatical category of the source and target terms. See, for instance, the word tagliando (cutting), which occurs in the corpus as a verb and in the benchmark lexica is instead aligned with the corresponding noun meaning voucher/coupon. This motivates our decision to attribute PoS tags to the words in the first nucleus of a novel resource obtained by extending and correcting the existing ones. The overall impression is that a manual check, even if it is a very time-consuming task, is always necessary, both when the new lexicon is obtained by translation and when it is obtained by relying on synset alignment.

3 Lost in Translation: Impact of the Errors

The methodology, even if applied to a small set of tweets and based on a manual check of the benchmark lexica, confirms the hypothesis that many directions can be followed to improve the quality of existing lexical resources. The first result of this preliminary analysis is the collection of a list of words with associated polarity which will be the nucleus of the novel resource, i.e. polarITA. Each of the words in polarITA has been annotated with an overall polarity value (i.e. positive, negative, or none) and its corresponding Part-of-Speech (POS) label. Table 2 summarizes the distribution of the words in polarITA in terms of polarity and POS labels.

Experiments on a larger corpus and a quantitative analysis based on a more formal classification of errors are needed for the development of a complete and reliable lexical resource, together with an in-depth investigation of the relevance of context in the attribution of polarity, which is a very important issue. A comparison of the results that a given SA engine exploiting features extracted from sentiment lexica, for instance IRADABE (Hernández Farías et al., 2014; Buscaldi and Hernández Farías, 2016), obtains using each of the benchmark lexica and using polarITA is planned as future work for the evaluation of the novel lexicon; such a comparison is not currently feasible because of the
limited size of our reference corpus and the consequent partial coverage of errors.

Considering the current preliminary stage of development of polarITA, we attempted an extrinsic evaluation to assess the impact on the performance of SA systems of the errors that currently affect the benchmark lexica and are corrected in the novel lexicon. We compared the words which are missing or assigned an erroneous polarity in the benchmark lexica with the Italian words most commonly used and understood by native speakers, collected in the recently re-released Vocabolario di base della lingua italiana (vocItalian) (https://www.internazionale.it/opinione/tullio-de-mauro/2016/12/23/il-nuovo-vocabolario-di-base-della-lingua-italiana). Like the first version of this resource, published in 1980 (De Mauro, 1980), it includes three word classes: 2,999 High Usage words (HU), 2,231 High Availability words (HA) and 1,979 Foundational words (FO).

Total words: 388

Polarity
Positive             225
Negative             140
None                 23

Part-of-speech labels
Adjective            84
Adjective/Noun       1
Adjective/Pronoun    2
Adverb               16
Interjection         3
Noun                 187
Noun/Adverb          1
Preposition          1
Pronoun              1
Verb                 92

vocItalian
FO                   187
HU                   86
HA                   11

Table 2: Distribution of the words in polarITA in terms of polarity, POS labels, and vocItalian.

So far, polarITA includes 284 words of vocItalian, whose distribution across the three classes is shown in Table 2. Among the words in the FO category we found “bene” (good), “mentire” (lie), and “giustizia” (justice), while words like “assassino” (killer), “preoccupato” (worried), and “entusiasta” (enthusiastic) are part of the HU category. Finally, in the HA category it is possible to find words such as “dannoso” (harmful) and “emozionante” (exciting).

This analysis suggests some hints for further investigation, showing that the failures of the lexica currently available for Italian SA affect words very commonly used in communication; the improvement of these resources may therefore result in an advancement for SA and related tasks.

4 Conclusions and Future Work

In this paper we present a preliminary investigation of a methodology for the development of a novel lexical resource for Italian SA, namely polarITA, which takes advantage of the analysis and filtering of errors occurring in the available lexical resources. We carried out a manual analysis of a set of tweets to determine the reliability of sentiment-related lexica, showing that, even if the transfer of lexical information between two different languages is a common practice to address the lack of resources, information related to sentiment is lost in the process. The identified errors are then exploited as a starting point for developing the novel resource.

As future work, we are planning to extend the resource in several directions: investigating multi-word expressions, extending the coverage to a larger corpus, and exploring the impact of figurative language devices such as irony and sarcasm on the use of certain polarized words (Hernández Farías et al., 2016). Moreover, our future effort will be oriented to the automation of a larger part of the methodology and its application to other currently under-resourced languages.

Acknowledgements

C. Bosco and V. Patti were partially funded by Progetto di Ateneo/CSP 2016 (Immigrants, Hate and Prejudice in Social Media, S1618 L2 BOSC 01) and by Fondazione CRT (Hate Speech and Social Media, 2016.0688).

References

Amine Abdaoui, Jérôme Azé, Sandra Bringay, and Pascal Poncelet. 2017. FEEL: a French Expanded Emotion Lexicon. Language Resources and Evaluation, 51:833–855, September.

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), pages 2200–2204, Valletta, Malta. European Language Resources Association (ELRA).

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the EVALITA 2016 SENTiment POLarity Classification Task. In Basile, Cutugno, Nissim, Patti, and Sprugnoli, editors, Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). CEUR Workshop Proceedings.

Valerio Basile and Malvina Nissim. 2013. Sentiment Analysis on Italian Tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107, Atlanta, USA. Association for Computational Linguistics.

Valerio Basile, Andrea Bolioli, Malvina Nissim, Viviana Patti, and Paolo Rosso. 2014. Overview of the Evalita 2014 SENTIment POLarity Classification Task. In Proceedings of the 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2014), Pisa, Italy.

Pierpaolo Basile, Francesco Cutugno, Malvina Nissim, Viviana Patti, and Rachele Sprugnoli. 2017. Evalita goes social: Tasks, data, and community at the 2016 edition. IJCoL - Italian Journal of Computational Linguistics, 3(1):93–127.

Valeria Borzì, Simone Faro, Arianna Pavone, and Sabrina Sansone. 2015. Prior Polarity Lexical Resources for the Italian Language. CoRR, abs/1507.00133.

Davide Buscaldi and Delia Irazú Hernández Farías. 2016. IRADABE2: Lexicon Merging and Positional Features for Sentiment Analysis in Italian. In Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2016). Accademia University Press.

Tullio De Mauro. 1980. Guida all’uso delle parole. Num. 3 dei Libri di base. Editori Riuniti, Roma.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. Bradford Books.

Delia Irazú Hernández Farías, Davide Buscaldi, and Belém Priego-Sánchez. 2014. IRADABE: Adapting English Lexicons to the Italian Sentiment Polarity Classification task. In First Italian Conference on Computational Linguistics (CLiC-it 2014) and the fourth International Workshop EVALITA 2014, pages 75–81.

Delia Irazú Hernández Farías, Viviana Patti, and Paolo Rosso. 2016. Irony Detection in Twitter: The Role of Affective Content. ACM Trans. Internet Technol., 16(3):19:1–19:24.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pages 168–177, New York, NY, USA. ACM.

Saif M. Mohammad. 2016. Sentiment Analysis: Detecting Valence, Emotions, and Other Affectual States from Text. In Herb Meiselman, editor, Emotion Measurement. Elsevier.

Preslav Nakov, Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and Veselin Stoyanov. 2016. SemEval-2016 Task 4: Sentiment Analysis in Twitter. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 1–18, San Diego, California.

Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 193:217–250.

Finn Årup Nielsen. 2011. A new ANEW: evaluation of a word list for sentiment analysis in microblogs. In Proceedings of the ESWC2011 Workshop on ’Making Sense of Microposts’: Big things come in small packages, volume 718 of CEUR Workshop Proceedings, pages 93–98, Heraklion, Crete, Greece. CEUR-WS.org.

Malvina Nissim and Viviana Patti. 2017. Semantic aspects in sentiment analysis. In Federico Alberto Pozzi, Elisabetta Fersini, Enza Messina, and Bing Liu, editors, Sentiment Analysis in Social Networks, pages 31–48. Morgan Kaufmann, Boston.

Lucia Passaro, Laura Pollacci, and Alessandro Lenci. 2015. ItEM: A Vector Space Model to Bootstrap an Italian Emotive Lexicon. volume II.

E. Pianta, L. Bentivogli, and C. Girardi. 2002. MultiWordNet: Developing an Aligned Multilingual Database. In Proceedings of International Conference on Global WordNet.

Robert Plutchik. 1980. A general psychoevolutionary theory of emotion. In R. Plutchik and H. Kellerman, editors, Emotion: Theory, research, and experience: Vol. 1. Theories of emotion, pages 3–33. Academic Press, New York.

Marco Stranisci, Cristina Bosco, Delia Irazú Hernández Farías, and Viviana Patti. 2016. Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA).