    UNIBA - Integrating distributional semantics features in a supervised
               approach for detecting irony in Italian tweets
                            Pierpaolo Basile and Giovanni Semeraro
                                 Department of Computer Science
                                   University of Bari Aldo Moro
                               Via E. Orabona, 4 - 70125 Bari (Italy)
                    {pierpaolo.basile,giovanni.semeraro}@uniba.it


                         Abstract

    English. This paper describes the participation of the UNIBA team in the IronITA 2018 task at EVALITA 2018. We propose a supervised approach based on LIBLINEAR that relies on keyword, polarity and micro-blogging features, and on a representation of tweets in a distributional semantic model. Our system ranked 3rd and 4th in the irony detection subtask. We participated only in the constrained run, exploiting the training data provided by the task organizers.

    Italiano. This article describes the participation of the UNIBA team in the IronITA 2018 task organized at EVALITA 2018. In the article we propose a supervised approach based on LIBLINEAR that exploits keywords, polarity, typical micro-blog attributes and the representation of tweets in a distributional semantic space. Our system ranked third and fourth in the irony detection subtask. We participated only in the constrained run, using the training data provided by the task organizers.

1   Introduction

Irony is defined as "the use of words that say the opposite of what you really mean, often as a joke and with a tone of voice that shows this"¹. This suggests that, when analyzing written text to detect irony, we should focus our attention on those words that are used in an unconventional context. For example, given the tweet "S&P ha declassato Mario Monti da Premier a Badante #declassaggi"², we can observe that the word "badante" (caregiver) is used in an unconventional context, since "caregiver" usually does not co-occur with words such as "Premier" or "Mario Monti".

Following this idea, in our work we introduce a feature able to detect words used out of their usual context. Moreover, we integrate further features based on keywords, bigrams, trigrams, polarity and micro-blogging attributes, as reported in (Basile and Novielli, 2014). Our idea is supported by the best systems participating in SemEval-2018 Task 3, Irony detection in English tweets (Van Hee et al., 2018), where the best systems not based on deep learning exploit features based on polarity contrast information and context incongruity.

We evaluate our approach in the context of the IronITA task at EVALITA 2018 (Cignarella et al., 2018). The goal of the task is to predict irony in Italian tweets. The task is organized in two subtasks: 1) irony detection and 2) recognition of different types of irony. In the second subtask, participants must identify whether an ironic tweet is sarcastic or not. In this paper, we propose an approach which is able to detect the presence of irony without taking into account the different types of irony. We evaluate the approach in a constrained setting, using only the data provided by the task organizers. The only external resources exploited in our approach are a polarity lexicon and a collection of about 40M tweets randomly extracted from TWITA (Basile and Nissim, 2013), a collection of about 800M Italian tweets.

The paper is structured as follows: Section 2 describes our system, while the evaluation and results are reported in Section 3. Final remarks are provided in Section 4.

¹ Oxford Learner's Dictionary.
² In English: "S&P has downgraded Mario Monti from Premier to Caregiver".

2   System Description

Our approach adopts a supervised classifier based on LIBLINEAR (Fan et al., 2008); in particular, we use the L2-regularized L2-loss linear SVM. Each tweet is represented using several sets of features:
keyword-based: keyword-based features exploit the tokens occurring in the tweets. Unigrams, bigrams and trigrams are considered. During tokenization we replace user mentions and URLs with two metatokens, "_USER_" and "_URL_";

microblogging: microblogging features take into account some attributes of the tweets that are peculiar to the context of microblogging. We exploit the following features: the presence of emoticons, character repetitions³, informal expressions of laughter⁴, and the presence of exclamation and interrogative marks. All microblogging features are binary;

polarity: this block contains features extracted from the SentiWordNet (Esuli and Sebastiani, 2006) lexicon. We translate SentiWordNet into Italian through MultiWordNet (Pianta et al., 2002). It is important to underline that SentiWordNet is a synset-based lexicon, while our Italian translation is a word-based lexicon. In order to automatically derive our Italian sentiment lexicon from SentiWordNet, we perform three steps. First, we map the synset offsets in SentiWordNet from version 3.0 to version 1.6⁵ using an automatically generated mapping file. Then, we transfer the prior polarities of SentiWordNet to the Italian lemmata. Finally, we expand the lexicon using Morph-it! (Zanchetta and Baroni, 2005), a lexicon of inflected forms with their lemma and morphological features, extending the polarity scores of each lemma to its inflected forms. Details about the creation of the sentiment lexicon are reported in (Basile and Novielli, 2014). The obtained Italian translation of SentiWordNet is used to compute three features based on the prior polarity of the words in the tweet: 1) the maximum positive polarity; 2) the maximum negative polarity; 3) the polarity variation: each token occurring in the tweet is assigned a tag according to its highest polarity score in the Italian lexicon, with tag values in the set {OBJ, POS, NEG}; the sentiment variation counts how many switches from POS to NEG, or vice versa, occur in the tweet (a sketch of these feature blocks is given after this list);

distributional semantics features: we compute two kinds of distributional semantics features:

   1. given a set of unlabelled downloaded tweets, we build a geometric space in which each word is represented as a mathematical point. The similarity between words is computed as their closeness in the space. To represent a tweet in the geometric space, we adopt the superposition operator (Smolensky, 1990), that is, the vector sum of all the vectors of the words occurring in the tweet. We use the tweet vector t as a semantic feature in training our classifiers;

   2. we extract three features that take into account the usage of words in an unconventional context. In particular, for each word w_i we compute a score ac_i that measures how far the word is from its conventional context. We then compute three features: the average, the maximum and the minimum of all the ac_i scores. More details about the computation of the ac_i score are reported in Subsection 2.1.

³ These features usually play the same role as intensifiers in informal writing contexts.
⁴ i.e., sequences of "ah".
⁵ Since MultiWordNet is based on WordNet 1.6.
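As a concrete illustration of the keyword, microblogging and polarity blocks, the following minimal Python sketch shows how such features could be extracted. The regular expressions, function names and example calls are our own illustrative assumptions, not the exact implementation used in our system.

    import re

    def tokenize(tweet):
        """Replace user mentions and URLs with metatokens, then split
        (illustrative; the actual tokenizer may differ)."""
        tweet = re.sub(r"https?://\S+", "_URL_", tweet)
        tweet = re.sub(r"@\w+", "_USER_", tweet)
        return tweet.lower().split()

    def microblogging_features(tweet):
        """Binary microblogging features described above (assumed patterns)."""
        return {
            "emoticon": bool(re.search(r"[:;]-?[()DP]", tweet)),
            "char_repetition": bool(re.search(r"(\w)\1{2,}", tweet)),
            "laughter": bool(re.search(r"\b(?:ah)+ah\b", tweet, re.I)),
            "exclamation": "!" in tweet,
            "interrogative": "?" in tweet,
        }

    def polarity_variation(tags):
        """Count POS/NEG switches in the sequence of per-token tags
        ({OBJ, POS, NEG}), ignoring objective tokens."""
        polar = [t for t in tags if t in ("POS", "NEG")]
        return sum(1 for a, b in zip(polar, polar[1:]) if a != b)

    print(tokenize("@user guarda qui http://t.co/x !!!"))
    print(microblogging_features("ahahah che bello :)"))
    print(polarity_variation(["POS", "OBJ", "NEG", "POS"]))  # -> 2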
2.1   Distributional Semantics Features

The distributional semantics model is built on a collection of tweets. We randomly extract 40M tweets from TWITA and build a semantic space based on the Random Indexing (RI) technique (Sahlgren, 2005), using a context window equal to 2. Moreover, we consider only words occurring more than ten times⁶. The context window is dynamic and does not take into account words that are not in the vocabulary. Our vocabulary contains 105,543 terms.

The mathematical insight behind RI is the projection of a high-dimensional space onto a lower-dimensional one using a random matrix; this kind of projection does not compromise distance metrics (Dasgupta and Gupta, 1999). Formally, given an n × m matrix A and an m × k matrix R which contains random vectors, we define a new n × k matrix B as:

    A_{n,m} · R_{m,k} = B_{n,k},    k ≪ m        (1)

⁶ We call this set of words the vocabulary.
The new matrix B has the property of preserving the distance between points: if the distance between any two points in A is d, then the distance d_r between the corresponding points in B satisfies d_r ≈ c × d, for a constant c. A proof is given by the Johnson-Lindenstrauss lemma (Dasgupta and Gupta, 1999).
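The following short NumPy sketch illustrates the projection in Equation (1); the matrix sizes, the sparsity of R and the random seed are illustrative assumptions, not the settings used in our experiments.

    import numpy as np

    rng = np.random.default_rng(42)
    n, m, k = 100, 10_000, 200  # k << m, as in Equation (1)

    # Points in the original high-dimensional space (rows of A).
    A = rng.normal(size=(n, m))

    # Sparse ternary random matrix R, mirroring the ternary context
    # vectors used by Random Indexing.
    R = rng.choice([-1, 0, 1], size=(m, k), p=[0.005, 0.99, 0.005])

    # Equation (1): B = A · R projects the points into k dimensions.
    B = A @ R

    # Distances are preserved up to a roughly constant factor c
    # (Johnson-Lindenstrauss lemma): d_r / d is similar for all pairs.
    d = np.linalg.norm(A[0] - A[1])
    d_r = np.linalg.norm(B[0] - B[1])
    print(d_r / d)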
Specifically, RI creates the WordSpace in two steps:

  1. a context vector is assigned to each word. This vector is sparse, high-dimensional and ternary, which means that its elements can take values in {-1, 0, 1}. A context vector contains a small number of randomly distributed non-zero elements, and the structure of this vector follows the hypothesis behind the concept of Random Projection;

  2. context vectors are accumulated by analyzing co-occurring words. In particular, the semantic vector for any word is computed as the sum of the context vectors of the words that co-occur with the analyzed word.

Formally, given a corpus C of n documents and a vocabulary V of m words extracted from C, we perform two steps: 1) assign a context vector c_i to each word in V; 2) compute a semantic vector sv_i for each word w_i as the sum of all the context vectors assigned to the words co-occurring with w_i. The context is the set of the m words that precede and follow w_i (here m denotes the window size).

For example, consider the following tweet: "siete il buono della scuola fatelo capire". In the first step we assign a random context vector to each term, as follows:

    c_siete  = (−1, 0, 0, −1,  0,  0, 0, 0, 0, 0)
    c_buono  = ( 0, 0, 0, −1,  0,  0, 0, 1, 0, 0)
    c_scuola = ( 0, 0, 0,  0, −1,  0, 0, 0, 1, 0)
    c_fatelo = ( 0, 1, 0,  0,  0, −1, 0, 0, 0, 0)
    c_capire = (−1, 0, 0,  0,  0,  0, 0, 0, 0, 1)
In the second step, we build a semantic vector for each term by accumulating the random vectors of its co-occurring words. For example, fixing m = 2, the semantic vector for the word scuola is the sum of the random vectors of siete, buono, fatelo and capire; summing these vectors, the semantic vector for scuola results in (−2, 1, 0, −2, 0, −1, 0, 1, 0, 1). This operation is repeated for all the sentences in the corpus and for all the words in V. In this example we used very small vectors, but in a real scenario the vector dimension ranges from hundreds to thousands; in particular, in our experiments we use a vector dimension equal to 200 with 10 non-zero elements.

In order to compute the ac_i score for a word w_i in a tweet, we build a context vector c_wi as the sum of the random vectors assigned to the words that co-occur with w_i in the tweet. Then we compute the cosine similarity between c_wi and the semantic vector sv_i assigned to w_i. The idea is to measure how dissimilar the semantic vector is from the context vector: if the word w_i has never appeared in the context under analysis, its semantic vector does not contain the random vectors of the words in that context, and this results in a low cosine similarity. Finally, the divergence from the context is computed as 1 − cosSim(c_wi, sv_i).
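To make the construction concrete, the following Python sketch reproduces the worked example above together with the divergence score just defined; the 10-dimensional toy vectors come from the example, while the helper names (cos_sim, divergence) are our own illustrative choices.

    import numpy as np

    # Ternary context (index) vectors from the worked example above
    # (toy 10-dimensional vectors; the experiments use dimension 200
    # with 10 non-zero elements).
    context = {
        "siete":  np.array([-1, 0, 0, -1,  0,  0, 0, 0, 0, 0]),
        "buono":  np.array([ 0, 0, 0, -1,  0,  0, 0, 1, 0, 0]),
        "scuola": np.array([ 0, 0, 0,  0, -1,  0, 0, 0, 1, 0]),
        "fatelo": np.array([ 0, 1, 0,  0,  0, -1, 0, 0, 0, 0]),
        "capire": np.array([-1, 0, 0,  0,  0,  0, 0, 0, 0, 1]),
    }

    # Step 2 of RI: the semantic vector of "scuola" accumulates the
    # context vectors of its co-occurring words (window m = 2, skipping
    # out-of-vocabulary words such as "il" and "della").
    sv_scuola = sum(context[w] for w in ["siete", "buono", "fatelo", "capire"])
    print(sv_scuola)  # [-2  1  0 -2  0 -1  0  1  0  1]

    def cos_sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Divergence score ac_i: 1 minus the cosine similarity between the
    # semantic vector of w_i and the context vector built from the words
    # co-occurring with w_i in the tweet under analysis.
    def divergence(word_sv, cooccurring_words):
        c_wi = sum(context[w] for w in cooccurring_words)
        return 1.0 - cos_sim(word_sv, c_wi)

    # "scuola" appears here exactly in its usual context, so ac ≈ 0; a
    # word in an unusual context would obtain a score closer to 1.
    print(divergence(sv_scuola, ["siete", "buono", "fatelo", "capire"]))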
3   Evaluation

We perform the evaluation using the data provided by the task organizers. The training set contains 3,977 tweets, while the test set consists of 872 tweets. The only parameter to set in LIBLINEAR is C (the cost); after a 5-fold cross-validation on the training set we set C = 1.
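As a sketch of this setup, the snippet below uses scikit-learn's LinearSVC, which wraps LIBLINEAR; the stand-in random data and the macro-F1 scoring choice are our own assumptions for illustration.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import LinearSVC

    # penalty="l2" with loss="squared_hinge" is the L2-regularized
    # L2-loss linear SVM; C is the only parameter we tune.
    clf = LinearSVC(penalty="l2", loss="squared_hinge", C=1.0)

    # Stand-in data: in the real setting X holds the feature blocks of
    # Section 2 for the 3,977 training tweets and y the irony labels.
    rng = np.random.default_rng(0)
    X = rng.random((100, 20))
    y = rng.integers(0, 2, size=100)

    # 5-fold cross-validation on the training data, used to choose C.
    print(cross_val_score(clf, X, y, cv=5, scoring="f1_macro").mean())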
We submit two runs: UNIBA1 includes the semantic vector representing the tweet as a feature, while UNIBA2 does not include this vector. Nevertheless, the features based on the divergence are included in both runs.

Official results are reported in Table 1. Our runs rank third and fourth in the final ranking, and our team is classified as second, since the first two runs in the ranking belong to team1. We can notice that the runs are very close in the ranking. The last run is ranked below the random baseline, while no system is ranked below the baseline-mfc baseline, which assigns the most frequent class (non-ironic).

Results show that our system is not able to improve performance by exploiting the distributional representation of tweets, since the two runs report the same average F1-score. We performed further experiments in order to understand the contribution of each feature; some relevant outcomes are reported in Table 2.

 team           P (non-ironic)  R (non-ironic)  F1 (non-ironic)  P (ironic)  R (ironic)  F1 (ironic)  avg F1
 team1          0.785           0.643           0.707            0.696       0.823       0.754        0.731
 team1          0.751           0.643           0.693            0.687       0.786       0.733        0.713
 UNIBA1         0.748           0.638           0.689            0.683       0.784       0.730        0.710
 UNIBA2         0.748           0.638           0.689            0.683       0.784       0.730        0.710
 team3          0.700           0.716           0.708            0.708       0.692       0.700        0.704
 team6          0.600           0.714           0.652            0.645       0.522       0.577        0.614
 random         0.506           0.501           0.503            0.503       0.508       0.506        0.505
 team7          0.505           0.892           0.645            0.525       0.120       0.195        0.420
 baseline-mfc   0.501           1.000           0.668            0.000       0.000       0.000        0.334

                Table 1: Task results (P = precision, R = recall).
 run    features                                 F1 (non-ironic)  F1 (ironic)  avg F1
 run1   all                                      0.6888           0.7301       0.7095
 run2   no DSM                                   0.6888           0.7301       0.7095
 1      keyword                                  0.6738           0.6969       0.6853
 2      keyword, bigrams                         0.6916           0.7219       0.7067
 3      keyword, bigrams, trigrams               0.6992           0.7343       0.7168
 4      keyword, bigrams, trigrams, blog         0.7000           0.7337       0.7168
 5      keyword, bigrams, trigrams, polarity     0.6906           0.7329       0.7117
 6      keyword, bigrams, trigrams, context      0.6937           0.7325       0.7131
 7      only DSM                                 0.6166           0.6830       0.6406
 8      only context                             0.4993           0.5587       0.5290

        Table 2: Task results obtained combining different types of features.
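In both tables, the average F1-score is the macro average of the two per-class scores, i.e., avg F1 = (F1(non-ironic) + F1(ironic)) / 2; for example, for run1 this gives (0.6888 + 0.7301) / 2 ≈ 0.7095.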


In particular:

  • keyword-based features are able to achieve the best performance; in particular, bigrams and trigrams contribute to improving the performance (runs 1 and 2);

  • DSM features introduce some kind of noise when they are combined with other features; in fact, runs 4, 5 and 6 achieve good performance without DSM;

  • DSM alone, without any other kind of feature, is able to achieve remarkable results; it is important to notice that in this run only the tweet vector is used as a feature;

  • blog, polarity and context features are not able to give a contribution to the overall system performance; however, we can observe that using only the context features (only three features for each tweet) we are able to overcome both the baselines.

Analyzing these results, we can conclude that a more effective way to combine distributional and non-distributional features is needed. As future work, we plan to investigate the combination of two different kernels for the distributional and the keyword-based features.

4   Conclusions

We propose a supervised system for detecting irony in Italian tweets. The proposed system exploits different kinds of features: keyword-based, microblogging and polarity features, distributional semantics features, and a score that measures how a word is used in an unconventional context. The divergence of a word from its conventional context is computed by exploiting the distributional semantics model built with Random Indexing.

Results prove that our system is able to achieve good performance, ranking third in the official ranking. However, a deeper study of different combinations of features shows that keyword-based features alone are able to achieve the best result, while distributional features introduce noise during training. This outcome suggests the need for a different strategy for combining distributional and non-distributional features.
References

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proc. of WASSA 2013, pages 100–107.

Pierpaolo Basile and Nicole Novielli. 2014. UNIBA at EVALITA 2014 SENTIPOLC task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features. In Proc. of EVALITA 2014, pages 58–63, Pisa, Italy.

Alessandra Teresa Cignarella, Simona Frenda, Valerio Basile, Cristina Bosco, Viviana Patti, and Paolo Rosso. 2018. Overview of the EVALITA 2018 task on irony detection in Italian tweets (IronITA). In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.

Sanjoy Dasgupta and Anupam Gupta. 1999. An elementary proof of the Johnson-Lindenstrauss lemma. Technical Report TR-99-006, International Computer Science Institute, Berkeley, California, USA.

Andrea Esuli and Fabrizio Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion mining. In Proc. of LREC, pages 417–422.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9(Aug):1871–1874.

Emanuele Pianta, Luisa Bentivogli, and Christian Girardi. 2002. MultiWordNet: Developing an aligned multilingual database. In Proc. of the 1st Intl. Conf. on Global WordNet, pages 293–302.

Magnus Sahlgren. 2005. An introduction to Random Indexing. In Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, TKE, volume 5.

Paul Smolensky. 1990. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1-2):159–216, November.

Cynthia Van Hee, Els Lefever, and Véronique Hoste. 2018. SemEval-2018 Task 3: Irony detection in English tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation, pages 39–50.

Eros Zanchetta and Marco Baroni. 2005. Morph-it!: A free corpus-based morphological resource for the Italian language. In Proc. of the Corpus Linguistics Conf. 2005.