=Paper=
{{Paper
|id=Vol-2765/146
|storemode=property
|title=UninaStudents @ SardiStance: Stance Detection in Italian Tweets - Task A (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2765/paper146.pdf
|volume=Vol-2765
|authors=Maurizio Moraca,Gianluca Sabella,Simone Morra
|dblpUrl=https://dblp.org/rec/conf/evalita/MoracaSM20
}}
==UninaStudents @ SardiStance: Stance Detection in Italian Tweets - Task A (short paper)==
UninaStudents @ SardiStance: Stance Detection in Italian Tweets - Task A
Maurizio Moraca, Gianluca Sabella, Simone Morra
Università degli Studi di Napoli Federico II
[mau.moraca,gia.sabella,simone.morra2]@studenti.unina.it
Abstract

English. This document describes a classification system for the SardiStance task at EVALITA 2020. The task consists in classifying the stance of the author of a series of tweets towards a specific discussion topic. The resulting system was specifically developed by the authors as the final project for the Natural Language Processing class of the Master in Computer Science at the University of Naples Federico II. The proposed system is based on an SVM classifier with a radial basis function kernel, making use of features such as 2-char-grams, unigram hashtags and the Afinn weight computed on automatically translated tweets. The results are promising in that the system's performance is on average higher than that of the baseline proposed by the task organizers.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

This work reports on the application of our system to the EVALITA 2020 SardiStance task (Basile et al., 2020; Cignarella et al., 2020). Stance detection is a classification task aiming at determining the position (stance) of the author of a given text with respect to the topic (target) treated in the text itself. In other words, the challenge consists in automatically guessing whether the author of the text is in favour of, against, or neutral towards the topic of a given post. Such automatic systems are useful in political analysis, marketing and opinion mining. Automatic stance determination is a recent development of the opinion-mining paradigm that finds its best applications in social and political settings. It differs from sentiment analysis in several respects, but the main difference is the reduction to a three-class decision (in favour, against, neutral), motivated by its main fields of application. The task poses many challenges, as the real target might not be expressly cited in the text, or the text might express the author's opinion only indirectly, as in the following example (Lai et al., 2020):

Target: Donald Trump
Tweet: Jeb Bush is the only sane candidate in this republican lineup.

Although one could erroneously think that
this task is similar to sentiment analysis, the following example illustrates how, in some cases, stance detection results are opposite to those reached by sentiment analysis (Lai et al., 2020):

Target: Climate change is a real concern
Tweet: @RegimeChangeBC @ndnstyl It's sad to be the last generation that could change but does nothing. #Auspol

This tweet presents a negative polarity, although the author claims to be in favour of the target. Classification systems for stance detection, then, attempt to identify the author's position on the target by taking into account features obtained from the text that are broadly similar to those used in hate speech detection, irony detection and mood detection, but with some further effort devoted to the specificity of the task.

SardiStance is the first Italian initiative focused on the automatic classification of stance in tweets. It includes two different tasks: A) Stance Detection at a textual level, where participants are asked to predict the stance based only on the textual content of the tweet, and B) Stance Detection with the addition of contextual information about the tweet, such as the number of retweets, the number of favourites or the date of posting, and contextual information about the author (location, user's biography); we submitted runs only for task A. As required by the task proposal, task A requires a three-class classification process where the system has to predict whether the items in the set are in FAVOUR, AGAINST or NEUTRAL, exploiting only the text of the tweet.

2 Description of the System

The system is based on an SVM classifier with a radial basis function (RBF) kernel. Most of the selected features were inspired by (Lai et al., 2020) and correspond to the following:

• n-grams: bag of n consecutive words in binary representation (presence/absence), where n corresponds to 1, 2 or 3.

• char-grams: bag of n consecutive characters in binary representation (presence/absence), where n corresponds to 2, 3, 4 or 5.

• unigram hashtag: bag of hashtags in binary representation (presence/absence).

• unigram emoji: bag of emojis in binary representation (presence/absence).

• unigram mentions: bag of mentions in binary representation (presence/absence).

• num uppercase words: number of uppercase words in a tweet.

• punctuation marks: frequency of each punctuation mark (. , ; ! ?) and their total frequency.

• Afinn weight (Nielsen, 2011; https://github.com/fnielsen/afinn/tree/master/afinn/data): based on a sentiment analysis lexicon made up of 3500 English words manually annotated with a polarity value within the range [-5, +5]. The value of this feature is computed for each tweet as the sum of the polarities associated with the words constituting the tweet, translated to English via Google Translate.

• Hu&Liu weight (https://github.com/woodrad/Twitter-Sentiment-Mining/tree/master/Hu%20and%20Liu%20Sentiment%20Lexicon): based on a sentiment analysis lexicon composed of two separate lists of English words, where the first contains 2,006 words with a positive connotation and the second contains 4,783 words with a negative connotation. In this work, a value of +1 is given to words which overlap with the positive list and a value of -1 to those overlapping with the negative list. The total polarity of each tweet is computed as the sum of the weights given to the words in the tweet.

• NRC vector (Bravo-Marquez et al., 2019; http://saifmohammad.com/WebPages/AffectIntensity.htm): based on a lexicon consisting of a list of English words, each associated with its most representative emotion. The emotions comprised are anger, fear, expectancy, trust, surprise, sadness, joy, and disgust. Furthermore, each entry is also associated with a score indicating the emotion intensity, with a value within the range [0, 1].

• DPL vector (Castellucci et al., 2016; http://sag.art.uniroma2.it/demo-software/distributional-polarity-lexicon/): based on a lexicon of 75,021 lemma::pos pairs, each associated with scores indicating the level of positivity, negativity, and neutrality of the lemma, as follows:

(1) buono::a 0.76691014 0.12262548 0.11046442

For each tweet of the dataset, each word was lemmatised and, for each resulting lemma, a morpho-syntactic category was associated. For this kind of analysis, LinguA (Dell'Orletta, 2009; Attardi and Dell'Orletta, 2009; Attardi et al., 2009) was used. The DPL vector feature consists of a triplet of scores representing the positivity, negativity, and neutrality levels of the tweet. To obtain this value, the scores of each lemma::pos pair in a tweet were summed.
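To make the feature definitions above concrete, the following minimal sketch (ours, not the system's original code) computes the three features retained in the final configuration: binary 2-char-grams, binary hashtag unigrams, and the Afinn weight. The AFINN_TOY lexicon, the example tweet and its translation are illustrative stand-ins for the full Afinn list and the Google Translate output:

```python
# Illustrative sketch (not the authors' original code) of the three features
# in the final configuration. AFINN_TOY is a tiny stand-in for the real
# Afinn lexicon of 3500 English words scored in [-5, +5].
import re
from typing import Dict, Set

AFINN_TOY: Dict[str, int] = {"sad": -2, "good": 3, "hate": -3, "love": 3}

def char_bigrams(text: str) -> Set[str]:
    """Binary bag of 2-char-grams: which character bigrams occur in the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def hashtag_unigrams(text: str) -> Set[str]:
    """Binary bag of hashtags: which hashtags occur in the text."""
    return set(re.findall(r"#\w+", text))

def afinn_weight(translated_text: str, lexicon: Dict[str, int]) -> int:
    """Sum of the lexicon polarities of the words in the translated tweet."""
    words = re.findall(r"\w+", translated_text.lower())
    return sum(lexicon.get(w, 0) for w in words)

# The Afinn weight is computed on the machine-translated (English) text,
# while the surface features are computed on the original tweet.
tweet_it = "Niente #TAV nella mia valle"      # hypothetical example tweet
tweet_en = "No #TAV in my valley"             # its hypothetical translation
features = (char_bigrams(tweet_it),
            hashtag_unigrams(tweet_it),
            afinn_weight(tweet_en, AFINN_TOY))
```

In the actual system, such binary sets would be mapped onto a fixed vocabulary of indicator columns and concatenated with the numeric Afinn score before being fed to the SVM.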
In order to select the best feature combination, a wrapper-based feature selection algorithm was used to test all the possible feature combinations. The best one according to the performance collected on the validation set was chosen, namely the one combining 2-char-grams, unigram hashtags and the Afinn weight. The evaluation metrics are discussed in the next section (Section 3). Since an SVM classifier with an RBF kernel was used, it was important to tune the C and γ parameters. The complexity of a generic SVM model is set through C: this parameter controls the acceptable distance of the decision boundary from the support vectors in the n-dimensional feature space. A higher C value increases the model's complexity, reducing the acceptable distance but also increasing the risk of overfitting; a lower C value leads to more general models that may have reduced discrimination capability. The γ parameter is specific to the RBF kernel: it controls the influence single points have in the feature space and thus the smoothness of the model, with lower values of γ leading to smoother models and vice versa. SVMs are very sensitive to parameter tuning, so specific optimisation strategies must be adopted. In this case, a grid search was performed using the following ranges of values:

• C: [0.1, 0.2, ..., 1.0, 10, 100, 1000]

• γ: [0.001, 0.0009, 0.0008, ..., 0.0001]

The best settings obtained on the validation set correspond to C = 10 and γ = 0.001.

3 Results

In this section, the performance of our system on the validation and test sets is described. The validation set was obtained by extracting, via stratified sampling, 20% of the tweets of the training set. The evaluation metrics used are the mean value of the F1 score for the classes Against and Favour, Precision, Recall and F1 score for each class, and Accuracy. In Table 1, the results obtained on the validation set are shown. From these results, the mean F1 score is obtained, corresponding to 0.5200. In Table 2, the results obtained on the test set are presented.

           Precision   Recall   F1 Score
Against    0.5500      0.8300   0.6600
Favor      0.4400      0.3200   0.3100
None       0.3800      0.1300   0.0900

Table 1: Validation Set Performance

           Precision   Recall   F1 Score
Against    0.7300      0.8491   0.7850
Favor      0.4348      0.3571   0.3922
None       0.3488      0.1744   0.2326

Table 2: Test Set Performance

Team             F1 Against   F1 Favour   F1 None
UNITOR 1         0.7866       0.5840      0.3910
UNITOR 2         0.7881       0.5721      0.3979
UNITOR 3         0.7939       0.5647      0.3672
UNITOR 4         0.7689       0.5522      0.3702
UninaStudents    0.7850       0.3922      0.2326
Baseline         0.7158       0.4409      0.2764

Table 3: Results compared with the baseline and the winning system

In Table 3, on the other hand, the results are compared with the baseline proposed by the task organizers and the winning systems, whose runs were submitted by the UNITOR team (Giorgioni et al., 2020) for task A. Specifically, the baseline used an SVM classifier based on token unigram features, whereas UNITOR used UmBERTo (https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1), adding sentiment, hate and irony tags to the dataset sentences and using additional data to train their systems. As may be noted, the Against-class result for our system is higher than the baseline and not far from the first two runs of UNITOR. Further investigations are, conversely, needed as far as the other two classes are concerned.
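The training and tuning procedure described in Section 2 (stratified 20% validation split, RBF-kernel SVM, grid search over C and γ) can be sketched as follows. This is our illustrative reconstruction with scikit-learn on toy data, not the original code: the parameter grid is abridged from the reported ranges, and macro-averaged F1 stands in for the task's mean F1 over the Against and Favour classes:

```python
# Illustrative reconstruction (not the original code) of the tuning setup:
# stratified 20% validation split, RBF-kernel SVM, grid search over C and
# gamma. The random feature matrix stands in for the real tweet features.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))        # toy feature vectors
y = rng.integers(0, 3, size=120)     # toy labels: FAVOUR / AGAINST / NEUTRAL

# Stratified sampling of 20% of the training set as validation data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

param_grid = {
    "C": [0.1, 0.5, 1.0, 10, 100, 1000],   # abridged grid
    "gamma": [0.001, 0.0005, 0.0001],
}
# Mean F1 across classes as the model-selection criterion.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1_macro", cv=3)
search.fit(X_train, y_train)
val_f1 = search.score(X_val, y_val)  # macro-F1 on the held-out 20%
```

With the real features, `search.best_params_` would be expected to recover a setting such as the reported C = 10 and γ = 0.001.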
4 Discussion

Our results are conditioned by the use of a sentiment lexicon originally in English and by the automatic translation of the tweets into English, in particular for the derivation of the Afinn weight feature. As expected, the translation, performed via Google Translate, is in some cases poor and approximate, and can give rise to a significant level of ambiguity. We nevertheless decided to accept this risk and to translate the tweets directly, rather than the lexicon, as we reasoned that translating the lexicon would have introduced even greater ambiguity: isolated lexicon entries suffer from polysemy, lack of inflectional morphological information, and similar problems, which machine translation systems are trained to resolve, at least to a first approximation, when given full texts. In this view, even an imperfect translation is able to capture part of the semantic context of the texts, allowing us to avoid lemmatisation and further processing of the lexicon before translation. We chose a classic approach based on an SVM classifier in order to keep our results explainable, given the academic context in which this experience took place. This would have been impossible with deep neural networks, whose processes are not "readable" from an external point of view. Furthermore, the size of the dataset distributed for this challenge does not allow affordable training of such systems. In this view, a comparison with results obtained in other stance detection challenges similar to the one proposed here at EVALITA (Mohammad et al., 2016; Taulé et al., 2017; Lai et al., 2017) gives strength to our choice of SVMs, which often outperform DNNs. As Master's students, we approached these NLP topics for the first time. Therefore, we are aware that our results are not at the state of the art in the field. However, a comparison with average performances in similar tasks for languages other than English indicates performances that are not significantly different.

Acknowledgements

We thank our teachers Francesco Cutugno and Maria Di Maro for introducing us to NLP and to EVALITA 2020 (Basile et al., 2020) and for supporting us in our work. We also thank them for giving us the opportunity to take part in the competition and for encouraging us to do our best.

References

Attardi, G. and Dell'Orletta, F. (2009). Reverse revision and linear tree combination for dependency parsing. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 261–264.

Attardi, G., Dell'Orletta, F., Simi, M., and Turian, J. (2009). Accurate dependency parsing with a stacked multilayer perceptron. Proceedings of EVALITA, 9:1–8.

Basile, V., Croce, D., Di Maro, M., and Passaro, L. C. (2020). EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian. In Basile, V., Croce, D., Di Maro, M., and Passaro, L. C., editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Bravo-Marquez, F., Frank, E., Pfahringer, B., and Mohammad, S. M. (2019). AffectiveTweets: a Weka package for analyzing affect in tweets. Journal of Machine Learning Research, 20(92):1–6.

Castellucci, G., Croce, D., and Basili, R. (2016). A language independent method for generating large scale polarity lexicons. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 38–45.

Cignarella, A. T., Lai, M., Bosco, C., Patti, V., and Rosso, P. (2020). SardiStance@EVALITA2020: Overview of the Task on Stance Detection in Italian Tweets. In Basile, V., Croce, D., Di Maro, M., and Passaro, L. C., editors, Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). CEUR-WS.org.
Dell’Orletta, F. (2009). Ensemble system for part-of-
speech tagging. Proceedings of EVALITA, 9:1–8.
Giorgioni, S., Politi, M., Salman, S., Croce, D.,
and Basili, R. (2020). UNITOR@Sardistance2020:
Combining Transformer-based architectures and
Transfer Learning for robust Stance Detection. In
Basile, V., Croce, D., Di Maro, M., and Passaro,
L. C., editors, Proceedings of the 7th Evaluation
Campaign of Natural Language Processing and
Speech Tools for Italian (EVALITA 2020). CEUR-
WS.org.
Lai, M., Cignarella, A. T., Farías, D. I. H., Bosco, C.,
Patti, V., and Rosso, P. (2020). Multilingual stance
detection in social media political debates. Com-
puter Speech & Language, page 101075.
Lai, M., Cignarella, A. T., Hernández Farías, D. I., et al.
(2017). iTACOS at IberEval2017: Detecting stance in
Catalan and Spanish tweets. In IberEval 2017, volume
1881, pages 185–192. CEUR-WS.org.
Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu,
X., and Cherry, C. (2016). Semeval-2016 task 6:
Detecting stance in tweets. In Proceedings of the
10th International Workshop on Semantic Evalua-
tion (SemEval-2016), pages 31–41.
Nielsen, F. Å. (2011). A new ANEW: evaluation of
a word list for sentiment analysis in microblogs.
In Rowe, M., Stankovic, M., Dadzie, A.-S., and
Hardey, M., editors, Proceedings of the ESWC2011
Workshop on ’Making Sense of Microposts’: Big
things come in small packages, volume 718 of
CEUR Workshop Proceedings, pages 93–98.
Taulé, M., Martí, M. A., Rangel, F. M., Rosso, P.,
Bosco, C., Patti, V., et al. (2017). Overview of
the task on stance and gender detection in tweets on
catalan independence at ibereval 2017. In 2nd Work-
shop on Evaluation of Human Language Technolo-
gies for Iberian Languages, IberEval 2017, volume
1881, pages 157–177. CEUR-WS.