=Paper= {{Paper |id=Vol-2765/146 |storemode=property |title=UninaStudents @ SardiStance: Stance Detection in Italian Tweets - Task A (short paper) |pdfUrl=https://ceur-ws.org/Vol-2765/paper146.pdf |volume=Vol-2765 |authors=Maurizio Moraca,Gianluca Sabella,Simone Morra |dblpUrl=https://dblp.org/rec/conf/evalita/MoracaSM20 }} ==UninaStudents @ SardiStance: Stance Detection in Italian Tweets - Task A (short paper)== https://ceur-ws.org/Vol-2765/paper146.pdf
 UninaStudents @ SardiStance: Stance Detection in Italian Tweets - Task A

                 Maurizio Moraca, Gianluca Sabella, Simone Morra
                     Università degli Studi di Napoli Federico II
       [mau.moraca,gia.sabella,simone.morra2]@studenti.unina.it



                        Abstract

English. This document describes a classification system for the SardiStance
task at EVALITA 2020. The task consists in classifying the stance of the
author of a series of tweets towards a specific discussion topic. The
resulting system was specifically developed by the authors as the final
project for the Natural Language Processing class of the Master in Computer
Science at the University of Naples Federico II. The proposed system is based
on an SVM classifier with a radial basis function kernel, making use of
features such as 2 char-grams, unigram hashtag and the Afinn weight computed
on automatically translated tweets. The results are promising, in that the
system performances are on average higher than those of the baseline proposed
by the task organizers.

Italiano. This document describes a classification system for the SardiStance
task of EVALITA 2020. The task consists in classifying the position of the
author of a series of tweets towards a specific discussion topic. The
resulting system was specifically developed by the authors as the final
project for the Natural Language Processing course within the Master's degree
in Computer Science at the University of Naples Federico II. The system
proposed here is based on an SVM classifier with a radial basis function
kernel, making use of features such as 2 char-grams, unigram hashtag and the
Afinn weight computed on automatically translated tweets. The results are
promising, as the performances are on average higher than those of the
baseline proposed by the task organizers.

     Copyright © 2020 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).

1   Introduction

This work reports on the application of our system to the resolution of the
EVALITA 2020 SardiStance task (Basile et al., 2020; Cignarella et al., 2020).
Stance detection is a classification task aiming at determining the position
(stance) of the author of a given text concerning the topic (target) treated
in the text itself. In other words, the challenge deals with automatically
guessing whether the author of the text is in favour of, against, or in a
neutral position towards the topic of a given post. Such an automatic system
is useful in political analysis, marketing and opinion mining. Automatic
determination of stance is a new approach to the opinion mining paradigm which
finds its best application in social and political settings. It differs from
sentiment analysis in many respects, but the main difference is the drastic
reduction to a three-class decision system (in favour, against, neutral),
motivated by its main fields of application. The task poses many challenges,
as the real target might not be expressly cited in the text, or the text could
bear a not so clear expression of the author's opinion, as in the following
example (Lai et al., 2020):

Target: Donald Trump
Tweet: Jeb Bush is the only sane candidate in this republican lineup.

Although one could erroneously think that
this task is similar to sentiment analysis, the following example illustrates
how, in some cases, stance detection results are opposed to those reached by
sentiment analysis (Lai et al., 2020):

Target: Climate change is a real concern
Tweet: @RegimeChangeBC @ndnstyl It's sad to be the last generation that could
change but does nothing. #Auspol

This tweet presents a negative polarity, although the author claims to be in
favour of the target. Classification systems for stance detection, then,
attempt to identify the author's position on the target by taking into account
features obtained from the text that are broadly similar to those used in hate
speech detection, irony detection and mood detection, but with some further
effort devoted to the specificity of the task.
   SardiStance is the first Italian initiative focused on the automatic
classification of stance in tweets. It includes two different tasks: A) Stance
Detection at a textual level, where task participants are asked to make the
prediction based only on the tweet textual content, and B) Stance Detection
with the addition of contextual information about the tweet (such as the
number of retweets, the number of favours or the date of posting) and
contextual information about the author (location, user's biography); we
proposed runs only for task A. As required by the task proposal, task A
requires a three-class classification process where the system has to predict
whether the items in the set are in FAVOUR, AGAINST or NEUTRAL exploiting the
text of the tweet.

2   Description of the System

The system is based on an SVM classifier with a radial basis function (RBF)
kernel. Most of the selected features were inspired by (Lai et al., 2020) and
correspond to the following ones:

   • n-grams, bag of n consecutive words in binary representation
     (presence/absence), where n corresponds to 1, 2 or 3.

   • char-grams, bag of n consecutive characters in binary representation
     (presence/absence), where n corresponds to 2, 3, 4 or 5.

   • unigram hashtag, bag of hashtags in binary representation
     (presence/absence).

   • unigram emoji, bag of emojis in binary representation
     (presence/absence).

   • unigram mentions, bag of mentions in binary representation
     (presence/absence).

   • num uppercase words, number of uppercase words in a tweet.

   • punctuation marks, frequency of each punctuation mark (. , ; ! ?) and
     their total frequency.

   • Afinn weight¹ (Nielsen, 2011), based on a sentiment analysis lexicon
     made up of 3,500 English words manually annotated with a polarity value
     within the range [-5, +5]. The value of this feature is computed for
     each tweet as the sum of the polarities associated to the words
     constituting the tweet, translated to English via Google Translate.

   • Hu&Liu weight², based on a sentiment analysis lexicon composed of two
     separate lists of English words, where the first one contains 2,006
     words with a positive connotation and the second one contains 4,783
     words with a negative connotation. In this work, a value of +1 is given
     to words which overlap with the positive list and a value of -1 to the
     ones overlapping with the negative list. The total polarity of each
     tweet is computed as the sum of the weights given to the words in the
     tweet.

   • NRC vector³ (Bravo-Marquez et al., 2019), based on a lexicon consisting
     of a list of English words, each of which is associated to its most
     representative emotion. The emotions comprised are anger, fear,
     expectancy, trust, surprise, sadness, joy, and disgust. Furthermore, a
     score indicating the emotion intensity, with a value within the range
     [0, 1], is also associated to each sample.

   • DPL vector⁴ (Castellucci et al., 2016), based on a lexicon of 75,021
     pairs of

¹ https://github.com/fnielsen/afinn/tree/master/afinn/data
² https://github.com/woodrad/Twitter-Sentiment-Mining/tree/master/Hu%20and%20Liu%20Sentiment%20Lexicon
³ http://saifmohammad.com/WebPages/AffectIntensity.htm
⁴ http://sag.art.uniroma2.it/demo-software/distributional-polarity-lexicon/
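To make the binary n-gram features and the lexicon-weight features concrete,
the following sketch builds a binary char-bigram bag and an Afinn-style
polarity sum for a tweet. The tiny lexicon and the tokenizer here are
illustrative stand-ins, not the actual Afinn data or the authors' pipeline
(which also translates the tweet to English first).

```python
import re

def char_bigrams_binary(text):
    """Binary bag of 2 consecutive characters (presence/absence)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def lexicon_weight(text, lexicon):
    """Sum of the polarity values of the tweet words found in the lexicon,
    mimicking the Afinn-weight feature described above."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(w, 0) for w in words)

# Toy stand-in for the Afinn lexicon (real values lie in [-5, +5]).
TOY_LEXICON = {"sad": -2, "sane": 2, "love": 3}

print(sorted(char_bigrams_binary("sad")))                              # ['ad', 'sa']
print(lexicon_weight("It's sad to be the last generation", TOY_LEXICON))  # -2
```

The set-based representation matches the presence/absence encoding used by
the bullet list: a bigram either occurs in the tweet or it does not, and its
frequency is discarded.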
     lemma::pos tag, associated to scores indicating the level of positivity,
     negativity, and neutrality of the lemma, as follows:

        (1)   buono::a 0.76691014 0.12262548 0.11046442

     For each tweet of the dataset, each word was lemmatised and, for each
     resulting lemma, a morpho-syntactic category was associated. For this
     kind of analysis LinguA (Dell'Orletta, 2009; Attardi and Dell'Orletta,
     2009; Attardi et al., 2009) was used. The DPL vector feature consists of
     a triplet of scores representing the positivity, negativity, and
     neutrality levels of the tweet. To obtain this value, the scores of each
     lemma::pos tag pair in a tweet were summed.

   In order to select the best feature combination, a wrapper-based feature
selection algorithm was used to test all the possible feature combinations.
The best one resulting from the performance collected on the validation set
was chosen, that is, the one combining 2 char-grams, unigram hashtag and Afinn
weight. The evaluation metrics are discussed in the next section (Section 3).
Since an SVM classifier with an RBF kernel was used, it was important to tune
the C and γ parameters.
   To set the complexity of a generic SVM model, C is used: this parameter
controls the acceptable distance of the decision boundary in the
n-dimensional feature space from the support vectors. A higher C value
increases the model's complexity, thus reducing the acceptable distance but
also increasing the risk of overfitting; a lower C value leads to more general
models that may have reduced discrimination capability. The γ parameter is
specific to the RBF kernel. It controls the influence single points have in
the feature space and the smoothness of the model, with lower values of γ
leading to smoother models and vice versa. SVMs are very sensitive to
parameter tuning, so specific optimisation strategies must be adopted. In this
case, a grid search was performed using the following ranges of values:

   • C: [0.1, 0.2, …, 1.0, 10, 100, 1000]

   • γ: [0.001, 0.0009, 0.0008, …, 0.0001]

The best settings obtained on the validation set data correspond to C = 10 and
γ = 0.001.

3   Results

In this section, the performances obtained by our system on the validation and
test sets are described. The validation set was obtained by extracting a
sample of tweets from the training set via stratified sampling, selecting 20%
of the training set. The evaluation metrics used are the mean value of the F1
score for the classes Against and Favour; Precision, Recall and F1 score for
each class; and Accuracy. In Table 1, the results obtained on the validation
set are shown. From these results, the mean F1 score is obtained,
corresponding to 0.5200. In Table 2, the results obtained on the test set are
presented.

              Precision   Recall    F1 Score
  Against     0.5500      0.8300    0.6600
  Favor       0.4400      0.3200    0.3100
  None        0.3800      0.1300    0.0900

         Table 1: Validation Set Performance

              Precision   Recall    F1 Score
  Against     0.7300      0.8491    0.7850
  Favor       0.4348      0.3571    0.3922
  None        0.3488      0.1744    0.2326

            Table 2: Test Set Performance

                              F1-score
  Team             Against    Favour    None
  UNITOR 1         0.7866     0.5840    0.3910
  UNITOR 2         0.7881     0.5721    0.3979
  UNITOR 3         0.7939     0.5647    0.3672
  UNITOR 4         0.7689     0.5522    0.3702
  UninaStudents    0.7850     0.3922    0.2326
  Baseline         0.7158     0.4409    0.2764

Table 3: Results compared with the baseline and the winning system

   In Table 3, on the other hand, the results are compared with the baseline
proposed by the task organizers and the winning systems, whose runs were
submitted by the UNITOR team (Giorgioni et al., 2020) for task A.
Specifically, the
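The grid search over the two ranges above can be sketched as follows. The
scoring function here is a dummy stand-in: in the actual system it would train
the RBF-kernel SVM with each (C, γ) pair and return the mean F1 on the
validation set.

```python
from itertools import product

# Hyper-parameter grids as described in the text.
C_GRID = [round(0.1 * i, 1) for i in range(1, 11)] + [10, 100, 1000]
GAMMA_GRID = [round(0.001 - 0.0001 * i, 4) for i in range(10)]

def validation_score(C, gamma):
    # Dummy score surface peaking at the paper's reported best setting
    # (C=10, gamma=0.001); the real system would train the SVM here and
    # return the mean F1 obtained on the validation set.
    return -abs(C - 10) - 1000 * abs(gamma - 0.001)

# Exhaustively evaluate every (C, gamma) pair and keep the best one.
best_C, best_gamma = max(product(C_GRID, GAMMA_GRID),
                         key=lambda p: validation_score(*p))
print(best_C, best_gamma)  # 10 0.001
```

With real training in place of the stub, the same exhaustive loop is what
selects C = 10 and γ = 0.001 as reported.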
baseline used an SVM classifier based on token unigram features, whereas
UNITOR used UmBERTo⁵, adding sentiment, hate and irony tags to the dataset
sentences and using additional data to train their systems. As may be noted,
the Against-class result of our system is higher than the baseline and not so
different from the first two runs of UNITOR. Further investigations are,
conversely, needed as far as the other two classes are concerned.

4   Discussion

Our results are conditioned by the use of a training set originally in English
and translated into Italian for our purposes and, in particular, for the
derivation of the Afinn weight features. As expected, the translation, made
via Google Translate, is in some cases poor and approximate, and can give rise
to a significant level of ambiguity. We nevertheless decided to accept this
risk, translating the tweets directly instead of the lexicon, as we thought
that in the latter case the ambiguity could have been even greater:
translating isolated lexicon entries is by far more uncertain because of
polysemy, lack of inflectional morphological information, and similar
problems, which automatic translation systems are trained to solve at least to
a first level of approximation. In this view, the use of an imperfect
translation is still able to capture part of the semantic context of the
texts, allowing us not to resort to lemmatisation and further processing of
the lexicon before translation. We chose to use a classic approach based on an
SVM classifier in order to make our results explainable, given the scholarly
context in which this experience grew. This would have been impossible if we
had used Deep Neural Networks, whose processes are not "readable" from an
external point of view. Furthermore, the size of the dataset distributed for
this challenge does not allow affordable training of such systems. In this
view, a comparison with the results obtained in other stance detection
challenges similar to the one proposed here at EVALITA (Mohammad et al., 2016;
Taulé et al., 2017; Lai et al., 2017) gives strength to our choice concerning
the use of SVMs, which often outperform DNNs. As Master students, we
approached these NLP topics for the first time. Therefore, we are aware that
our results are not at the state of the art in the field. However, a
comparison with average performances in similar tasks for languages different
from English indicates performances that are not significantly different.

Acknowledgements

We thank our teachers Francesco Cutugno and Maria Di Maro for introducing us
to NLP and EVALITA 2020 (Basile et al., 2020) and for supporting us in our
work. We also thank them for giving us the opportunity to take part in the
competition and for encouraging us to do our best.

⁵ https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1

References

Attardi, G. and Dell'Orletta, F. (2009). Reverse revision and linear tree
  combination for dependency parsing. In Proceedings of Human Language
  Technologies: The 2009 Annual Conference of the North American Chapter of
  the Association for Computational Linguistics, Companion Volume: Short
  Papers, pages 261–264.

Attardi, G., Dell'Orletta, F., Simi, M., and Turian, J. (2009). Accurate
  dependency parsing with a stacked multilayer perceptron. Proceedings of
  EVALITA, 9:1–8.

Basile, V., Croce, D., Di Maro, M., and Passaro, L. C. (2020). EVALITA 2020:
  Overview of the 7th evaluation campaign of natural language processing and
  speech tools for Italian. In Basile, V., Croce, D., Di Maro, M., and
  Passaro, L. C., editors, Proceedings of Seventh Evaluation Campaign of
  Natural Language Processing and Speech Tools for Italian. Final Workshop
  (EVALITA 2020), Online. CEUR.org.

Bravo-Marquez, F., Frank, E., Pfahringer, B., and Mohammad, S. M. (2019).
  AffectiveTweets: a Weka package for analyzing affect in tweets. Journal of
  Machine Learning Research, 20(92):1–6.

Castellucci, G., Croce, D., and Basili, R. (2016). A language independent
  method for generating large scale polarity lexicons. In Proceedings of the
  Tenth International Conference on Language Resources and Evaluation
  (LREC'16), pages 38–45.

Cignarella, A. T., Lai, M., Bosco, C., Patti, V., and Rosso, P. (2020).
  SardiStance@EVALITA2020: Overview of the Task on Stance Detection in
  Italian Tweets. In Basile, V., Croce, D., Di Maro, M., and Passaro, L. C.,
  editors, Proceedings of the 7th Evaluation Campaign of Natural Language
  Processing and Speech Tools for Italian (EVALITA 2020). CEUR-WS.org.
Dell’Orletta, F. (2009). Ensemble system for part-of-
  speech tagging. Proceedings of EVALITA, 9:1–8.
Giorgioni, S., Politi, M., Salman, S., Croce, D.,
  and Basili, R. (2020). UNITOR@Sardistance2020:
  Combining Transformer-based architectures and
  Transfer Learning for robust Stance Detection. In
  Basile, V., Croce, D., Di Maro, M., and Passaro,
  L. C., editors, Proceedings of the 7th Evaluation
  Campaign of Natural Language Processing and
  Speech Tools for Italian (EVALITA 2020). CEUR-
  WS.org.
Lai, M., Cignarella, A. T., Farías, D. I. H., Bosco, C., Patti, V., and
  Rosso, P. (2020). Multilingual stance detection in social media political
  debates. Computer Speech & Language, page 101075.
Lai, M., Cignarella, A. T., Hernández Farías, D. I., et al. (2017). iTACOS
  at IberEval2017: Detecting stance in Catalan and Spanish tweets. In
  IberEval 2017, volume 1881, pages 185–192. CEUR-WS.org.
Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., and Cherry, C. (2016).
  SemEval-2016 Task 6: Detecting stance in tweets. In Proceedings of the
  10th International Workshop on Semantic Evaluation (SemEval-2016), pages
  31–41.
Nielsen, F. Å. (2011). A new ANEW: evaluation of
  a word list for sentiment analysis in microblogs.
  In Rowe, M., Stankovic, M., Dadzie, A.-S., and
  Hardey, M., editors, Proceedings of the ESWC2011
  Workshop on ’Making Sense of Microposts’: Big
  things come in small packages, volume 718 of
  CEUR Workshop Proceedings, pages 93–98.

Taulé, M., Martí, M. A., Rangel, F. M., Rosso, P., Bosco, C., Patti, V.,
  et al. (2017). Overview of the task on stance and gender detection in
  tweets on Catalan independence at IberEval 2017. In 2nd Workshop on
  Evaluation of Human Language Technologies for Iberian Languages, IberEval
  2017, volume 1881, pages 157–177. CEUR-WS.