UNITOR @ Sardistance2020: Combining Transformer-based Architectures and Transfer Learning for Robust Stance Detection

Simone Giorgioni, Marcello Politi, Samir Salman, Danilo Croce and Roberto Basili
Department of Enterprise Engineering, University of Roma, Tor Vergata
Via del Politecnico 1, 00133 Roma, Italy
{simone.giorgioni,marcello.politi,samir.salman}@alumni.uniroma2.eu
{croce,basili}@info.uniroma2.it

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

English. This paper describes the UNITOR system that participated in the Stance Detection in Italian Tweets (SardiStance) task within the context of EVALITA 2020. UNITOR implements a Transformer-based architecture whose accuracy is improved by adopting a Transfer Learning technique. In particular, this work investigates the possible contribution of three auxiliary tasks related to Stance Detection, i.e., Sentiment Detection, Hate Speech Detection and Irony Detection. Moreover, UNITOR relies on an additional dataset automatically downloaded and labeled through distant supervision. The UNITOR system ranked first in Task A within the competition. This confirms the effectiveness of Transformer-based architectures and the beneficial impact of the adopted strategies.

Italiano. Questo lavoro descrive UNITOR, uno dei sistemi partecipanti allo Stance Detection in Italian Tweets (SardiStance) task. UNITOR implementa un'architettura neurale basata su Transformer, la cui accuratezza viene migliorata applicando un metodo di Transfer Learning, che sfrutta le informazioni di tre task ausiliari, ovvero Sentiment Detection, Hate Speech Detection e Irony Detection. Inoltre, l'addestramento di UNITOR può contare su un insieme di dati scaricati ed etichettati automaticamente applicando un semplice metodo di Distant Supervision. Il sistema si è classificato al primo posto nella competizione, confermando l'efficacia delle architetture basate su Transformer e il contributo delle strategie adottate.

1 Introduction

Stance detection aims at detecting whether the author of a text is in favor of a target topic or against it (Krejzl et al., 2017). In this task, a text pair is generally considered: one text expresses the topic, while the other one reflects the author's judgments. In a possible variant of such a setting, the topic is implicit within an entire document collection over which stance detection is applied.

In this work, we consider this last setting, as defined in the Stance Detection in Italian Tweets (SardiStance) task (Cignarella et al., 2020) within EVALITA 2020 (Basile et al., 2020). A set of texts (here tweets) is provided, almost all concerning the same topic, i.e., the Sardines Movement (https://en.wikipedia.org/wiki/Sardines_movement). The goal is to recognize whether each tweet is for or against (or neither) such a target, only exploiting textual information. According to the task definition, this corresponds to the so-called Task A. This is a quite challenging problem, since it requires at the same time discovering whether a text refers to the target topic and what the author's orientation is, only relying on short messages written in a very conversational style.

We thus present the UNITOR system participating in SardiStance Task A. The system is based on a Transformer-based architecture for text classification (Devlin et al., 2019) that is pre-trained over a large-scale document collection written in Italian, namely UmBERTo. In a nutshell, the adopted architecture, which has been demonstrated to achieve state-of-the-art results in many NLP tasks (Devlin et al., 2019), takes in input a message and associates it to one of the target classes indicating the stance. Moreover, due to the task complexity and the small size of the dataset, in order to improve the generalization capabilities of the neural network, we adopted a Transfer Learning approach (Pan and Yang, 2010). Our main assumption is that Stance Detection is tied to other tasks involving emotion and subjectivity analysis (such as Sentiment Analysis or Irony Detection), even though important differences do exist among them. As a simplified example, let us consider a message such as "I like the Sardines Movement": it clearly expresses a positive sentiment, while also being in favour of the target topic. However, a message such as "I like the EVALITA campaign." is positive as well, but it does not express any support or opposition to the Sardines (and it should be associated to the None class). We thus speculate that an automatic system trained over an auxiliary task (e.g., Sentiment Classification) is beneficial, but the transfer process must be carefully designed in order to avoid catastrophic forgetting or interference problems (Mccloskey and Cohen, 1989).

In this work, we investigate the possible contribution of three auxiliary tasks involving the recognition of emotions according to different settings, i.e., Sentiment Detection and Classification, Hate Speech Detection and Irony Detection. We adopt three different classifiers (one for each auxiliary task) and use them to add information to the tweets provided in the SardiStance dataset. As an example, when considering the auxiliary task involving Hate Detection, the corresponding classifier will augment each input tweet by expressing whether the tweet conveys hate or not. After this step, the final classifier is expected to learn the association between messages and the stance categories, "being aware" (with some unavoidable noise) of whether the message expresses some sort of hate, irony and, more generally, sentiment. Finally, we investigate the possibility of augmenting the training material by automatically downloading messages and labeling them through distant supervision (Go et al., 2009). We first selected a few hashtags clearly in favour (or not) of the target topic to download and label a set of messages. Then, in order to add a set of neutral messages, we selected a set of news titles concerning the Sardines Movement.

The UNITOR system ranked first in the competition, suggesting that the combination of Transformer-based learning with the adopted strategies of Transfer Learning and Data Augmentation is beneficial. In the rest of the paper, Sec. 2 describes UNITOR. In Sec. 3, the evaluations are reported, while Sec. 4 derives the conclusions.

2 Transformer-based architectures and Transfer Learning for Stance Detection

The UNITOR system implements a Transformer-based architecture described in Section 2.1. The adopted auxiliary tasks are described in Section 2.2, while our Transfer Learning strategy is in Section 2.3. Finally, an automatic strategy for Data Augmentation is presented in Section 2.4.

2.1 UNITOR as a Transformer-based Architecture

The approach proposed in (Devlin et al., 2019), namely Bidirectional Encoder Representations from Transformers (BERT), provides a very effective model to pre-train a deep and complex neural network over large-scale collections of non-annotated texts and to apply it to a large variety of NLP tasks. The building block of BERT is the Transformer element (Vaswani et al., 2017), an attention-based mechanism that learns contextual relations between words in a text. BERT provides a sentence embedding (as well as the contextualized lexical embeddings of the words in the sentence) through a pre-training stage aiming at the acquisition of an expressive and robust language and text model. The Transformer reads the entire input sequence of words at once and is optimized through two pre-training tasks. The first pre-training objective is masked language modeling (Devlin et al., 2019). In addition, a Next Sentence Prediction task is used to jointly pre-train text embeddings able to soundly represent discourse-level information. This last objective operates on text-pair representations and aims at modeling relational information, e.g., between consecutive sentences in a text. On top of the produced embeddings, BERT applies a fine-tuning stage devoted to adapting the entire architecture to the targeted task.

The fine-tuning process of BERT for sentence classification (here adopted) operates on single texts or text pairs, which can be given in input to BERT in analogy with the next sentence prediction task. The special token [CLS] is used as the first element of each input sequence, and the embedding produced by BERT is used in input to a linear classifier customized for the target classification task. While the BERT architecture is pre-trained on large-scale corpora, its application to new tasks is generally obtained by customizing the final classifier to the targeted problem and fine-tuning all the network parameters for a few epochs, to avoid catastrophic forgetting. In (Liu et al., 2019b), RoBERTa is proposed as a variant of BERT which modifies some key hyperparameters, including removing the next-sentence pre-training objective, and which is trained on more data, with much larger mini-batches and learning rates. This allows RoBERTa to improve on the masked language modeling objective compared with BERT and leads to better downstream task performances.
UNITOR is based on a RoBERTa architecture pre-trained over Italian texts: we adopted UmBERTo (https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1), which is pre-trained over a subset of the OSCAR corpus, made of 11 billion tokens. These architectures achieved state-of-the-art results in a wide range of NLP tasks. However, they also rely on large-scale annotated datasets composed of (possibly hundreds of) thousands of examples. In order to improve the quality of this architecture in the SardiStance task, which offers a quite limited dataset, we adopted a simple Transfer Learning strategy relying on the following three auxiliary tasks.

2.2 Supporting UNITOR through Auxiliary Tasks

In this work, we speculate that the complexity of the Stance Detection task can be simplified whenever the system to be trained is already aware of whether input messages express some sort of Sentiment, Hate or Irony. In order to derive this information, we trained specific classifiers over dedicated corpora made available in previous editions of EVALITA, as follows.

Sentiment Detection and Classification. This task consists in the automatic detection of subjectivity (and of the eventual positive or negative polarity) in texts (Pang and Lee, 2008). Even though Stance Detection is clearly different from a traditional Sentiment Analysis task, we speculate that they are nevertheless related. As an example, we can suppose that the presence of stance is more probable in messages expressing subjectivity. We thus considered the setting proposed in SENTIPOLC 2016 (Barbieri et al., 2016), where a dataset of 8,000 tweets is made available. For each message, the presence of subjectivity is made explicit and, eventually, the positive or negative polarity. The labeling provided in the dataset was slightly modified and mapped to a classification problem over three classes: all objective tweets were labeled with a dedicated objective tag, the subjective and positive messages with a positive label, and the negative ones with a negative label. (We discarded the few available messages with mixed polarity, to simplify the final classification task.)

Irony Detection. We speculate that a robust detection of stance requires the recognition of irony, which can even reverse the output of the classification task. For example, a false stance can be expressed through an ironic message, such as "Le Sardine sono il futuro passato dell'Italia" (in English: "Sardines are the future past of Italy"). The objective of Irony Detection is to detect whether a given message is ironic or not. We used the dataset provided in IronITA 2018 (Cignarella et al., 2018), where a dataset of 4,800 labeled messages is made available. We adopted the original binary classification task, mapping each message to an ironic or non-ironic label.

Hate Speech Detection. Being against a topic can often be expressed through messages also conveying hate. We thus also introduce the Hate Speech Detection task, which involves the automatic recognition of hateful contents. We considered the setting proposed in HaSpeeDe 2018 (Bosco et al., 2018), where a dataset of 3,000 messages is made available. We adopted the original binary classification task, mapping each message to a hateful or non-hateful label.
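The three-class mapping of the SENTIPOLC annotations described above can be rendered as a minimal sketch. This is an illustration, not the authors' code: the field names follow the SENTIPOLC annotation scheme (subj, opos, oneg), the concrete label strings are our assumption, and the handling of subjective tweets carrying no overall polarity is not specified in the paper (here they are simply discarded, like mixed-polarity ones).

```python
# Hypothetical sketch of the SENTIPOLC-to-three-classes mapping (Section 2.2).
# subj/opos/oneg are the SENTIPOLC subjectivity and overall-polarity flags;
# the label strings and the treatment of polarity-less subjective tweets are
# our assumptions, not taken from the paper.

def map_sentipolc(subj, opos, oneg):
    if not subj:
        return "oggettivo"   # objective tweets: dedicated tag
    if opos and oneg:
        return None          # mixed polarity: discarded (as stated in the paper)
    if opos:
        return "positivo"
    if oneg:
        return "negativo"
    return None              # subjective but no polarity: discarded (assumption)

assert map_sentipolc(0, 0, 0) == "oggettivo"
assert map_sentipolc(1, 1, 0) == "positivo"
assert map_sentipolc(1, 0, 1) == "negativo"
assert map_sentipolc(1, 1, 1) is None
```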
2.3 Transferring auxiliary tasks in the Transformer-based learning

In order to transfer the information from each auxiliary task into UNITOR, we first trained a specific UmBERTo-based sentence classifier on each of the datasets described in the previous section. In each case, the standard parameters proposed in (Devlin et al., 2019) are used to fine-tune the model (the number of epochs was tuned over a development set made of 10% of the corresponding dataset, and the best epoch was selected by maximizing the classification accuracy). After these three training steps, the entire SardiStance dataset is processed by each of the three classifiers and the resulting labels are used to "augment" the input messages. In particular, these labels generate a sort of new sentence, which is paired with the corresponding message. The following example shows how a tweet against the movement is given in input to UNITOR:

"[CLS] negativo ironico odio [SEP] #elezioniregionali Le Sardine aiuteranno a salvare il Paese! #mafammilpiacere Sono proprio dei bei perdigiorno falliti! [SEP]"

(in English: "#regionalelections The Sardines will help to save the country! #please They're just a bunch of losers!")

Consistently with (Devlin et al., 2019), the first pseudo-token [CLS] is added to generate the embedding used in input to the final linear classifier. Then, the pseudo-sentence "negativo ironico odio" suggests that the message expresses negative polarity and hate through the adoption of irony. Finally, between the [SEP] pseudo-tokens, the original message is reported. This particular schema resembles the classification of text pairs used in relational learning tasks, such as Textual Entailment (Devlin et al., 2019). The output of the auxiliary classifiers defines a sort of hypothesis, i.e., the author aims at expressing a negative sentiment through an ironic message which also expresses hate, while the original message is the direct consequence, i.e., the "implied" message. (We investigated different ways to encode this information, even using complex sentences, but negligible differences were measured in the tuning process, so we applied the simplest schema.) The UNITOR model is thus an UmBERTo-based classifier trained over text pairs, where the first element encodes the information derived from the auxiliary tasks and the second one is the original message. Even though this labeling process can introduce noise (due to incorrectly classified messages), the augmented input is expected to simplify the final training process, by explicitly providing information about sentiment, hate and irony.
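The input-augmentation schema of Section 2.3 can be sketched as follows. The function name is ours, and in a real pipeline the [CLS]/[SEP] tokens would typically be added by the tokenizer when encoding the two texts as a pair; the sketch only shows how the auxiliary labels form the pseudo-sentence paired with the tweet.

```python
# Sketch of the input-augmentation schema of Section 2.3 (names are ours).
# The three auxiliary classifiers produce one label each; the labels form a
# pseudo-sentence that is paired with the original tweet, BERT-style.

def build_augmented_input(tweet, sentiment, irony, hate):
    """Compose '[CLS] <auxiliary labels> [SEP] <tweet> [SEP]'.

    sentiment/irony/hate are the (possibly noisy) labels predicted by the
    auxiliary classifiers, e.g. "negativo", "ironico", "odio".
    """
    pseudo_sentence = " ".join([sentiment, irony, hate])
    return f"[CLS] {pseudo_sentence} [SEP] {tweet} [SEP]"

example = build_augmented_input(
    "#elezioniregionali Le Sardine aiuteranno a salvare il Paese!",
    sentiment="negativo", irony="ironico", hate="odio")
assert example.startswith("[CLS] negativo ironico odio [SEP] ")
```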
2.4 Distant Supervision for Stance Detection

In order to compensate for the limited amount of available data (especially considering the complexity of the task), we augmented the training material by labeling additional messages via Distant Supervision (Go et al., 2009). We speculate that a tweet containing a hashtag such as #vivalesardine (in English: #ILikeSardines) is in favour of the Sardines, while a tweet containing, for example, #sardinefritte (in English: #friedSardines) is against our target. Hence, we downloaded 3,200 tweets from the TWITA corpus (Basile and Nissim, 2013) and labeled them via Distant Supervision. In particular, the following subsets were derived: 1,500 tweets against the movement, as they contain #gatticonsalvini, and 1,000 tweets in favour of it, as they contain #nessunotocchilesardine, #iostoconlesardine, #unmaredisardine, #vivalesardine or #forzasardine. Finally, to enlarge the subset of messages without stance, 700 neutral statements were downloaded; these are actually news titles, derived by querying "sardine" in Google News. In the experimental evaluations discussed in the next section, this dataset of "silver" data is simply added to the training material. To avoid over-fitting, we removed 90% of the occurrences of the hashtags used as queries from the new data.
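The distant-supervision labeling of Section 2.4 amounts to a simple hashtag lookup. In the sketch below the hashtag lists are the ones reported in the paper, while the matching logic (plain whitespace tokenization, precedence when both lists match, unlabeled otherwise) is our simplification:

```python
# Sketch of the distant-supervision labeling of Section 2.4. The hashtag lists
# are taken from the paper; the matching logic is our simplification.

FAVOUR_TAGS = {"#nessunotocchilesardine", "#iostoconlesardine",
               "#unmaredisardine", "#vivalesardine", "#forzasardine"}
AGAINST_TAGS = {"#gatticonsalvini"}

def distant_label(tweet):
    tokens = {t.lower() for t in tweet.split()}
    if tokens & AGAINST_TAGS:
        return "Against"
    if tokens & FAVOUR_TAGS:
        return "Favour"
    return None  # no stance-bearing hashtag: not used as silver data

assert distant_label("Che piazza! #iostoconlesardine") == "Favour"
assert distant_label("#gatticonsalvini sempre") == "Against"
assert distant_label("Oggi piove") is None
```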
3 Results and Discussion

UNITOR participated in Task A - Textual Stance Detection (Cignarella et al., 2020), where the available dataset is composed of 2,132 tweets concerning the Sardines Movement: 1,028 tweets are against the movement (label Against), 589 tweets are in favour of it (label Favour) and 515 tweets do not express any stance about the target topic (label None).

As discussed in Section 2, UNITOR is based on the UmBERTo pre-trained model, which relies on the RoBERTa architecture. For parameter tuning, we adopted a 10-fold cross-validation, so that the training material is divided into 10 folds, each split according to a 90%-10% proportion. The model is trained using a standard Cross-entropy Loss and an Adam optimizer initialized with a learning rate set to 2·10^-5 and linearly decreased during the training process. We trained the model for 5 epochs, using a batch size of 32 elements. At test time, an Ensemble of such classifiers is used: each message is in fact classified using all 10 models trained on the different folds, and the label suggested by the highest number of classifiers is selected. In Task A, we submitted two constrained runs, i.e., systems considering only tweets from the competition, and two unconstrained ones, where additional tweets were acquired and labeled by applying the approach presented in Section 2.4. All models are implemented using PyTorch (https://pytorch.org/) and experiments were run on Google Colab (http://colab.research.google.com/).

Results are reported in Table 1 in terms of Precision, Recall and F1 scores obtained by the different models with respect to each label. The final rank considers the average F1 (F1-avg) between the Favour and Against classes.
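The test-time majority voting over the 10 fold models can be sketched as follows (a minimal illustration; tie-breaking is not specified in the paper, and here it simply follows Counter's ordering):

```python
# Sketch of the 10-model ensemble used at test time (Section 3): the label
# predicted by the highest number of fold models is selected. Tie-breaking is
# not specified in the paper; Counter.most_common keeps first-seen order.
from collections import Counter

def ensemble_vote(predictions):
    """predictions: the labels assigned to one tweet by the 10 fold models."""
    return Counter(predictions).most_common(1)[0][0]

votes = ["Against"] * 6 + ["None"] * 3 + ["Favour"]
assert ensemble_vote(votes) == "Against"
```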
Rk  System        F1-avg   F1-Against  F1-Favour  F1-None   Rec-Against  Rec-Favour  Rec-None   Prec-Against  Prec-Favour  Prec-None
1   UNITOR_u_1    68.53%   78.66%      58.40%     39.10%    76.01%       57.65%      45.35%     81.50%        59.16%       34.36%
2   UNITOR_c_1    68.01%   78.81%      57.21%     39.79%    74.66%       63.78%      43.60%     83.43%        51.87%       36.59%
3   UNITOR_c_2    67.93%   79.39%      56.47%     36.72%    77.09%       61.22%      37.79%     81.83%        52.40%       35.71%
4   Opponent_c_1  66.21%   75.80%      56.63%     42.13%    68.60%       64.29%      52.91%     84.69%        50.60%       35.00%
5   UNITOR_u_2    66.06%   76.89%      55.22%     37.02%    72.64%       56.63%      44.77%     81.67%        53.88%       31.56%
6   UmBERTo       65.69%   77.41%      53.97%     35.93%    74.12%       57.14%      40.11%     81.00%        51.14%       32.54%
13  Baseline      57.84%   71.58%      44.09%     27.64%    68.06%       49.49%      29.65%     75.49%        39.75%       25.89%

Table 1: Results obtained by UNITOR at the SardiStance task. In the system names, "c" and "u" refer to constrained and unconstrained runs.

First of all, the high complexity of this task is confirmed by the results obtained by the strong Baseline method (the last row). It is a Support Vector Machine trained over a simple Bag-of-Words model (Cignarella et al., 2020) and achieves an average F1 of 57.84%, being competitive with many systems participating in the task and ranking 13th over 22 submissions. One important result is obtained by the straight application of the UmBERTo model over the original messages (next-to-last row in Table 1). In fact, this Transformer-based architecture, empowered with the Ensemble technique, achieves an average F1 of 65.69%: a system directly applying an Ensemble of UmBERTo-based models would have ranked 6th in the competition. We thus trained UmBERTo by adopting the Transfer Learning approach presented in Section 2.3 in the constrained setting.
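As a worked check of the ranking measure, the per-class F1 can be recomputed from the precision and recall reported in Table 1; for the winning UNITOR_u_1 run, the average of the Against and Favour F1 reproduces the official 68.53%:

```python
# Recomputing the ranking measure from Table 1 (UNITOR_u_1 row): F1-avg is
# the mean of the per-class F1 of Against and Favour (None is excluded).

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

f1_against = f1(81.50, 76.01)   # reported as 78.66%
f1_favour = f1(59.16, 57.65)    # reported as 58.40%
f1_avg = (f1_against + f1_favour) / 2
assert round(f1_avg, 2) == 68.53
```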
The adoption of all three auxiliary tasks led to the constrained submission called UNITOR_c_2. Moreover, we considered training UmBERTo with one auxiliary task at a time. When considering only the Hate Speech Detection task, better results were obtained over the development set than with the adoption of the other tasks taken individually, i.e., Sentiment Detection and Irony Detection. (The results of this tuning stage are not reported here for lack of space.) This variant, called UNITOR_c_1, considers tweets enriched only with the information derived by the hate classifier, and it generally shows higher precision with respect to the Against class. This suggests that a tweet expressing hate is more likely in opposition to the Sardines Movement. These constrained models, UNITOR_c_2 and UNITOR_c_1, ranked 3rd and 2nd in the competition, respectively. These results are impressive, as both outperformed the standard UmBERTo by about 2% of absolute F1. Moreover, they confirm the beneficial impact of Hate Speech Detection as an auxiliary task.

Finally, we augmented the training dataset by using the additional data presented in Section 2.4. We extended the training material used to train UNITOR_c_2 in order to obtain the unconstrained submission called UNITOR_u_2; it is worth noticing that all three auxiliary tasks were used in this submission. This led to a performance drop, i.e., a 66.06% average F1, which is lower than that of the best opponent system, which achieved a 66.21% F1. It seems that the noise added both by the auxiliary tasks and by the additional data negatively impacted the overall quality. On the contrary, when only the Hate Speech Detection task is considered (i.e., UNITOR_u_1), the additional data are positively capitalized on by the model, achieving the best average F1 score in the competition, i.e., 68.53%. These results suggest that the combination of Transformer-based learning with the adopted strategies of Transfer Learning and Data Augmentation is highly beneficial when only Hate is considered.

From an error analysis, it seems that a significant number of incorrect classifications occurred in longer and more complex messages, where the topic of the stance is not clearly explicit nor captured by the UmBERTo model, such as in "#carfagna: 'io per i liberali che non si affidano a Salvini' e 'dalle sardine buone idee'. Auto-scacco in due mosse. Con la Polverini poi..." (in English: "#carfagna: 'come with me liberals who do not rely on Salvini' and 'from the Sardines movement good ideas'. She messed herself up in two moves. Not to mention Polverini..."). This message is considered to be Against, while the system assigns the label None. Here, it is very challenging to understand the connection between the "good ideas of the sardines" and the very colloquial expression "Auto-scacco", which can be translated as "She messed herself up". The same appears in the tweet "Ho finalmente capito chi mi ricordava Mattia Santori, quello delle sardine: Lodo Guenzi. (e infatti in quanto a democristianità stiamo là)" (in English: "I finally understood who reminded me of Mattia Santori, the one with the Sardines movement: Lodo Guenzi. (in fact, as far as Christian Democrats are concerned, they are pretty much the same)"), which again is labeled Against but classified as None. Clearly the system is not able to link the movement to its leader, nor to the negative opinion about belonging to the Christian Democrat Party. Another example is the tweet "Dopo avere ascoltato @luigidimaio mi viene in mente una sola parola: grazie. Fiducia nelle sue scelte e immenso rispetto per i grandi risultati ottenuti. Ora un nuovo inizio, con un nuovo entusiasmo. Andiamo verso gli #statigenerali con serietà e maturità. Forza @mov5stelle!" (in English: "After listening to @luigidimaio only one word comes to my mind: thank you. Trust in his choices and immense respect for the great results obtained. Now a new start, with new enthusiasm. Let's move towards the #statigenerali with seriousness and maturity. Forza @mov5stelle!"). Here the system incorrectly assigns the Favour label, because the tweet is in favour of a different movement.
4 Conclusion

In this work we presented the results obtained by the UNITOR system, which participated in the SardiStance task. UNITOR ranked first in Task A, both for the constrained and the unconstrained runs. These results confirm the beneficial impact of Transformer-based architectures for text classification also in the Stance Detection task. Moreover, we demonstrated the beneficial impact of Hate Speech Detection as an auxiliary task in a Transfer Learning setting. Finally, we empirically showed that the adoption of Distant Supervision is useful to reduce data sparseness. Future work will apply the above approaches to Task B within SardiStance. Moreover, we will investigate multi-task learning approaches (Liu et al., 2019a) to capitalize on information from the auxiliary tasks in a more principled way.

References

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the EVALITA 2016 sentiment polarity classification task. In Proceedings of EVALITA 2016, Napoli, Italy, December 5-7, 2016, volume 1749 of CEUR Workshop Proceedings.

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100-107, Atlanta.

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020). CEUR-WS.org.

Cristina Bosco, Felice Dell'Orletta, Fabio Poletto, M. Sanguinetti, and M. Tesconi. 2018. Overview of the EVALITA 2018 hate speech detection task. In EVALITA@CLiC-it.

Alessandra Teresa Cignarella, Simona Frenda, Valerio Basile, Cristina Bosco, Viviana Patti, Paolo Rosso, et al. 2018. Overview of the EVALITA 2018 task on irony detection in Italian tweets (IronITA). In Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2018), volume 2263, pages 1-6.

Alessandra Teresa Cignarella, Mirko Lai, Cristina Bosco, Viviana Patti, and Paolo Rosso. 2020. SardiStance@EVALITA2020: Overview of the task on stance detection in Italian tweets. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). CEUR-WS.org.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL 2019, pages 4171-4186, Minneapolis, Minnesota, June.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. Technical report.

Peter Krejzl, Barbora Hourová, and Josef Steinberger. 2017. Stance detection in online discussions.

Xiaodong Liu, Pengcheng He, Weizhu Chen, and Jianfeng Gao. 2019a. Multi-task deep neural networks for natural language understanding. In Proceedings of ACL, pages 4487-4496, Florence, Italy, July.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019b. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.

Michael Mccloskey and Neil J. Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation, 24:104-169.

S.J. Pan and Q. Yang. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998-6008.