<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UNITOR @ Sardistance2020: Combining Transformer-based Architectures and Transfer Learning for Robust Stance Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simone Giorgioni</string-name>
          <email>simone.giorgioni@alumni.uniroma2.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcello Politi</string-name>
          <email>marcello.politi@alumni.uniroma2.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samir Salman</string-name>
          <email>samir.salman@alumni.uniroma2.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Croce</string-name>
          <email>croce@info.uniroma2.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Basili</string-name>
          <email>basili@info.uniroma2.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Enterprise Engineering, University of Roma Tor Vergata</institution>
          ,
          <addr-line>Via del Politecnico 1, 00133 Roma</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>English. This paper describes the UNITOR system that participated in the Stance Detection in Italian tweets (SardiStance) task within the context of EVALITA 2020. UNITOR implements a Transformer-based architecture whose accuracy is improved by adopting a Transfer Learning technique. In particular, this work investigates the possible contribution of three auxiliary tasks related to Stance Detection, i.e., Sentiment Detection, Hate Speech Detection and Irony Detection. Moreover, UNITOR relies on an additional dataset automatically downloaded and labeled through distant supervision. The UNITOR system ranked first in Task A within the competition. This confirms the effectiveness of Transformer-based architectures and the beneficial impact of the adopted strategies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Italiano. This work describes
UNITOR, one of the systems participating
in the Stance Detection in Italian Tweets
(SardiStance) task. UNITOR implements
a Transformer-based neural architecture,
whose accuracy is improved by applying a
Transfer Learning method that exploits the
information of three auxiliary tasks, namely
Sentiment Detection, Hate Speech Detection
and Irony Detection. Moreover, the training
of UNITOR relies on a set of data downloaded
and labeled automatically by applying a
simple Distant Supervision method. The system
ranked first in the competition, confirming
the effectiveness of Transformer-based
architectures and the contribution of the
adopted strategies.</p>
      <p>Copyright © 2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>
        Stance detection aims at detecting if the author of
a text is in favor of a target topic, or against it
        <xref ref-type="bibr" rid="ref9">(Krejzl et al., 2017)</xref>
        . In this task, a text pair is generally
considered: one text expresses the topic, while the
other one reflects the author’s judgments. In a
possible variant to such a setting, the topic is implicit
within an entire document collection over which
the stance detection is applied.
      </p>
      <p>
        In this work, we will consider this last setting,
as defined in the Stance Detection in
Italian Tweets (SardiStance) task
        <xref ref-type="bibr" rid="ref6">(Cignarella et al.,
2020)</xref>
        within the EVALITA 2020
        <xref ref-type="bibr" rid="ref3">(Basile et al.,
2020)</xref>
        . A set of texts (here tweets) is provided,
almost all concerning the same topic, i.e., the
Sardines Movement (https://en.wikipedia.org/wiki/Sardines_movement). The goal is to recognize if each
tweet is for or against (or neither) such target, only
exploiting textual information. According to the
task definition, this corresponds to the so-called
Task A. This is quite a challenging problem, since
it requires at the same time to discover if a text
refers to the target topic and the author’s
orientation, only relying on short messages written in a
very conversational style.
      </p>
      <p>
        We thus present the UNITOR system
participating in SardiStance Task A. The system is
based on a Transformer-based architecture for text
classification
        <xref ref-type="bibr" rid="ref7">(Devlin et al., 2019)</xref>
        that is directly
pre-trained over a large-scale document collection
written in Italian, namely UmBERTo. In a
nutshell, the adopted architecture, which has been
shown to achieve state-of-the-art results in
many NLP tasks
        <xref ref-type="bibr" rid="ref7">(Devlin et al., 2019)</xref>
        , takes in
input a message and associates it to one of the target
classes indicating the stance. Moreover, due to the
task complexity and the small size of the dataset,
in order to improve the generalization
capabilities of the neural network, we adopted a
Transfer Learning approach
        <xref ref-type="bibr" rid="ref13">(Pan and Yang, 2010)</xref>
        . Our
main assumption is that Stance Detection is tied
to other tasks involving emotion and subjectivity
analysis (such as Sentiment Analysis or Irony
Detection) even though important differences do exist
among them. As a simplified example, let us
consider a message such as “I like the Sardines
Movement”: it clearly expresses a positive sentiment,
also being in favour of the target topic. However,
a message such as “I like the EVALITA campaign.”
is positive as well but it does not express any
support or opposition to the Sardines (and it should be
associated to the None class). We thus speculate
that an automatic system trained over an auxiliary
task (e.g., Sentiment Classification) is beneficial,
but the transfer process must be carefully designed
in order to avoid catastrophic forgetting or
interference problems
        <xref ref-type="bibr" rid="ref12">(Mccloskey and Cohen, 1989)</xref>
        .
      </p>
      <p>
        In this work, we investigate the possible
contribution of three auxiliary tasks involving the
recognition of emotions according to different settings,
i.e., Sentiment Detection and Classification, Hate
Speech Detection and Irony Detection. We adopt
three different classifiers (one for each auxiliary
task) and use them to add additional information to
the tweets provided in the SardiStance dataset. As
an example, when considering the auxiliary task
involving Hate Detection, the corresponding
classifier will augment each input tweet by indicating
whether it expresses hate or not. After this step, the
final classifier is expected to learn the association
between messages and the stance categories,
“being aware” (with some unavoidable noise) if the
message expresses some sort of hate, irony and
more generally, sentiment. Finally, we investigate
the possibility of augmenting the training
material by automatically downloading messages and
labeling them through distant supervision
        <xref ref-type="bibr" rid="ref8">(Go et
al., 2009)</xref>
        . We first selected a few hashtags clearly
in favour (or not) of the target topic to download
and label a set of messages. Then, in order
to add a set of neutral messages, we selected a set
of news titles concerning the Sardines Movement.
      </p>
      <p>The UNITOR system ranked first in the
competition, suggesting that the combination of
the Transformer-based learning with the adopted
strategies of Transfer Learning and Data
Augmentation is beneficial. In the rest of the paper, Sec. 2
describes UNITOR. In Sec. 3, the evaluations are
reported while Sec. 4 derives the conclusions.</p>
    </sec>
    <sec id="sec-4">
      <title>2 Transformer-based architectures and Transfer Learning for Stance Detection</title>
      <p>The UNITOR system implements a
Transformer-based architecture described in Section 2.1. The
adopted auxiliary tasks are described in Section
2.2, while our Transfer Learning strategy is in
Section 2.3. Finally, an automatic strategy for Data
Augmentation is presented in Section 2.4.</p>
      <sec id="sec-4-2">
        <title>2.1 UNITOR as a Transformer-based Architecture</title>
        <p>
          The approach proposed in
          <xref ref-type="bibr" rid="ref7">(Devlin et al., 2019)</xref>
          ,
namely Bidirectional Encoder Representations
from Transformers (BERT) provides a very
effective model to pre-train a deep and complex
neural network over large scale collections of non
annotated texts and to apply it to a large variety of
NLP tasks. The building block of BERT is the
Transformer element
          <xref ref-type="bibr" rid="ref15">(Vaswani et al., 2017)</xref>
          , an
attention-based mechanism that learns contextual
relations between words in a text. BERT provides
a sentence embedding (as well as the
contextualized lexical embeddings of words in the sentence)
through a pre-training stage aiming at the
acquisition of an expressive and robust language and text
model. The Transformer reads the entire input
sequence of words at once and is optimized through
two pre-training tasks. The first pre-training
objective is masked language modeling
          <xref ref-type="bibr" rid="ref7">(Devlin
et al., 2019)</xref>
          . In addition, a Next Sentence
Prediction task is used to jointly pre-train text
embeddings able to soundly represent discourse level
information. This last objective operates on text-pair
representations and aims at modeling relational
information, e.g. between the consecutive sentences
in a text. On top of the produced embeddings,
BERT applies a fine-tuning stage devoted to adapt
the entire architecture to the targeted task.
        </p>
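        <p>As an illustration of this first objective, the input corruption used by masked language modeling can be sketched in a few lines (a toy sketch: the function name and the miniature vocabulary are ours; the 15% selection rate and the 80/10/10 replacement scheme follow the original BERT recipe and may differ in other pre-trained models):</p>
        <preformat>
```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style corruption: each selected position is replaced by
    [MASK] 80% of the time, by a random vocabulary token 10% of the
    time, and kept unchanged 10% of the time; the model must then
    recover the selected tokens."""
    rng = random.Random(seed)
    toy_vocab = ["sardine", "movimento", "piazza", "twitter", "italia"]
    masked, targets = [], []
    for tok in tokens:
        if rng.random() >= mask_prob:
            masked.append(tok)      # position not selected
            targets.append(None)    # nothing to predict here
            continue
        targets.append(tok)         # the model must recover this token
        roll = rng.random()
        if roll >= 0.2:
            masked.append("[MASK]")         # 80%: mask
        elif roll >= 0.1:
            masked.append(rng.choice(toy_vocab))  # 10%: random token
        else:
            masked.append(tok)              # 10%: keep unchanged
    return masked, targets
```
        </preformat>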
        <p>
          The fine-tuning process of BERT for sentence
classification (here adopted) operates on a single
text or on text pairs, which can be given in input to
BERT, in analogy with a next sentence prediction
task. The special token [CLS] is used as the first
element of each input sequence, and the embedding
produced by BERT is used as input to a linear
classifier customized for the target classification
task. While the BERT architecture is pre-trained
on large-scale corpora, its application to new tasks
is generally obtained by customizing the final
classifier to the targeted problem and fine-tuning all
the network parameters for few epochs, to avoid
catastrophic forgetting. In
          <xref ref-type="bibr" rid="ref10 ref11">(Liu et al., 2019b)</xref>
          RoBERTa is proposed as a variant of BERT which
modifies some key hyperparameters, including
removing the next-sentence pre-training objective,
and training on more data, with much larger
minibatches and learning rates. This allows RoBERTa
to improve on the masked language modeling
objective compared with BERT and leads to better
downstream task performances.
        </p>
        <p>UNITOR is based on a RoBERTa architecture
pre-trained over Italian texts: we adopted
UmBERTo (https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1),
which is pre-trained over a subset of the
OSCAR corpus, made of 11 billion tokens. These
architectures achieved state-of-the-art results in a
wide range of NLP tasks. However, they also
rely on large scale annotated datasets composed
of (possibly hundreds) thousands of examples. In
order to improve the quality of this architecture in
the SardiStance Task with a quite limited dataset,
we adopted a simple Transfer Learning strategy by
relying on the following three auxiliary tasks.</p>
      </sec>
      <sec id="sec-4-3">
        <title>2.2 Supporting UNITOR through Auxiliary tasks</title>
        <p>
          In this work, we speculate that the complexity of
the Stance detection task can be simplified
whenever the system to be trained is already aware of
whether input messages express some sort of Sentiment,
Irony or Hate. In order to expose UNITOR to such
information, we trained specific classifiers over
dedicated corpora made available in previous
editions of EVALITA, as follows.
Sentiment Detection and Classification. This
task consists in the automatic detection of
subjectivity (and the eventual positive or negative
polarity) in texts
          <xref ref-type="bibr" rid="ref14">(Pang and Lee, 2008)</xref>
          . Even though
Stance Detection is clearly different from a
traditional Sentiment Analysis task, we
speculate that they are nevertheless related. As an
example, we can suppose that the presence of
stance is more probable in messages expressing
subjectivity. We thus considered the setting
proposed in SENTIPOLC 2016
          <xref ref-type="bibr" rid="ref1">(Barbieri et al., 2016)</xref>
          where a dataset of 8,000 tweets is made
available. For each message, the presence of
subjectivity is made explicit and, possibly, the
positive or negative polarity. The labeling provided
in the dataset was slightly modified and mapped
to a classification problem over three classes: all
objective tweets were labeled with the special tag
&lt;neutrale&gt;, the subjective and positive
messages with &lt;positivo&gt;, and the negative ones
with &lt;negativo&gt; (we discarded the few available
messages with mixed polarity, to simplify the final
classification task).
        </p>
        <p>
          Irony Detection. We speculate that a robust
detection of stance requires the recognition of irony,
which can even reverse the output of the
classification task. For example, a false stance can be
expressed through an ironic message, such as “Le
Sardine sono il futuro passato dell’Italia” (in English:
“The Sardines are the future past of Italy”). The
objective of Irony Detection is to detect whether
a given message is ironic or not. We used the
dataset provided by IronITA 2018
          <xref ref-type="bibr" rid="ref5">(Cignarella et al.,
2018)</xref>
          , where a dataset of 4,800 labeled messages
is made available. We adopted the original binary
classification task, mapping messages to the
&lt;ironico&gt; and &lt;non ironico&gt; labels.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Hate Speech Detection</title>
        <p>
          Being against a topic can often be expressed
through messages that also express hate. We thus
also introduce the Hate Speech Detection task,
which involves the automatic recognition of
hateful content. We
considered the setting proposed in HaSpeeDe 2018
          <xref ref-type="bibr" rid="ref4">(Bosco et al., 2018)</xref>
          , where a dataset of 3,000
messages is made available. We adopted the original
binary classification task: messages expressing hate
were mapped to the &lt;odio&gt; label and the others
to &lt;non odio&gt;.
        </p>
      </sec>
      <sec id="sec-4-6">
        <title>2.3 Transferring auxiliary tasks in the Transformer-based learning</title>
        <p>
          In order to transfer the information from each
auxiliary task into UNITOR, we first trained a
specific UmBERTo-based sentence classifier on each
of the datasets described in the previous section.
In each case, the standard parameters proposed
in
          <xref ref-type="bibr" rid="ref7">(Devlin et al., 2019)</xref>
          are used to fine-tune the
model (the number of epochs was tuned over a
development set made of 10% of the corresponding
dataset, and the best epoch was selected by
maximizing the classification accuracy). After these
three training steps, the entire SardiStance dataset
is processed by each of the three classifiers and the
resulting labels are used to “augment” the input
messages. In particular, these labels generate a sort
of new sentence, which is paired with the
corresponding message. The following example shows how
a tweet against the movement (in English:
“#regionalelections The Sardines will help to save the
country! #please They’re just a bunch of losers!”) is
given in input to UNITOR:
“[CLS] negativo ironico odio [SEP]
#elezioniregionali Le Sardine aiuteranno a
salvare il Paese! #mafammilpiacere Sono proprio
dei bei perdigiorno falliti! [SEP]”
Consistently with
          <xref ref-type="bibr" rid="ref7">(Devlin et al., 2019)</xref>
          , the first
        </p>
        <p>
pseudo-token [CLS] is added to generate the
embedding used in input in the final linear
classifier. Then, the pseudo-sentence “negativo
ironico odio” suggests that the message expresses
negative polarity and hate through the adoption of
irony. Finally, between the [SEP] pseudo-tokens,
the original message is reported. This particular
schema resembles the classification of text pairs
used in relational learning tasks, such as in
Textual Entailment
          <xref ref-type="bibr" rid="ref7">(Devlin et al., 2019)</xref>
          . The output
of the auxiliary classifiers defines a sort of
hypothesis, i.e., the author aims at expressing a negative
sentiment through an ironic message which also
expresses hate, while the original message is the
direct consequence, i.e., the “implied” message (we
investigated different ways to encode this information,
even using complex sentences, but negligible differences
were measured during tuning, so we applied the
simplest schema).
The UNITOR model is thus an UmBERTo-based
classifier trained over text pairs, where the first
element encodes the information derived from the
auxiliary tasks and the second one is the original
message. Even though this labeling process can
introduce noise (due to incorrectly classified
messages), the augmented input is expected to simplify
the final training process by explicitly providing
information about sentiment, hate and irony.
        </p>
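        <p>The input schema above can be assembled with a few lines of string handling (a sketch; the function name is ours, and in practice a BERT-style tokenizer inserts the [CLS] and [SEP] special tokens itself when encoding a text pair):</p>
        <preformat>
```python
def build_unitor_input(message, sentiment, irony, hate):
    """Pair the pseudo-sentence of auxiliary labels (e.g. 'negativo',
    'ironico', 'odio') with the original tweet, mimicking the
    text-pair input schema described above."""
    pseudo_sentence = " ".join([sentiment, irony, hate])
    return "[CLS] " + pseudo_sentence + " [SEP] " + message + " [SEP]"
```
        </preformat>
        <p>When an actual tokenizer is used, the same effect is obtained by passing the pseudo-sentence and the tweet as the two elements of a text pair.</p>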
      </sec>
      <sec id="sec-4-7">
        <title>2.4 Distant Supervision for Stance Detection</title>
        <p>
          In order to compensate for the limited amount of
available data (especially considering the complexity
of the task), we augmented the training material by
labeling additional messages via Distant
Supervision
          <xref ref-type="bibr" rid="ref8">(Go et al., 2009)</xref>
          . We speculate that a tweet
containing a hashtag such as #vivalesardine (in
English: #ILikeSardines) is in favour of the Sardines,
while a tweet containing, for example,
#sardinefritte (in English: #friedSardines) is against
our target. Hence, we downloaded from the
TWITA corpus
          <xref ref-type="bibr" rid="ref2">(Basile and Nissim, 2013)</xref>
          3,200
tweets and labeled them via Distant Supervision.
In particular, the following subsets are derived:
1,500 tweets against the movement, as they contain
#gatticonsalvini, and 1,000 tweets in favour,
as they contain #nessunotocchilesardine,
#iostoconlesardine, #unmaredisardine, #vivalesardine
or #forzasardine. Finally, to enlarge the subset of
messages without stance, 700 neutral statements
were downloaded, which are actually titles from
news, derived by querying “sardine” in Google
News. In the experimental evaluations discussed
in the next section, this dataset of “silver” data is
simply added to the training material. To avoid
over-fitting, we removed 90% of the occurrences
of the hashtags used as queries from the new data.
        </p>
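        <p>The labeling and hashtag-removal steps above can be sketched as follows (the hashtag sets come from the paper; the function names, lowercasing and per-occurrence sampling are our own simplifications):</p>
        <preformat>
```python
import random

AGAINST_TAGS = {"#gatticonsalvini"}
FAVOUR_TAGS = {"#nessunotocchilesardine", "#iostoconlesardine",
               "#unmaredisardine", "#vivalesardine", "#forzasardine"}

def distant_label(tweet):
    """Assign a silver stance label from the seed hashtags a tweet
    contains; tweets matching no seed hashtag stay unlabeled (None)."""
    tokens = tweet.lower().split()
    if any(t in AGAINST_TAGS for t in tokens):
        return "AGAINST"
    if any(t in FAVOUR_TAGS for t in tokens):
        return "FAVOUR"
    return None

def drop_seed_hashtags(tweet, keep_prob=0.1, seed=0):
    """Delete roughly 90% of the occurrences of the seed hashtags,
    so the classifier cannot simply memorize the queries."""
    rng = random.Random(seed)
    kept = []
    for token in tweet.split():
        if token.lower() in AGAINST_TAGS | FAVOUR_TAGS:
            if rng.random() >= keep_prob:
                continue  # drop this hashtag occurrence
        kept.append(token)
    return " ".join(kept)
```
        </preformat>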
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3 Results and Discussion</title>
      <p>
        UNITOR participated in Task A - Textual Stance
Detection
        <xref ref-type="bibr" rid="ref6">(Cignarella et al., 2020)</xref>
        where the
available dataset is composed of 2,132 tweets
concerning the Sardines Movement: 1,028 tweets
are against the movement (label Against), 589
tweets in favour of it (label Favour) and 515
tweets do not express any stance about the target
topic (label None).
      </p>
      <p>As discussed in Section 2, UNITOR is based
on the UmBERTo pre-trained model, which
relies on the RoBERTa architecture. For
parameter tuning, we adopted a 10-fold cross validation,
so that the training material is divided in 10 folds,
each split according to a 90%-10% proportion. The
model is trained using a standard Cross-entropy
Loss and an ADAM optimizer initialized with a
learning rate set to 2·10^-5 and linearly decreased
during the training process. We trained the model
for 5 epochs, using a batch size of 32 elements.
At test time, an Ensemble of such classifiers is
used: each message is in fact classified using all
10 models trained in the different folds and the
label suggested by the highest number of classifiers
is selected. In Task A, we submitted two
constrained runs, i.e., systems considering only tweets
from the competition, and two unconstrained ones,
where additional tweets were acquired and labeled
by applying the approach presented in Section 2.4.
All models are implemented using PyTorch
(https://pytorch.org/) and experiments were run over
Google Colab (https://colab.research.google.com/).</p>
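      <p>The voting scheme can be sketched as follows (the function name is ours; how ties are broken is not specified by the paper, and here the first label encountered wins):</p>
      <preformat>
```python
from collections import Counter

def ensemble_vote(fold_predictions):
    """Return the label predicted by the largest number of the
    per-fold models (here, the 10 models of the cross validation)."""
    return Counter(fold_predictions).most_common(1)[0][0]
```
      </preformat>
      <p>At test time, each tweet's label is obtained by applying ensemble_vote to the outputs of the 10 per-fold models.</p>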
      <p>Results are reported in Table 1 in terms of
Precision, Recall and F1 scores obtained by the
different models with respect to each label. The final
rank considers the average F1 (F1-avg) between
the Favour and Against classes.</p>
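      <p>The ranking metric can be sketched as follows, computed from parallel lists of gold and predicted labels (the label strings and the function name are ours):</p>
      <preformat>
```python
def f1_avg(gold, pred):
    """Average of the per-class F1 scores of the Favour and Against
    classes (the None class does not enter the ranking metric)."""
    scores = []
    for cls in ("FAVOUR", "AGAINST"):
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / 2
```
      </preformat>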
      <p>
        First of all, the high complexity of this task is
confirmed by the results obtained by the strong
Baseline method (the last row). It is a Support
Vector Machine trained over a simple
Bag-ofWord model
        <xref ref-type="bibr" rid="ref6">(Cignarella et al., 2020)</xref>
        and achieves
an average F1 of 57.84%, being competitive with
many systems participating in the task and
ranking 13th over 22 submissions.
      </p>
      <sec id="sec-5-1">
        <title>-</title>
        <p>One important result is obtained by the straight
application of the
UmBERTo model over the original messages (next
to last row in Table 1). In fact, this
Transformer-based architecture, empowered with the
Ensemble technique, achieves an average F1 of 65.69%:
a system which directly applies an Ensemble of
UmBERTo-based models would have ranked 6th
in the competition.</p>
        <p>We thus trained UmBERTo by adopting the
Transfer Learning approach presented in Section
2.3 in the constrained setting. The adoption of
all the three auxiliary tasks led to the constrained
submission called UNITOR_c_2. Moreover, we
trained UmBERTo by considering
one auxiliary task at a time. When
considering only the Hate Speech Detection task, better
results were obtained over the development set,
with respect to the adoption of the other tasks
taken individually, i.e., Sentiment Detection and
Irony Detection (the results of this tuning stage are
not reported here for lack of space). Such a variant, called
UNITOR_c_1, considers tweets enriched only with
information derived by the hate classifier and it
generally shows higher precision with respect to
the Against class. This suggests that a tweet
expressing hate is more likely in opposition to
the Sardines Movement. Both constrained
models ranked 3rd and 2nd in the competition,
respectively. These results are impressive, as both
outperformed the standard UmBERTo by about 2% of
absolute F1. Moreover, they confirm the
beneficial impact of Hate Speech Detection as an
auxiliary task. Finally, we augmented the training
dataset by using the additional data presented in
Section 2.4. We extended the training material
used to train UNITOR_c_2 in order to obtain the
unconstrained submission called UNITOR_u_2. It
is worth noticing that all three auxiliary tasks were
used in this submission. This led to a performance
drop, i.e., a 66.06% average F1, which is lower
than that of the best opponent system, which
achieved a 66.21% F1. It seems that the noise
added both from the auxiliary tasks and the
additional data, negatively impacted the overall
quality. On the contrary, when only the Hate Speech
Detection task is considered (i.e., UNITOR_u_1)
additional data are positively capitalized by the
model, achieving the best average F1 score in the
competition, i.e., 68.53%. These results suggest
that the combination of the Transformer-based
learning with the adopted strategies of Transfer
Learning and Data Augmentation is highly
beneficial, when only Hate is considered.</p>
        <p>From an error analysis, it seems that a
significant number of incorrect classifications occurred
in longer and more complex messages, where the topic
of the stance is neither clearly explicit nor captured
by the UmBERTo model, such as in “#carfagna:
“io per i liberali che non si affidano a Salvini” e
“dalle sardine buone idee”. Auto-scacco in due
mosse. Con la Polverini poi...” (in English: “#carfagna:
"come with me liberals who do not rely on Salvini" and
"from the Sardines movement good ideas." She messed
herself up in two moves. Not to mention Polverini...”).
This message is labeled Against while the system
assigns the label None. Here, it is very challenging
to understand the connection between the “good
ideas of the sardines” and the very colloquial
expression “Auto-scacco”, which can be translated as
“She messed herself up”. The same appears in the
tweet “Ho finalmente capito chi mi ricordava
Mattia Santori, quello delle sardine: Lodo Guenzi. (e
infatti in quanto a democristianitá stiamo lá)” (in
English: “I finally understood who reminded me of
Mattia Santori, the one of the Sardines movement: Lodo
Guenzi. (in fact, as far as Christian Democrats are
concerned, they are pretty much the same)”), which is
again labeled Against but classified as
None. Clearly, the system is not able to link
the movement to its leader nor to the negative
opinion about belonging to the Christian
Democrat Party. Another example is the tweet “Dopo
avere ascoltato @luigidimaio mi viene in mente
una sola parola: grazie. Fiducia nelle sue scelte
e immenso rispetto per i grandi risultati ottenuti.
Ora un nuovo inizio, con un nuovo entusiasmo.
Andiamo verso gli #statigenerali con serietà e
maturità. Forza @mov5stelle!” (in English: “After
listening to @luigidimaio only one word came to my
mind: thank you. I have trust in his choices and a
huge respect for the great results obtained. Now it’s
a new start, with new enthusiasm. Let’s move towards
the #statigenerali with seriousness and maturity.
Forza @mov5stelle!”). Here the system incorrectly
assigns the Favour label because the tweet is in
favour of a different movement.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4 Conclusion</title>
      <p>
        In this work we present the results obtained by
the UNITOR system, which participated in the
SardiStance task. UNITOR ranked first in Task
A, both for the constrained and unconstrained runs.
These results confirm the beneficial impact of
Transformer-based architectures for text
classification also in the Stance Detection task.
Moreover, we demonstrate the beneficial impact of Hate
Speech Detection as an auxiliary task in a Transfer
Learning setting. Finally, we empirically
demonstrate that the adoption of Distant Supervision
is useful to reduce data sparseness. Future work
will apply the above approaches to task B within
SardiStance. Moreover, we will investigate
multitask learning approaches
        <xref ref-type="bibr" rid="ref10 ref11">(Liu et al., 2019a)</xref>
        to
capitalize information from auxiliary tasks in a more
principled way.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the evalita 2016 sentiment polarity classification task</article-title>
          .
          <source>In Proceedings of EVALITA</source>
          <year>2016</year>
          , Napoli, Italy, December 5-
          <issue>7</issue>
          ,
          <year>2016</year>
          , volume
          <volume>1749</volume>
          <source>of CEUR Workshop Proceedings.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Sentiment analysis on italian tweets</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          , pages
          <fpage>100</fpage>
          -
          <lpage>107</lpage>
          , Atlanta.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <surname>Lucia</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian</article-title>
          .
          <source>In Valerio Basile</source>
          , Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Felice Dell'Orletta,
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Poletto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tesconi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the evalita 2018 hate speech detection task</article-title>
          .
          <source>In EVALITA@CLiC-it</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra Teresa</given-names>
            <surname>Cignarella</surname>
          </string-name>
          , Simona Frenda, Valerio Basile, Cristina Bosco, Viviana Patti,
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 task on irony detection in Italian tweets (IronITA)</article-title>
          .
          <source>In Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2018)</source>
          , volume
          <volume>2263</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra Teresa</given-names>
            <surname>Cignarella</surname>
          </string-name>
          , Mirko Lai, Cristina Bosco, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>SardiStance@EVALITA2020: Overview of the Task on Stance Detection in Italian Tweets</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020)</source>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of NAACL 2019</source>
          , pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota, June.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Alec</given-names>
            <surname>Go</surname>
          </string-name>
          , Richa Bhayani, and
          <string-name>
            <given-names>Lei</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Twitter sentiment classification using distant supervision</article-title>
          .
          <source>Technical report.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Peter</given-names>
            <surname>Krejzl</surname>
          </string-name>
          , Barbora Hourová, and
          <string-name>
            <given-names>Josef</given-names>
            <surname>Steinberger</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Stance detection in online discussions</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Xiaodong</given-names>
            <surname>Liu</surname>
          </string-name>
          , Pengcheng He,
          <string-name>
            <given-names>Weizhu</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jianfeng</given-names>
            <surname>Gao</surname>
          </string-name>
          . <year>2019</year>a.
          <article-title>Multi-task deep neural networks for natural language understanding</article-title>
          .
          <source>In Proceedings of ACL</source>
          , pages
          <fpage>4487</fpage>
          -
          <lpage>4496</lpage>
          , Florence, Italy, July.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Yinhan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen,
          <string-name>
            <given-names>Omer</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mike</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          . <year>2019</year>b.
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          . <source>CoRR</source>, abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Mccloskey</surname>
          </string-name>
          and
          <string-name>
            <given-names>Neil J.</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <year>1989</year>
          .
          <article-title>Catastrophic interference in connectionist networks: The sequential learning problem</article-title>
          .
          <source>The Psychology of Learning and Motivation</source>
          ,
          <volume>24</volume>
          :
          <fpage>104</fpage>
          -
          <lpage>169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>S.J.</given-names>
            <surname>Pan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A Survey on Transfer Learning</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>22</volume>
          (
          <issue>10</issue>
          ):
          <fpage>1345</fpage>
          -
          <lpage>1359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Bo</given-names>
            <surname>Pang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Found. Trends Inf. Retr.</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          - 2):
          <fpage>1</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez,
          Łukasz Kaiser, and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention is all you need</article-title>
          . In I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , and R. Garnett, editors,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pages
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>