<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How “BERTology” Changed the State-of-the-Art also for Italian NLP</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabio Tamburini</string-name>
          <email>fabio.tamburini@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FICLIT - University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The use of contextualised word embeddings has brought a notable performance increase to almost all Natural Language Processing (NLP) applications. Recently, some new models developed specifically for Italian became available to scholars. This work evaluates the impact of these models on application performance for Italian, establishing the new state-of-the-art for some fundamental NLP tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The introduction of contextualised word
embeddings, starting with ELMo
        <xref ref-type="bibr" rid="ref22">(Peters et al., 2018)</xref>
        and
in particular with BERT
        <xref ref-type="bibr" rid="ref11">(Devlin et al., 2019)</xref>
        and
the subsequent BERT-inspired transformer models
        <xref ref-type="bibr" rid="ref17 ref18 ref26">(Liu et al., 2019; Martin et al., 2020; Sanh et al.,
2019)</xref>
        , marked a strong revolution in Natural
Language Processing, boosting the performance of
almost all applications and especially those based
on statistical analysis and Deep Neural Networks
(DNN).
      </p>
      <p>
        A recent study
        <xref ref-type="bibr" rid="ref15">(He and Choi, 2019)</xref>
        set out to determine new baselines for several NLP tasks for English, fixing the new state-of-the-art for the examined tasks. This work aims to do the same for Italian. We considered a number of relevant tasks, applied state-of-the-art neural models available to the community, and fed them with all the contextualised word embeddings specifically developed for Italian.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Italian “BERTology”</title>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>The availability of various powerful computational solutions for the community allowed for the development of some BERT-derived models trained specifically on big Italian corpora of various textual types. All these models have been taken into account for our evaluation. In particular, we considered those models that, at the time of writing, are the only ones available for Italian:</p>
      <p>Multilingual BERT<sup>1</sup>: with the first BERT release, Google also developed a multilingual model (‘bert-base-multilingual-cased’ – bertMC) that can also be applied to Italian texts.</p>
      <p>
        AlBERTo<sup>2</sup>: last year, a research group from the University of Bari developed a brand new model for Italian especially devoted to Twitter texts and social media (‘m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0’ – alUC), trained on 200 million tweets from 2012 to 2015
        <xref ref-type="bibr" rid="ref23">(Polignano et al., 2019)</xref>. Only the uncased model is available to the community. Because of its specific training, alUC requires a particular pre-processing step that replaces hashtags, URLs, etc. and alters the official tokenisation, which makes it not really applicable to word-based classification tasks on general texts; thus, it should be used only on Twitter or social media data. In any case, we tested it on all the considered tasks and reported its results whenever they were reasonable.
      </p>
      <p>
        GilBERTo<sup>3</sup>: a rather new CamemBERT-based Italian model (‘idb-ita/gilberto-uncased-from-camembert’ – giUC) trained on the huge Italian Web corpus section of the OSCAR
        <xref ref-type="bibr" rid="ref21">(Ortiz Suárez et al., 2019)</xref>
        Web-corpus project, consisting of more than 11 billion tokens. For GilBERTo too, only the uncased model is available.
      </p>
      <p><sup>1</sup>https://github.com/google-research/bert</p>
      <p><sup>2</sup>https://github.com/marcopoli/AlBERTo-it</p>
      <p><sup>3</sup>https://github.com/idb-ita/GilBERTo</p>
      <p>UmBERTo<sup>4</sup>: the most recent model developed explicitly for Italian, as far as we know, is UmBERTo (‘Musixmatch/umberto-commoncrawl-cased-v1’ – umC). Like GilBERTo, it has been trained on OSCAR, but, unlike GilBERTo, the resulting model is cased.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Evaluation Tasks</title>
      <p>Following the work of He and Choi (2019), we selected some basic tasks for both word and sentence/text classification. We mainly concentrated our efforts on tasks for which evaluation procedures were well established in the Italian community and reliable evaluation benchmarks were available. We chose (a) two very basic word-classification tasks, namely part-of-speech (PoS) tagging and Named Entity Recognition (NER), (b) the dependency parsing task and (c) two very important tasks for social-media text classification, namely Sentiment Analysis (Subjectivity/Polarity/Irony classification) and Hate Speech Detection (HSD).</p>
      <p>We mainly relied on benchmarks proposed in one of the past EVALITA evaluation challenges<sup>5</sup> or in the Universal Dependencies (UD) project<sup>6</sup>.</p>
      <p>
        Since the influential paper by
        <xref ref-type="bibr" rid="ref25">Reimers and Gurevych (2017)</xref>, it has been clear to the community that a single score reported for one DNN training session can be heavily affected by the system initialisation point. We should instead report the mean and standard deviation of several runs with the same setting, in order to get a more accurate picture of the real system performance and to make more reliable comparisons between systems. Thus, any new result proposed in this paper is presented as the mean and standard deviation of at least 5 runs.
      </p>
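      <p>The reporting protocol just described can be sketched as follows. This is only a minimal illustration: the accuracy values are hypothetical, the function name summarise_runs is introduced here for the example, and the use of the sample standard deviation is an assumption, since the estimator is not specified in the text.</p>
      <preformat>
```python
import statistics


def summarise_runs(scores):
    """Return (mean, sample standard deviation) of the scores of repeated runs."""
    # statistics.stdev raises StatisticsError when fewer than two scores are given.
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical accuracies of five runs that differ only in the random seed.
runs = [98.71, 98.80, 98.74, 98.79, 98.76]
mean, std = summarise_runs(runs)
print(f"{mean:.2f} +/- {std:.2f}")
```
      </preformat>
      <p>Reporting both numbers makes it possible to judge whether the gap between two systems is larger than the run-to-run variability of each of them.</p>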
      <p>With regard to dataset splitting, if a dataset was already split into training/validation/test sets, we adopted this subdivision. If instead the dataset was split only into development and test sets, we split the development set further and used the resulting training/validation sets for training and for tuning the stopping epoch; once that parameter was fixed, we retrained the system on the entire development set maintaining the same epoch for the early stopping.</p>
      <p><sup>4</sup>https://github.com/musixmatchresearch/umberto</p>
      <p><sup>5</sup>http://www.evalita.it</p>
      <p><sup>6</sup>https://universaldependencies.org</p>
      <sec id="sec-3-1">
        <title>3.1 Part-of-Speech Tagging</title>
        <p>
          The first task we worked on is part-of-speech tagging. This is a very basic task in NLP, and many applications rely on precise PoS-tag assignments. Various data sets are available for this task, taken from one of the EVALITA 2007 tasks
          <xref ref-type="bibr" rid="ref27">(Tamburini, 2007)</xref>
          and from the UD
annotated corpora.
        </p>
      </sec>
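      <sec id="sec-3-2">
        <title>System</title>
        <table-wrap>
          <table>
            <thead>
              <tr><th>System</th><th>EVALITA 2007</th></tr>
            </thead>
            <tbody>
              <tr><td><xref ref-type="bibr" rid="ref28">(Tamburini, 2016)</xref></td><td>98.18</td></tr>
              <tr><td>Fine-Tuning<sub>giUC</sub></td><td>98.75 ± 0.04</td></tr>
              <tr><td>Fine-Tuning<sub>bertMC</sub></td><td>98.80 ± 0.05</td></tr>
              <tr><td>Fine-Tuning<sub>umC</sub></td><td>99.10 ± 0.04</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-3">
        <title>EVALITA 2007</title>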
        <p>
          The best results for the EVALITA 2007 data set were obtained by
          <xref ref-type="bibr" rid="ref28">Tamburini (2016)</xref>
          using a BiLSTM-CRF system based on word2vec word embeddings enriched with morphological information. For UD corpora we considered the ISDT corpus v2.5 and PoSTWITA: there are no evaluation data in the literature for the ISDT corpus, while for PoSTWITA the best results were obtained by
          <xref ref-type="bibr" rid="ref4">Basile et al. (2017)</xref>
          using a BiLSTM-CRF system and by the best system at EVALITA 2016
          <xref ref-type="bibr" rid="ref9">(Cimino and Dell’Orletta, 2016a)</xref>.
        </p>
        <p>The PoS-tagging system used for our experiments is very simple and consists of a slight modification to the fine-tuning script ‘run_ner.py’ available with version 2.7.0 of the Huggingface/Transformers package<sup>7</sup>. We did not perform any hyperparameter tuning: the validation set was used only for determining the stopping criterion.</p>
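        <p>Fine-tuning a subword-based model for a word-level task such as PoS-tagging requires mapping word-level tags onto subword tokens. The sketch below illustrates one common convention, in which only the first subword of each word carries the tag and the remaining pieces receive an ignore index. It is not taken from our scripts: the tokenizer toy_subword_tokenize is a hypothetical stand-in for a real subword tokenizer, and the tag set and sentence are invented for the example.</p>
        <preformat>
```python
IGNORE_INDEX = -100  # default ignore_index of the PyTorch cross-entropy loss


def toy_subword_tokenize(word, max_piece_len=4):
    """Hypothetical stand-in for a real subword tokenizer: fixed-length pieces."""
    pieces = [word[i:i + max_piece_len] for i in range(0, len(word), max_piece_len)]
    # Mark continuation pieces WordPiece-style with a leading '##'.
    return [pieces[0]] + ["##" + piece for piece in pieces[1:]]


def align_labels(words, tags, tag_to_id, tokenize=toy_subword_tokenize):
    """Expand word-level tags to subword level, labelling only the first piece."""
    tokens, label_ids = [], []
    for word, tag in zip(words, tags):
        pieces = tokenize(word)
        tokens.extend(pieces)
        # Continuation pieces get IGNORE_INDEX so they do not contribute to the loss.
        label_ids.extend([tag_to_id[tag]] + [IGNORE_INDEX] * (len(pieces) - 1))
    return tokens, label_ids


words = ["Il", "gatto", "dorme"]
tags = ["DET", "NOUN", "VERB"]
tag_to_id = {"DET": 0, "NOUN": 1, "VERB": 2}
tokens, label_ids = align_labels(words, tags, tag_to_id)
print(list(zip(tokens, label_ids)))
```
        </preformat>
        <p>At prediction time, under the same convention, the tag of a word is read from the output of its first subword, so that evaluation remains word-based.</p>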
        <p>Tables 1, 2 and 3 show the results obtained by fine-tuning the considered BERT-derived models for this task. The results show a notable increase in performance with respect to the literature, and UmBERTo is consistently the best system.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.1.1 PoS-tagging on Speech Data</title>
        <p>
          We participated in the EVALITA 2020 KIPOS challenge
          <xref ref-type="bibr" rid="ref5">(Bosco et al., 2020)</xref>
          for evaluating PoS-taggers on speech data, using exactly the same tagger. In this case, we did not perform any parameter tuning: we used the basic parameters and stopped the training phase after 10 epochs. After the challenge, we evaluated all the BERT-derived models in order to provide a complete overview of the available resources.
        </p>
        <p>Table 4 shows the results obtained by fine-tuning all the considered BERT-derived models for the Main Task. The results show a notable increase in performance with respect to the other participants, and UmBERTo is again the best system.</p>
        <p>We did not participate in the official challenge for the two subtasks, but we also included the results of our best system for these tasks. Table 5 shows the results compared with the other two participating systems. Again, the simple fine-tuning of a BERT-derived model, namely UmBERTo, exhibits the best performance on Sub-task B. The scarcity of data may have affected the results on Sub-task A.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.2 Named Entity Recognition</title>
        <p>
          The second task we considered is Named Entity Recognition. For system evaluation we relied on the evaluation benchmark used in the EVALITA 2009 campaign
          <xref ref-type="bibr" rid="ref3">(Bartalesi Lenzi et al., 2009)</xref>.
          The best results gathered from the literature are due to
          <xref ref-type="bibr" rid="ref4">Basile et al. (2017)</xref>,
          who used a BiLSTM-CRF system, and to the best system at the EVALITA 2009 campaign
          <xref ref-type="bibr" rid="ref30">(Zanoli et al., 2009)</xref>.
        </p>
        <p><sup>7</sup>https://huggingface.co/transformers/</p>
      </sec>
      <sec id="sec-3-6">
        <title>System</title>
        <p>
          <xref ref-type="bibr" rid="ref16">(Izzi and Ferilli, 2020)</xref>;
          <xref ref-type="bibr" rid="ref24">(Proisl and Lapesa, 2020)</xref>;
          Fine-Tuning<sub>bertMC</sub>; Fine-Tuning<sub>alUC</sub>; Fine-Tuning<sub>giUC</sub>; Fine-Tuning<sub>umC</sub>
        </p>
      </sec>
      <sec id="sec-3-7">
        <title>System</title>
        <p>
          <xref ref-type="bibr" rid="ref16">(Izzi and Ferilli, 2020)</xref>;
          Fine-Tuning<sub>umC</sub>;
          <xref ref-type="bibr" rid="ref24">(Proisl and Lapesa, 2020)</xref>;
          <xref ref-type="bibr" rid="ref16">(Izzi and Ferilli, 2020)</xref>;
          <xref ref-type="bibr" rid="ref24">(Proisl and Lapesa, 2020)</xref>;
          Fine-Tuning<sub>umC</sub>
        </p>
        <p>For this task we used exactly the same script as in the previous task, since both are simple word-classification tasks, and did not apply any hyperparameter tuning at all, fixing a priori the number of epochs to 10.</p>
        <p>Table 6 outlines the obtained results. Again, a simple fine-tuning of BERT-derived models is powerful enough to guarantee notable increases in performance with respect to the previous literature and, again, UmBERTo proved to be the model producing the best performance.</p>
        <p>
          Systems compared:
          <xref ref-type="bibr" rid="ref30">(Zanoli et al., 2009)</xref>;
          <xref ref-type="bibr" rid="ref4">(Basile et al., 2017)</xref>;
          Fine-Tuning<sub>giUC</sub>; Fine-Tuning<sub>bertMC</sub>; Fine-Tuning<sub>umC</sub>
        </p>
      </sec>
      <sec id="sec-3-8">
        <title>3.3 Dependency Parsing</title>
        <p>Parsing is one of the most important tasks in NLP, and the recent advances due to DNNs and contextualised distributed representations allowed for large performance improvements.</p>
        <p>The Universal Dependencies project is the reference repository for standardised treebanks in various languages, so it seemed natural to gather evaluation benchmarks from that project. As for PoS-tagging, we used two treebanks from UD v2.5, namely ISDT and PoSTWITA.</p>
        <p>
          The recent work by Antonelli and Tamburini (2019) examined all the DNN parsers available at the time, re-training them on Italian data. In particular, they showed that the neural parser from Dozat and Manning (2017) (version 1.0) was the parser exhibiting the best performance on UD-ISDT v2.1. Given that experience, we included in our new experiments the latest version (v3.0) of this parser<sup>8</sup>, considering it a strong baseline for this task. The word embeddings we used for these experiments were the same as those used in
          <xref ref-type="bibr" rid="ref1">(Antonelli and Tamburini, 2019)</xref>
          and were computed using the ItWaC corpus
          <xref ref-type="bibr" rid="ref2">(Baroni et al., 2009)</xref>
          and word2vec
          <xref ref-type="bibr" rid="ref19 ref20">(Mikolov et al., 2013a,b)</xref>.
        </p>
        <p>
          Very recently, a new work by
          <xref ref-type="bibr" rid="ref29">Vacareanu et al. (2020)</xref>
          showed that dependency parsing structures can be computed efficiently by treating this task as a double fine-tuning task over a BERT-derived model, the first for determining the attachments and the second for the edge labels, obtaining state-of-the-art performance. Note that the fine-tuning DNN is more complex than in the previous tasks, consisting of a bidirectional LSTM followed by some dense layers.
        </p>
        <p>We applied their method and code (PaT) for our parsing experiments, using the greedy cycle removal option. We changed the text case to match the case of the BERT-derived model used in each experiment. Tables 7 and 8 show the results for all the parsing experiments.</p>
        <p>
          Compared with the best results obtained by the Dozat and Manning (2017) parser and with those presented in
          <xref ref-type="bibr" rid="ref1">(Antonelli and Tamburini, 2019)</xref>,
          we observe a notable increase in performance, due mainly to GilBERTo and UmBERTo.
        </p>
      </sec>
      <sec id="sec-3-9">
        <title>3.4 Sentiment Analysis</title>
        <p>Three main text-classification tasks fall under the ‘Sentiment Analysis’ umbrella: Subjectivity, Polarity and Irony detection. Thanks to the EVALITA SENTIPOLC 2016 evaluation we could rely on a complete dataset annotated with respect to all three tasks.</p>
      </sec>
      <sec id="sec-3-10">
        <title>System</title>
        <p>(Antonelli and Tamburini, 2019); PaT<sub>bertMC</sub>; (Dozat and Manning, 2018); PaT<sub>umC</sub>; PaT<sub>giUC</sub></p>
      </sec>
      <sec id="sec-3-11">
        <title>UD-PoSTW v2.5</title>
      </sec>
      <sec id="sec-3-12">
        <title>System UAS LAS</title>
        <table-wrap>
          <table>
            <thead>
              <tr><th>System</th><th>UAS</th><th>LAS</th></tr>
            </thead>
            <tbody>
              <tr><td>PaT<sub>bertMC</sub></td><td>87.97 ± 0.20</td><td>82.03 ± 0.24</td></tr>
              <tr><td><xref ref-type="bibr" rid="ref14">(Dozat and Manning, 2018)</xref></td><td>88.04 ± 0.13</td><td>84.08 ± 0.10</td></tr>
              <tr><td>PaT<sub>alUC</sub></td><td>88.19 ± 0.32</td><td>82.66 ± 0.38</td></tr>
              <tr><td>PaT<sub>umC</sub></td><td>89.16 ± 0.17</td><td>83.25 ± 0.23</td></tr>
              <tr><td>PaT<sub>giUC</sub></td><td>89.29 ± 0.27</td><td>83.66 ± 0.22</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          Given the specific nature of the dataset texts, namely tweets, we adopted the particular pre-processing procedure introduced by AlBERTo, and all the other parameters were kept as in
          <xref ref-type="bibr" rid="ref23">(Polignano et al., 2019)</xref>
          for comparability. The only difference regards the training batch size: it was 512 on TPU in the original paper, whereas we had to use gradient accumulation on GPU (batch size = 32 and accumulation steps = 16, i.e. an effective batch size of 32 × 16 = 512) to avoid memory problems. Given the small size of the dataset and the high variability of the results, for these tasks we decided to make 10 runs instead of 5.
        </p>
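        <p>The equivalence exploited by gradient accumulation can be illustrated with a small numerical sketch. It is not the training code used in the experiments: the one-parameter model, the synthetic data and the function names mse_gradient and accumulated_gradient are assumptions made only for this example. Averaging the gradients of 16 micro-batches of 32 examples gives the same gradient as one batch of 512 examples, as long as the loss is a mean over examples.</p>
        <preformat>
```python
def mse_gradient(w, batch):
    """Gradient of the mean squared error of the model y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, data, micro_batch_size, accumulation_steps):
    """Average the gradients of several micro-batches before one optimiser step."""
    total = 0.0
    for step in range(accumulation_steps):
        start = step * micro_batch_size
        micro_batch = data[start:start + micro_batch_size]
        # Scaling by 1/accumulation_steps makes the sum equal to the large-batch mean.
        total += mse_gradient(w, micro_batch) / accumulation_steps
    return total


micro_batch_size, accumulation_steps = 32, 16  # values reported in the text
effective_batch_size = micro_batch_size * accumulation_steps

# Synthetic data following y = 3x, for illustration only.
data = [(float(i % 7), 3.0 * float(i % 7)) for i in range(effective_batch_size)]

full = mse_gradient(0.5, data)
accumulated = accumulated_gradient(0.5, data, micro_batch_size, accumulation_steps)
print(effective_batch_size, abs(full - accumulated))
```
        </preformat>
        <p>Only one micro-batch has to fit in GPU memory at a time, which is why this setting avoids the memory problems of a single batch of 512 examples.</p>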
      </sec>
      <sec id="sec-3-13">
        <title>System</title>
        <p>
          TensorFlow+TPU<sub>alUC</sub>; Fine-Tuning<sub>bertMC</sub>;
          <xref ref-type="bibr" rid="ref7">(Castellucci et al., 2016)</xref>;
          Fine-Tuning<sub>alUC</sub>; Fine-Tuning<sub>umC</sub>; Fine-Tuning<sub>giUC</sub>;
          <xref ref-type="bibr" rid="ref23">(Polignano et al., 2019)</xref>
          (alUC)
        </p>
        <p>We slightly modified the script ‘run_glue.py’ from version 2.7.0 of the Huggingface/Transformers package, treating the three tasks as fine-tuning of a BERT-derived model for text classification with 2, 4 and 2 classes, respectively.</p>
        <p>Tables 9, 10 and 11 present the obtained results. We must note that we had considerable problems in reproducing the results in Polignano et al. (2019), both with our script and with the original TPU-based script on Google Colab. The cited tables report the original results and the ones we produced using the same script and settings, marked by an asterisk (TensorFlow+TPU<sub>alUC</sub>).</p>
      </sec>
      <sec id="sec-3-14">
        <title>3.5 Hate Speech Detection</title>
        <p>Hate Speech on social media has become a relevant problem in recent years, and the automatic detection of such messages has gained great importance in NLP.</p>
        <p>
          Thanks to the dataset produced by
          <xref ref-type="bibr" rid="ref6">Bosco et al. (2018)</xref>,
          we were able to test the same text classification procedures we used for Sentiment Analysis on this task as well, on both Facebook and Twitter data. Table 12 shows the results we obtained, comparing them with the best system at the EVALITA 2018 HaSpeDe campaign
          <xref ref-type="bibr" rid="ref8">(Cimino et al., 2018)</xref>.
          GilBERTo exhibits the best performance on both subtasks.
        </p>
      </sec>
      <sec id="sec-3-15">
        <title>System</title>
        <p>
          Fine-Tuning<sub>bertMC</sub>;
          <xref ref-type="bibr" rid="ref8">(Cimino et al., 2018)</xref>;
          Fine-Tuning<sub>umC</sub>; Fine-Tuning<sub>alUC</sub>; Fine-Tuning<sub>giUC</sub>
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Discussion and Conclusions</title>
      <p>The starting idea of this work was to derive the new state-of-the-art for some Italian NLP tasks after the ‘BERT-revolution’, thanks to the recent availability of Italian BERT-derived models. Looking at the results presented in the previous sections for some very important tasks, we can certainly conclude that BERT-derived models specifically trained on Italian texts allow for a large increase in performance on these tasks. By contrast, the multilingual BERT model developed by Google was not able to produce good results and should not be used when models specific to the studied language are available.</p>
      <p>A side, and sad, consideration that emerges from this study regards the complexity of the models. All the DNN models used in this work for the various tasks involved very simple fine-tuning processes of some BERT-derived model. Machine learning and Deep learning completely changed the approaches to NLP solutions, but never before have we been in a situation in which a single methodological approach can solve different NLP problems while always establishing the state-of-the-art for each of them. And we did not apply any parameter tuning at all! The only optimisation regards the definition of early stopping on the validation set. By tuning all the hyperparameters, it is reasonable to expect that the overall performance can be increased further.</p>
      <p>For the future, it would be interesting to
evaluate end-to-end systems, for example for solving
PoS-tagging + Parsing and PoS-tagging + NER by
using the BERT-derived model fine tuning code
and PaT for both end-to-end tasks.</p>
      <p>Many scholars are studying new transformer-based models or training the most promising ones on different languages. Brand new Italian models have been made available very recently and are not yet included in our evaluations, like the one produced by Stefan Schweter at CIS, LMU Munich<sup>9</sup>; it would be interesting to add them to our tests.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.</p>
      <p><sup>9</sup>https://github.com/stefan-it/fine-tuned-berts-seq</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>Antonelli</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>State-of-the-art Italian dependency parsers based on neural and ensemble systems</article-title>
          .
          <source>Italian Journal of Computational Linguistics</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>33</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Baroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bernardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferraresi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Zanchetta</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>The wacky wide web: a collection of very large linguistically processed web-crawled corpora</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>43</volume>
          (
          <issue>3</issue>
          ):
          <fpage>209</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Bartalesi Lenzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Speranza</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>EVALITA 2009 The Entity Recognition Task</article-title>
          .
          <source>In Proceedings of the EVALITA 2009 Workshop</source>
          , Reggio Emilia, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          , G. Semeraro, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cassotti</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Bidirectional LSTM-CNNs-CRF for Italian Sequence Labeling</article-title>
          .
          <source>In Proceedings of the Fourth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2017</year>
          ), pages
          <fpage>18</fpage>
          -
          <lpage>23</lpage>
          , Roma, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          , S. Ballarè,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cerruti</surname>
          </string-name>
          , E. Goria, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Mauri</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>KIPoS@EVALITA2020: Overview of the Task on KIParla Part of Speech tagging</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          Online
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Poletto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tesconi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Hate Speech Detection Task</article-title>
          .
          <source>In Proc. of the EVALITA 2018 Workshop</source>
          , Torino, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Castellucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Croce</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Basili</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Context-aware Convolutional Neural Networks for Twitter Sentiment Analysis in Italian</article-title>
          .
          <source>In Proceedings of the Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2016</year>
          ), Napoli, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimino</surname>
          </string-name>
          , L. De Mattei, and
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Multi-task Learning in Deep Neural Networks at EVALITA 2018</article-title>
          .
          <source>In Proc. of the EVALITA 2018 Workshop</source>
          , Torino, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimino</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          .
          <year>2016a</year>
          .
          <article-title>Building the state-of-the-art in POS tagging of Italian Tweets</article-title>
          .
          <source>In Proceedings of the Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2016</year>
          ), Napoli, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimino</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          .
          <year>2016b</year>
          .
          <article-title>Tandem LSTM-SVM Approach for Sentiment Analysis</article-title>
          .
          <source>In Proceedings of the Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2016</year>
          ), Napoli, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proc. of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Rosa</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Durante</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Tweet2Check evaluation at Evalita Sentipolc 2016</article-title>
          .
          <source>In Proceedings of the Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2016</year>
          ), Napoli, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Dozat</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Deep Biaffine Attention for Neural Dependency Parsing</article-title>
          .
          <source>In Proceedings of the 2017 International Conference on Learning Representations.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Dozat</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Simpler but more accurate semantic dependency parsing</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>484</fpage>
          -
          <lpage>490</lpage>
          , Melbourne, Australia.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Choi</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Establishing strong baselines for the new decade: Sequence tagging, syntactic and semantic parsing with BERT</article-title>
          .
          <source>In The Thirty-Third International Flairs Conference, AAAI Publications</source>
          , pages
          <fpage>228</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>G.L.</given-names>
            <surname>Izzi</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>A hybrid approach for part-of-speech tagging</article-title>
          .
          <source>In Proceedings of the Seventh International Workshop EVALITA 2020</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          . CoRR, abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.J.</given-names>
            <surname>Ortiz Suárez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dupont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>de la Clergerie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seddah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Sagot</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>CamemBERT: a tasty French language model</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>7203</fpage>
          -
          <lpage>7219</lpage>
          , Online. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          . 2013a.
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          .
          <source>In Proc. of Workshop at ICLR</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          . 2013b.
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS'13</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          , USA. Curran Associates Inc.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>P.J.</given-names>
            <surname>Ortiz Suárez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sagot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures</article-title>
          .
          <source>In 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7)</source>
          , Cardiff, United Kingdom.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>M.E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iyyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gardner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In Proc. of NAACL-HLT 2018</source>
          , pages
          <fpage>2227</fpage>
          -
          <lpage>2237</lpage>
          , New Orleans, Louisiana.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
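          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Gemmis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          , and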
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets</article-title>
          .
          <source>In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)</source>
          , Bari, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Proisl</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lapesa</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Klumsy: Experiments on part-of-speech tagging of spoken Italian</article-title>
          .
          <source>In Proceedings of the Seventh International Workshop EVALITA 2020</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging</article-title>
          .
          <source>In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>338</fpage>
          -
          <lpage>348</lpage>
          , Copenhagen, Denmark. ACL.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>
          .
          <source>In Proc. 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>EVALITA 2007: the Part-of-Speech Tagging Task</article-title>
          .
          <source>Intelligenza Artificiale</source>
          ,
          <volume>IV</volume>
          (
          <issue>2</issue>
          ):
          <fpage>4</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>(Better than) State-of-the-Art PoS-tagging for Italian Texts</article-title>
          .
          <source>In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016)</source>
          , pages
          <fpage>280</fpage>
          -
          <lpage>284</lpage>
          , Napoli, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Vacareanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.C.</given-names>
            <surname>Gouveia Barbosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Valenzuela-Escárcega</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Surdeanu</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Parsing as tagging</article-title>
          .
          <source>In Proceedings of The 12th Language Resources and Evaluation Conference</source>
          , pages
          <fpage>5225</fpage>
          -
          <lpage>5231</lpage>
          , Marseille, France. ELRA.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pianta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Giuliano</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Named entity recognition through redundancy driven classifiers</article-title>
          .
          <source>In Proceedings of the Workshop EVALITA 2009</source>
          , Reggio Emilia, Italy.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>