<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UniBO @ KIPoS: Fine-tuning the Italian “BERTology” for PoS-tagging Spoken Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabio Tamburini</string-name>
          <email>fabio.tamburini@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FICLIT - University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
<year>2020</year>
      </pub-date>
      <abstract>
        <p>English. The use of contextualised word embeddings has allowed for a relevant performance increase in almost all Natural Language Processing (NLP) applications. Recently, some new models developed especially for Italian became available to scholars. This work aims at applying simple fine-tuning methods to produce high-performance solutions for the EVALITA KIPOS PoS-tagging task (Bosco et al., 2020). Italian. The use of contextual word embeddings has allowed notable performance gains in the automatic systems developed to tackle various Natural Language Processing tasks. Recently, some new models developed specifically for the Italian language have been introduced. The aim of this work is to assess whether a simple fine-tuning of these models is sufficient to obtain high-level performance on the KIPOS task of EVALITA 2020.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The introduction of contextualised word embeddings, starting with ELMo (Peters et al., 2018) and in particular with BERT
        <xref ref-type="bibr" rid="ref2a">(Devlin et al., 2019)</xref>
        and the subsequent BERT-inspired transformer models
        <xref ref-type="bibr" rid="ref3 ref4 ref6">(Liu et al., 2019; Martin et al., 2020; Sanh et al., 2019)</xref>
        , marked a true revolution in Natural Language Processing (NLP), boosting the performance of almost all applications, especially those based on statistical analysis and Deep Neural Networks (DNNs).
      </p>
      <p>
        This work builds heavily on forthcoming work by the same author
        <xref ref-type="bibr" rid="ref7">(Tamburini, 2020)</xref>
        that experiments with various contextualised word embeddings for Italian on a number of different tasks, and it aims at applying simple fine-tuning methods to produce high-performance solutions for the EVALITA KIPOS PoS-tagging task
        <xref ref-type="bibr" rid="ref1 ref2">(Bosco et al., 2020; Basile et al., 2020)</xref>
        .
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Italian “BERTology”</title>
      <p>
        The availability of powerful computational resources to the community has allowed the development of several BERT-derived models trained specifically on large Italian corpora of various textual types. All these models have been taken into account in our evaluation. In particular, we considered the models that, at the time of writing, are the only ones available for Italian (a loading sketch in Python is given after this list):
        • Multilingual BERT (https://github.com/google-research/bert): together with the first BERT release, Google also developed a multilingual model (‘bert-base-multilingual-cased’ – bertMC) that can also be applied to processing Italian texts.
        • AlBERTo (https://github.com/marcopoli/AlBERTo-it): last year a research group from the University of Bari developed a brand new model for Italian specifically devoted to Twitter and social media texts (‘m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0’ – alUC)
        <xref ref-type="bibr" rid="ref5a">(Polignano et al., 2019)</xref>
        . Only the uncased model is available to the community. Due to its specific training, alUC requires a particular pre-processing step that replaces hashtags, URLs, etc.; this alters the official tokenisation and makes the model not really applicable to word-based classification tasks on general texts, so it will be used only for Twitter or social media data. In any case, we tested it on all the considered tasks and, whenever the results were reasonable, we reported them.
        • GilBERTo (https://github.com/idb-ita/GilBERTo): a rather new CamemBERT-style Italian model (‘idb-ita/gilberto-uncased-from-camembert’ – giUC) trained on the huge Italian Web section of the OSCAR corpus
        <xref ref-type="bibr" rid="ref5">(Ortiz Suárez et al., 2019)</xref>
        . For GilBERTo, too, only the uncased model is available.
        • UmBERTo (https://github.com/musixmatchresearch/umberto): the most recent model developed explicitly for Italian, as far as we know, is UmBERTo (‘Musixmatch/umberto-commoncrawl-cased-v1’ – umC). Like GilBERTo, it was trained on OSCAR but, unlike GilBERTo, the resulting model is cased.
      </p>
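      <p>For illustration only (not part of the submitted system), the following Python sketch shows how the four checkpoints listed above can be loaded for token classification with the Huggingface/Transformers library; the number of labels is a placeholder to be set to the size of the KIPoS tag set.</p>
      <preformat>
# Illustrative sketch: load the Italian BERT-style checkpoints discussed above
# for token classification (PoS-tagging) with Huggingface/Transformers.
from transformers import AutoTokenizer, AutoModelForTokenClassification

CHECKPOINTS = {
    "bertMC": "bert-base-multilingual-cased",
    "alUC": "m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0",
    "giUC": "idb-ita/gilberto-uncased-from-camembert",
    "umC": "Musixmatch/umberto-commoncrawl-cased-v1",
}

NUM_TAGS = 23  # placeholder: replace with the size of the KIPoS tag set

models = {}
for short_name, checkpoint in CHECKPOINTS.items():
    # Each tokenizer reproduces its model's own (cased or uncased) tokenisation.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # A fresh token-classification head is added on top of the pre-trained encoder.
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint, num_labels=NUM_TAGS)
    models[short_name] = (tokenizer, model)
      </preformat>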
    </sec>
    <sec id="sec-3">
      <title>3 KIPOS 2020 PoS-tagging Task</title>
      <p>Part-of-speech tagging is a very basic task in NLP, and a lot of applications rely on precise PoS-tag assignments. Spoken data present further challenges for PoS-taggers: small training datasets, short training sentences, less constrained language, the massive presence of interjections, etc. are all examples of phenomena that make it harder to build reliable automatic systems.</p>
      <p>
        The PoS-tagging system used for our experiments is very simple and consists of a slight modification of the fine-tuning script ‘run_ner.py’ distributed with version 2.7.0 of the Huggingface/Transformers package (https://github.com/huggingface/transformers). We did not employ any hyperparameter tuning and, as the stopping criterion, we fixed the number of epochs to 10; we chose the UmBERTo model on the basis of previous experience
        <xref ref-type="bibr" rid="ref7">(Tamburini, 2020)</xref>
        . After the challenge, we evaluated all the BERT-derived models in order to propose a complete overview of the available resources.
      </p>
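      <p>As a rough illustration of such a setup (pre-trained encoder plus a token-classification head, one label per word, a fixed 10-epoch stop), the following Python sketch re-implements the core idea; it is not the modified run_ner.py actually used, and the two toy sentences merely stand in for the KIPoS training data.</p>
      <preformat>
# Minimal sketch of BERT-style fine-tuning for PoS-tagging (PyTorch +
# Huggingface/Transformers). Not the authors' modified run_ner.py script;
# the toy sentences below only stand in for the KIPoS training set.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

sentences = [["ciao", "come", "va", "?"], ["va", "bene", "dai"]]
tag_seqs = [["INTJ", "ADV", "VERB", "PUNCT"], ["VERB", "ADV", "INTJ"]]

TAGS = sorted({t for seq in tag_seqs for t in seq})
tag2id = {t: i for i, t in enumerate(TAGS)}

checkpoint = "Musixmatch/umberto-commoncrawl-cased-v1"  # umC, the submitted model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(TAGS))

def encode(words, tags):
    """Tokenise a pre-split sentence and align one label per word;
    sub-word continuations and special tokens get -100 (ignored by the loss)."""
    enc = tokenizer(words, is_split_into_words=True,
                    truncation=True, return_tensors="pt")
    labels, prev = [], None
    for wid in enc.word_ids():
        labels.append(-100 if wid is None or wid == prev else tag2id[tags[wid]])
        prev = wid
    enc["labels"] = torch.tensor([labels])
    return enc

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(10):                       # fixed 10-epoch stopping criterion
    for words, tags in zip(sentences, tag_seqs):
        batch = encode(words, tags)
        loss = model(**batch).loss            # cross-entropy over word-level tags
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
      </preformat>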
      <p>Table 1 shows the results obtained by fine-tuning all the considered BERT-derived models on the Main Task. Looking at the results, a very relevant increase in performance with respect to the other participants is evident, and UmBERTo is consistently the best system.</p>
      <sec id="sec-3-1">
        <title>System</title>
        <sec id="sec-3-1-1">
          <title>Fine-TuningumC</title>
          <p>Fine-TuninggiUC
Fine-TuningalUC
Fine-TuningbertMC
2nd ranked system
3rd ranked system</p>
          <p>We did not participate at the official challenge
for the two subtasks, but we included the results of
our best system also for these tasks into this report.
Tables 2 and 3 show the results compared with the
other two participating systems.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>System</title>
        <sec id="sec-3-2-1">
          <title>Other Participant 1</title>
          <p>Fine-TuningumC
Other participant 2</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Sub-Task A Accuracy</title>
      </sec>
      <sec id="sec-3-4">
        <title>Form. Inform. Both</title>
        <p>87.37 87.58 87.48
86.47 83.16 84.75
78.73 75.79 77.20</p>
      <p>Again, the simple fine-tuning of a BERT-derived model, namely UmBERTo, exhibits the best performance on Sub-task B. The small amount of data could probably affect the results on Sub-task A.</p>
      <p>We collected the most frequent errors produced by the proposed system: Table 4 shows that, unexpectedly, the most frequent misclassifications involve grammatical words. Classical PoS-taggers typically tend to misclassify lexical words, namely nouns, verbs and adjectives, intermixing their classes. Apparently, on this dataset, grammatical words appear to be more complex to classify than lexical words. This behaviour should be investigated more thoroughly by using bigger datasets and better consistency checks on the annotated data.</p>
      <p>The starting idea of this work was to design the simplest DNN model for Italian PoS-tagging after the ‘BERT revolution’, thanks to the recent availability of Italian BERT-derived models. Looking at the results presented in the previous sections, we can certainly conclude that BERT-derived models specifically trained on Italian texts allow for a relevant increase in performance, also when applied to spoken language, by simple fine-tuning procedures. The multilingual BERT model developed by Google was not able to produce good results and should not be used when specific models for the studied language are available.</p>
      <p>A side, and sad, consideration that emerges from this study regards the complexity of the models. All the DNN models used in this work involved very simple fine-tuning of some BERT-derived model. Machine learning and deep learning have completely changed the approaches to NLP solutions, but never before have we been in a situation in which a single methodological approach can solve different NLP problems while consistently establishing the state of the art for each of them. Moreover, we did not apply any parameter tuning at all and fixed the stopping criterion at 10 epochs without any optimisation. By tuning all the hyperparameters, it is reasonable to expect that the overall performance can be increased further.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>We gratefully acknowledge the support of
NVIDIA Corporation with the donation of the
Titan Xp GPU used for this research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><given-names>Valerio</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>Danilo</given-names> <surname>Croce</surname></string-name>,
          <string-name><given-names>Maria</given-names> <surname>Di Maro</surname></string-name>, and
          <string-name><given-names>Lucia C.</given-names> <surname>Passaro</surname></string-name>.
          <year>2020</year>.
          <article-title>EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>.
          <source>In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)</source>, Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          , S. Ballare`,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cerruti</surname>
          </string-name>
          , E. Goria, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Mauri</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>KIPoS@EVALITA2020: Overview of the Task on KIParla Part of Speech tagging</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          <article-title>Online</article-title>
          . CEUR.org. Devlin, M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In In Proc. of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          . CoRR, abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Muller</surname>
          </string-name>
          , P.J. Ortiz Suárez,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dupont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          , E. de la Clergerie,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seddah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Sagot</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>CamemBERT: a tasty French language model</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>7203</fpage>
          -
          <lpage>7219</lpage>
          , Online. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>P.J. Ortis</surname>
          </string-name>
          <article-title>Sua´rez, B</article-title>
          .
          <string-name>
            <surname>Sagot</surname>
            , and
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Romary</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures</article-title>
          . In 7th Workshop on the Challenges in M. Polignano,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          , M. de Gemmis, G. Semeraro, and
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>ALBERTO: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets</article-title>
          .
          <source>In Proceedings of the Sixth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2019</year>
          ), Bari, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter</article-title>
          .
          <source>In Proc. 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS</source>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>How “BERTology” Changed the State-of-the-Art also for Italian NLP</article-title>
          .
          <source>In Proceedings of the Seventh Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2020</year>
          ), Bologna, Italy.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>