            UniBO @ KIPoS: Fine-tuning the Italian “BERTology” for
                         PoS-tagging Spoken Data

                                              Fabio Tamburini
                                     FICLIT - University of Bologna, Italy
                                            fabio.tamburini@unibo.it




                       Abstract

English. The use of contextualised word embeddings allowed for a relevant performance increase in almost all Natural Language Processing (NLP) applications. Recently, some new models especially developed for Italian became available to scholars. This work aims at applying simple fine-tuning methods to produce high-performance solutions for the EVALITA KIPOS PoS-tagging task (Bosco et al., 2020).

Italian. The use of contextualised word embeddings has allowed for notable performance increases in the automatic systems developed to tackle various Natural Language Processing tasks. Recently, some new models developed specifically for the Italian language have been introduced. The goal of this work is to assess whether a simple fine-tuning of these models is sufficient to obtain high-level performance in the KIPOS task of EVALITA 2020.

1   Introduction

The introduction of contextualised word embeddings, starting with ELMo (Peters et al., 2018) and in particular with BERT (Devlin et al., 2019) and the subsequent BERT-inspired transformer models (Liu et al., 2019; Martin et al., 2020; Sanh et al., 2019), marked a strong revolution in Natural Language Processing (NLP), boosting the performance of almost all applications and especially those based on statistical analysis and Deep Neural Networks (DNN).

   This work heavily relies on an upcoming work by the same author (Tamburini, 2020) experimenting with various contextualised word embeddings for Italian on a number of different tasks, and it is aimed at applying simple fine-tuning methods to produce high-performance solutions for the EVALITA KIPOS PoS-tagging task (Bosco et al., 2020; Basile et al., 2020).

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2   Italian “BERTology”

The availability of various powerful computational resources to the community allowed for the development of some BERT-derived models trained specifically on big Italian corpora of various textual types. All these models have been taken into account in our evaluation. In particular, we considered the models that, at the time of writing, are the only ones available for Italian (a minimal loading sketch is given after the list):

   • Multilingual BERT (https://github.com/google-research/bert): with the first BERT release Google also developed a multilingual model (‘bert-base-multilingual-cased’ – bertMC) that can be applied for processing Italian texts as well.

   • AlBERTo (https://github.com/marcopoli/AlBERTo-it): last year a research group from the University of Bari developed a brand new model for Italian especially devoted to Twitter texts and social media (‘m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0’ – alUC) (Polignano et al., 2019). Only the uncased model is available to the community. Due to the specific training of alUC, it requires a particular pre-processing step for replacing hashtags, URLs, etc. that alters the official tokenisation, rendering it not really applicable to word-based classification tasks on general texts; thus, it should be used only when working on Twitter or social media data. In any case, we tested it on all the considered tasks and, whenever the results were reasonable, we reported them.

   • GilBERTo (https://github.com/idb-ita/GilBERTo): a rather new CamemBERT-style Italian model (‘idb-ita/gilberto-uncased-from-camembert’ – giUC) trained on the huge Italian Web section of the OSCAR corpus (Ortiz Suárez et al., 2019). Also for GilBERTo, only the uncased model is available.

   • UmBERTo (https://github.com/musixmatchresearch/umberto): the most recent model developed explicitly for Italian, as far as we know, is UmBERTo (‘Musixmatch/umberto-commoncrawl-cased-v1’ – umC). Like GilBERTo, it has been trained on OSCAR but, differently from GilBERTo, the produced model is cased.
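   All four checkpoints listed above can be accessed through the Huggingface/Transformers library under the identifiers given in parentheses. The following is only a minimal loading sketch; the helper function and its name are illustrative and are not part of the actual experimental code.

    from transformers import AutoTokenizer, AutoModel

    # Huggingface identifiers of the four checkpoints considered in this work
    # (abbreviations as used in the tables below).
    MODELS = {
        "bertMC": "bert-base-multilingual-cased",
        "alUC":   "m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0",
        "giUC":   "idb-ita/gilberto-uncased-from-camembert",
        "umC":    "Musixmatch/umberto-commoncrawl-cased-v1",
    }

    def load(tag):
        # Load tokenizer and encoder for one of the checkpoints above.
        name = MODELS[tag]
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModel.from_pretrained(name)
        return tokenizer, model

    # Example: the cased UmBERTo model used for the submitted run.
    tokenizer, model = load("umC")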

3   KIPOS 2020 PoS-tagging Task

Part-of-speech tagging is a very basic task in NLP and a lot of applications rely on precise PoS-tag assignments. Spoken data present further challenges for PoS-taggers: small datasets for system training, short training sentences, less constrained language, the massive presence of interjections, etc. are all examples of phenomena that increase the difficulty of building reliable automatic systems.

   The PoS-tagging system used for our experiments is very simple and consists of a slight modification of the fine-tuning script ‘run_ner.py’ distributed with version 2.7.0 of the Huggingface/Transformers package (https://github.com/huggingface/transformers). We did not employ any hyperparameter tuning and, as the stopping criterion, we fixed the number of epochs to 10; we chose the UmBERTo model on the basis of previous experience (Tamburini, 2020). After the challenge, we evaluated all the BERT-derived models in order to give a complete overview of the available resources.
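   As a rough illustration of what this fine-tuning amounts to, the sketch below shows a bare-bones token-classification training loop built on the same Transformers API. It is not the modified ‘run_ner.py’ script itself; the label set and the toy training example are placeholders standing in for the actual KIPoS tag set and data.

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification, AdamW

    MODEL_NAME = "Musixmatch/umberto-commoncrawl-cased-v1"   # umC, the submitted model
    LABELS = ["ADJ", "ADP", "ADV", "DET", "NOUN", "PRON", "VERB"]  # placeholder tag set

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME,
                                                            num_labels=len(LABELS))

    def encode(words, tags):
        # Align word-level PoS tags with the sub-word tokenisation: the tag goes
        # to the first sub-token of each word; the remaining pieces get -100 and
        # are ignored by the loss.
        input_ids, label_ids = [tokenizer.cls_token_id], [-100]
        for word, tag in zip(words, tags):
            pieces = tokenizer.encode(word, add_special_tokens=False)
            input_ids += pieces
            label_ids += [LABELS.index(tag)] + [-100] * (len(pieces) - 1)
        input_ids.append(tokenizer.sep_token_id)
        label_ids.append(-100)
        return torch.tensor([input_ids]), torch.tensor([label_ids])

    # Toy stand-in for the KIPoS training data (word sequences with gold tags).
    train_data = [(["il", "gatto", "dorme"], ["DET", "NOUN", "VERB"])]

    optimizer = AdamW(model.parameters(), lr=5e-5)
    model.train()
    for epoch in range(10):                  # fixed stopping criterion: 10 epochs
        for words, tags in train_data:
            input_ids, labels = encode(words, tags)
            loss = model(input_ids, labels=labels)[0]
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()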
   Table 1 shows the results obtained by fine-tuning all the considered BERT-derived models for the Main Task. A very relevant increase in performance w.r.t. the other participants is evident from the results, and UmBERTo is consistently the best system.

 System                    Main Task Accuracy
                          Form.   Inform.   Both
 Fine-Tuning umC          93.49   91.13     92.26
 Fine-Tuning giUC         92.96   89.92     91.38
 Fine-Tuning alUC         90.02   89.82     89.92
 Fine-Tuning bertMC       91.67   88.05     89.79
 2nd ranked system        87.56   88.24     87.91
 3rd ranked system        81.58   79.37     80.43

Table 1: PoS-tagging accuracy on the EVALITA KIPOS 2020 benchmark, Main Task. Fine-Tuning umC was submitted to the challenge as the system “UniBO”.

   We did not participate in the official challenge for the two subtasks, but we also include the results of our best system for these tasks in this report. Tables 2 and 3 show the results compared with those of the other two participating systems.

 System                    Sub-Task A Accuracy
                          Form.   Inform.   Both
 Other participant 1      87.37   87.58     87.48
 Fine-Tuning umC          86.47   83.16     84.75
 Other participant 2      78.73   75.79     77.20

Table 2: PoS-tagging accuracy on the EVALITA KIPOS 2020 benchmark, Sub-Task A.

 System                    Sub-Task B Accuracy
                          Form.   Inform.   Both
 Fine-Tuning umC          89.74   89.52     89.63
 Other participant 1      87.81   88.10     87.96
 Other participant 2      77.11   77.50     77.31

Table 3: PoS-tagging accuracy on the EVALITA KIPOS 2020 benchmark, Sub-Task B.

   Again, the simple fine-tuning of a BERT-derived model, namely UmBERTo, exhibits the best performance on Sub-task B. The small amount of data probably affects the results on Sub-task A.

   We collected the most frequent errors produced by the proposed system: Table 4 shows that, unexpectedly, the most frequent misclassifications involve grammatical words. Classical PoS-taggers typically tend to misclassify lexical words, namely nouns, verbs and adjectives, intermixing their classes. Apparently, on this dataset, grammatical words appear to be more complex to classify than lexical words.
This behaviour should be investigated more thoroughly by using bigger datasets and better consistency checks on the annotated data.

                      Formal
  #mistakes      Gold tag       System tag
      19         ADP_A          ADP
      16         CCONJ          ADV
      12         PROPN          X
      10         NOUN.LIN       X
      10         ADJ            VERB

                     Informal
  #mistakes      Gold tag       System tag
      59         PRON           SCONJ
      38         ADP_A          ADP
      22         ADV            CCONJ
      15         NUM            DET
      15         INTJ           PARA
      15         CCONJ          ADV
      12         NOUN           PROPN
      10         VERB_PRON      VERB

              Table 4: Error Analysis
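   For reference, confusion counts such as those in Table 4 can be obtained with a simple tally of (gold, system) tag pairs over the system output. The sketch below is only an illustration with dummy tag sequences and is not the official task scorer.

    from collections import Counter

    def error_pairs(gold_tags, system_tags):
        # Count (gold, system) tag pairs where the prediction differs from the gold tag.
        errors = Counter()
        for g, s in zip(gold_tags, system_tags):
            if g != s:
                errors[(g, s)] += 1
        return errors

    # Dummy tag sequences for illustration only.
    gold   = ["ADP_A", "CCONJ", "NOUN", "ADJ"]
    system = ["ADP",   "ADV",   "NOUN", "VERB"]
    for (g, s), n in error_pairs(gold, system).most_common():
        print(n, g, s)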
4   Discussion and Conclusions

The starting idea of this work was to design the simplest possible DNN model for Italian PoS-tagging after the ‘BERT revolution’, thanks to the recent availability of Italian BERT-derived models. Looking at the results presented in the previous sections, we can certainly conclude that BERT-derived models specifically trained on Italian texts allow for a relevant increase in performance, also when applied to spoken language, through simple fine-tuning procedures. The multilingual BERT model developed by Google was not able to produce good results and should not be used when specific models are available for the studied language.

   A side, and somewhat sad, consideration that emerges from this study regards the complexity of the models. All the models used in this work involved only a very simple fine-tuning process applied to some BERT-derived model. Machine learning and deep learning have completely changed the way NLP solutions are built, but never before have we been in a situation in which a single methodological approach can solve different NLP problems while consistently establishing the state of the art for each of them. Moreover, we did not apply any hyperparameter tuning at all and fixed the stopping criterion at 10 epochs without any optimisation. By tuning all the hyperparameters, it is reasonable to expect that the overall performance can be further increased.

Acknowledgments

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

References

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

C. Bosco, S. Ballarè, M. Cerruti, E. Goria, and C. Mauri. 2020. KIPoS@EVALITA2020: Overview of the Task on KIParla Part of Speech Tagging. In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proc. of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota.

Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, abs/1907.11692.

L. Martin, B. Muller, P.J. Ortiz Suárez, Y. Dupont, L. Romary, E. de la Clergerie, D. Seddah, and B. Sagot. 2020. CamemBERT: a Tasty French Language Model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7203–7219, Online. Association for Computational Linguistics.

P.J. Ortiz Suárez, B. Sagot, and L. Romary. 2019. Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures. In 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7), Cardiff, United Kingdom.

M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proc. of NAACL-HLT 2018, pages 2227–2237, New Orleans, Louisiana.

M. Polignano, P. Basile, M. de Gemmis, G. Semeraro, and V. Basile. 2019. AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets. In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), Bari, Italy.

V. Sanh, L. Debut, J. Chaumond, and T. Wolf. 2019. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. In Proc. of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019.

F. Tamburini. 2020. How “BERTology” Changed the State-of-the-Art also for Italian NLP. In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020), Bologna, Italy.