=Paper=
{{Paper
|id=Vol-2765/94
|storemode=property
|title=UniBO @ KIPoS: Fine-tuning the Italian “BERTology” for PoS-tagging Spoken Data (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2765/paper94.pdf
|volume=Vol-2765
|authors=Fabio Tamburini
|dblpUrl=https://dblp.org/rec/conf/evalita/Tamburini20
}}
==UniBO @ KIPoS: Fine-tuning the Italian “BERTology” for PoS-tagging Spoken Data (short paper)==
UniBO @ KIPoS: Fine-tuning the Italian “BERTology” for PoS-tagging Spoken Data

Fabio Tamburini, FICLIT - University of Bologna, Italy, fabio.tamburini@unibo.it

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

English. The use of contextualised word embeddings allowed for a relevant performance increase for almost all Natural Language Processing (NLP) applications. Recently, some new models especially developed for Italian became available to scholars. This work aims at applying simple fine-tuning methods for producing high-performance solutions at the EVALITA KIPOS PoS-tagging task (Bosco et al., 2020).

Italian (translated). The use of contextualised word embeddings has allowed for notable performance increases in the automatic systems developed to tackle various Natural Language Processing tasks. Recently, some new models developed specifically for the Italian language have been introduced. The aim of this work is to assess whether a simple fine-tuning of these models is sufficient to obtain high-level performance on the KIPOS task of EVALITA 2020.

1 Introduction

The introduction of contextualised word embeddings, starting with ELMo (Peters et al., 2018) and in particular with BERT (Devlin et al., 2019) and the subsequent BERT-inspired transformer models (Liu et al., 2019; Martin et al., 2020; Sanh et al., 2019), marked a strong revolution in Natural Language Processing (NLP), boosting the performance of almost all applications, especially those based on statistical analysis and Deep Neural Networks (DNN).

This work heavily draws on an upcoming work by the same author (Tamburini, 2020), which experiments with various contextualised word embeddings for Italian on a number of different tasks, and it is aimed at applying simple fine-tuning methods for producing high-performance solutions at the EVALITA KIPOS PoS-tagging task (Bosco et al., 2020; Basile et al., 2020).

2 Italian “BERTology”

The availability of various powerful computational resources to the community allowed for the development of some BERT-derived models trained specifically on large Italian corpora of various textual types. All these models have been taken into account for our evaluation. In particular, we considered those models that, at the time of writing, were the only ones available for Italian (a minimal loading sketch follows the list):

• Multilingual BERT (https://github.com/google-research/bert): with the first BERT release, Google also developed a multilingual model (‘bert-base-multilingual-cased’ – bertMC) that can also be applied to processing Italian texts.

• AlBERTo (https://github.com/marcopoli/AlBERTo-it): last year a research group from the University of Bari developed a brand new model for Italian especially devoted to Twitter texts and social media (‘m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0’ – alUC) (Polignano et al., 2019). Only the uncased model is available to the community. Due to the specific training of alUC, it requires a particular pre-processing step for replacing hashtags, URLs, etc. that alters the official tokenisation, making it not really applicable to word-based classification tasks on general texts; thus, it will be used only for working on Twitter or social media data. In any case, we tested it on all considered tasks and, whenever the results were reasonable, we reported them.

• GilBERTo (https://github.com/idb-ita/GilBERTo): a rather new CamemBERT-style Italian model (‘idb-ita/gilberto-uncased-from-camembert’ – giUC) trained on the huge Italian Web section of the OSCAR corpus (Ortiz Suárez et al., 2019). Also for GilBERTo, only the uncased model is available.

• UmBERTo (https://github.com/musixmatchresearch/umberto): the most recent model developed explicitly for Italian, as far as we know, is UmBERTo (‘Musixmatch/umberto-commoncrawl-cased-v1’ – umC). Like GilBERTo, it has been trained on OSCAR but, differently from GilBERTo, the resulting model is cased.
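All four checkpoints above are distributed through the Huggingface model hub under the identifiers quoted in the list, so they can be loaded with the generic Transformers auto classes. The snippet below is a minimal loading sketch: the short names (bertMC, alUC, giUC, umC) mirror the paper, while the load() helper and the dictionary layout are introduced here purely for illustration and are not part of the original work.

```python
# Minimal sketch: loading the Italian BERT-derived checkpoints discussed above
# through the Huggingface Transformers auto classes. The hub identifiers are
# those quoted in the paper; their continued availability is assumed here.
from transformers import AutoTokenizer, AutoModel

MODELS = {
    "bertMC": "bert-base-multilingual-cased",
    "alUC":   "m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0",
    "giUC":   "idb-ita/gilberto-uncased-from-camembert",
    "umC":    "Musixmatch/umberto-commoncrawl-cased-v1",
}

def load(short_name: str):
    """Return (tokenizer, encoder) for one of the models listed above."""
    name = MODELS[short_name]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    return tokenizer, model

if __name__ == "__main__":
    tok, enc = load("umC")
    # All four are base-sized encoders with a 768-dimensional hidden state.
    print(enc.config.hidden_size)
```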
3 KIPOS 2020 PoS-tagging Task

Part-of-speech tagging is a very basic task in NLP and a lot of applications rely on precise PoS-tag assignments. Spoken data present further challenges for PoS-taggers: small datasets for system training, short training sentences, less constrained language, the massive presence of interjections, etc. are all phenomena that increase the difficulty of building reliable automatic systems.

The PoS-tagging system used for our experiments is very simple and consists of a slight modification of the fine-tuning script ‘run_ner.py’ shipped with version 2.7.0 of the Huggingface Transformers package (https://github.com/huggingface/transformers). We did not employ any hyperparameter tuning and, as the stopping criterion, we fixed the number of epochs to 10; we chose the UmBERTo model on the basis of previous experience (Tamburini, 2020). After the challenge, we evaluated all the BERT-derived models in order to propose a complete overview of the available resources.
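For concreteness, the sketch below outlines the kind of token-classification fine-tuning this setup amounts to: a pre-trained encoder with a classification head, PoS labels aligned to the first sub-token of each word, and a fixed 10-epoch schedule with default hyperparameters. It is a simplified illustration under these assumptions, not the author's modified ‘run_ner.py’; the encode() and fine_tune() helpers and the data format are introduced here only for the example.

```python
# Sketch of BERT-style fine-tuning for PoS-tagging as token classification.
# Simplified illustration (CPU, no evaluation loop), not the author's script.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "Musixmatch/umberto-commoncrawl-cased-v1"   # umC, the submitted system

def encode(sentences, tags, tokenizer, tag2id, max_len=128):
    """Tokenise word by word; label only the first sub-token of each word (-100 elsewhere)."""
    rows = []
    for words, labels in zip(sentences, tags):
        input_ids, label_ids = [tokenizer.cls_token_id], [-100]
        for w, t in zip(words, labels):
            pieces = tokenizer.encode(w, add_special_tokens=False)
            input_ids.extend(pieces)
            label_ids.extend([tag2id[t]] + [-100] * (len(pieces) - 1))
        input_ids = input_ids[:max_len - 1] + [tokenizer.sep_token_id]
        label_ids = label_ids[:max_len - 1] + [-100]
        pad = max_len - len(input_ids)
        rows.append((input_ids + [tokenizer.pad_token_id] * pad,
                     [1] * (max_len - pad) + [0] * pad,
                     label_ids + [-100] * pad))
    ids, mask, labs = map(torch.tensor, zip(*rows))
    return TensorDataset(ids, mask, labs)

def fine_tune(sentences, tags, tagset):
    """sentences: list of word lists; tags: parallel list of PoS-tag lists."""
    tag2id = {t: i for i, t in enumerate(sorted(tagset))}
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=len(tag2id))
    loader = DataLoader(encode(sentences, tags, tokenizer, tag2id), batch_size=32, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()
    for epoch in range(10):                   # fixed stopping criterion, as in the paper
        for ids, mask, labs in loader:
            optim.zero_grad()
            loss = model(input_ids=ids, attention_mask=mask, labels=labs)[0]
            loss.backward()
            optim.step()
    return model, tokenizer, tag2id
```

Labelling only the first sub-token of each word is the usual convention of the Transformers token-classification examples; at prediction time the label of that first sub-token is taken as the PoS tag of the whole word.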
Table 1 shows the results obtained by fine-tuning all the considered BERT-derived models for the Main Task. A very relevant increase in performance with respect to the other participants is evident from the results, and UmBERTo is consistently the best system.

System | Form. | Inform. | Both
Fine-TuningumC | 93.49 | 91.13 | 92.26
Fine-TuninggiUC | 92.96 | 89.92 | 91.38
Fine-TuningalUC | 90.02 | 89.82 | 89.92
Fine-TuningbertMC | 91.67 | 88.05 | 89.79
2nd ranked system | 87.56 | 88.24 | 87.91
3rd ranked system | 81.58 | 79.37 | 80.43

Table 1: PoS-tagging accuracy for the EVALITA KIPOS 2020 benchmark, Main Task. Fine-TuningumC was submitted to the challenge as the system “UniBO”.

We did not participate in the official challenge for the two subtasks, but we include the results of our best system for these tasks in this report as well. Tables 2 and 3 show the results compared with the other two participating systems.

System | Form. | Inform. | Both
Other participant 1 | 87.37 | 87.58 | 87.48
Fine-TuningumC | 86.47 | 83.16 | 84.75
Other participant 2 | 78.73 | 75.79 | 77.20

Table 2: PoS-tagging accuracy for the EVALITA KIPOS 2020 benchmark, Sub-Task A.

System | Form. | Inform. | Both
Fine-TuningumC | 89.74 | 89.52 | 89.63
Other participant 1 | 87.81 | 88.10 | 87.96
Other participant 2 | 77.11 | 77.50 | 77.31

Table 3: PoS-tagging accuracy for the EVALITA KIPOS 2020 benchmark, Sub-Task B.

Again, the simple fine-tuning of a BERT-derived model, namely UmBERTo, exhibits the best performance on Sub-Task B. The small amount of data could probably affect the results on Sub-Task A.

We collected the most frequent errors produced by the proposed system: Table 4 shows that, unexpectedly, the most frequent misclassifications involve grammatical words. Classical PoS-taggers typically tend to misclassify lexical words, namely nouns, verbs and adjectives, intermixing their classes.

Formal
#mistakes | Gold tag | System tag
19 | ADP_A | ADP
16 | CCONJ | ADV
12 | PROPN | X
10 | NOUN.LIN | X
10 | ADJ | VERB

Informal
#mistakes | Gold tag | System tag
59 | PRON | SCONJ
38 | ADP_A | ADP
22 | ADV | CCONJ
15 | NUM | DET
15 | INTJ | PARA
15 | CCONJ | ADV
12 | NOUN | PROPN
10 | VERB_PRON | VERB

Table 4: Error analysis: most frequent (gold tag, system tag) confusions.

Apparently, on this dataset, grammatical words appear to be more difficult to classify than lexical words. This behaviour should be investigated more thoroughly by using bigger datasets and better consistency checks on the annotated data.
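The error counts in Table 4 amount to a confusion count over (gold tag, system tag) pairs. The snippet below is only an illustration of that procedure, not the author's actual analysis code; the function name most_frequent_confusions() is introduced here for the example.

```python
# Illustrative sketch of the error analysis behind Table 4: count (gold, system)
# tag pairs over all tokens and report the most frequent confusions.
from collections import Counter

def most_frequent_confusions(gold_tags, pred_tags, top_k=10):
    """gold_tags and pred_tags are parallel lists of tag sequences (one per utterance)."""
    confusions = Counter()
    for gold_seq, pred_seq in zip(gold_tags, pred_tags):
        for g, p in zip(gold_seq, pred_seq):
            if g != p:
                confusions[(g, p)] += 1
    return confusions.most_common(top_k)

# Usage:
# for (gold, system), n in most_frequent_confusions(gold, predicted):
#     print(f"{n:4d}  gold={gold:12s}  system={system}")
```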
4 Discussion and Conclusions

The starting idea of this work was to design the simplest DNN model for Italian PoS-tagging after the ‘BERT revolution’, thanks to the recent availability of Italian BERT-derived models. Looking at the results presented in the previous sections, we can certainly conclude that BERT-derived models specifically trained on Italian texts allow for a relevant increase in performance, also when applied to spoken language, through simple fine-tuning procedures. The multilingual BERT model developed by Google was not able to produce good results and should not be used when specific models for the studied language are available.

A side, and sad, consideration that emerges from this study regards the complexity of the models. All the DNN models used in this work involved very simple fine-tuning processes of some BERT-derived model. Machine learning and deep learning have completely changed the approaches to NLP, but never before have we been in a situation in which a single methodological approach can solve different NLP problems while consistently establishing the state of the art for each of them. Moreover, we did not apply any parameter tuning at all and fixed the early stopping criterion at 10 epochs without any optimisation. By tuning all the hyperparameters, it is reasonable to expect that we could further increase the overall performance.

Acknowledgments

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

References

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian. In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

C. Bosco, S. Ballarè, M. Cerruti, E. Goria, and C. Mauri. 2020. KIPoS@EVALITA2020: Overview of the task on KIParla part of speech tagging. In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota.

Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.

L. Martin, B. Muller, P.J. Ortiz Suárez, Y. Dupont, L. Romary, E. de la Clergerie, D. Seddah, and B. Sagot. 2020. CamemBERT: a tasty French language model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7203–7219, Online. Association for Computational Linguistics.

P.J. Ortiz Suárez, B. Sagot, and L. Romary. 2019. Asynchronous pipeline for processing huge corpora on medium to low resource infrastructures. In 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7), Cardiff, United Kingdom.

M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. 2018. Deep contextualized word representations. In Proc. of NAACL-HLT 2018, pages 2227–2237, New Orleans, Louisiana.

M. Polignano, P. Basile, M. de Gemmis, G. Semeraro, and V. Basile. 2019. AlBERTo: Italian BERT language understanding model for NLP challenging tasks based on tweets. In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), Bari, Italy.

V. Sanh, L. Debut, J. Chaumond, and T. Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. In Proc. 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing – NeurIPS 2019.

F. Tamburini. 2020. How “BERTology” changed the state-of-the-art also for Italian NLP. In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020), Bologna, Italy.