=Paper=
{{Paper
|id=Vol-3033/keynote1
|storemode=property
|title=Returning the L in NLP: Why Language (Variety) Matters and How to Embrace it in Our Models
|pdfUrl=https://ceur-ws.org/Vol-3033/keynote1.pdf
|volume=Vol-3033
}}
==Returning the L in NLP: Why Language (Variety) Matters and How to Embrace it in Our Models==
Returning the L in NLP: Why Language (Variety) Matters and How to Embrace it in Our Models

Barbara Plank
Computer Science Department, IT University of Copenhagen

Abstract

NLP’s success today is driven by advances in modeling together with huge amounts of unlabeled data to train language models. However, for many application scenarios, such as low-resource languages, non-standard data and dialects, we do not have access to labeled resources, and even unlabeled data might be scarce. Moreover, evaluation today largely focuses on standard splits, yet language varies along many dimensions [3]. What is more, for almost every NLP task the existence of a single perceived gold answer is at best an idealization. In this talk, I will emphasize the importance of language variation in inputs and outputs and its impact on NLP, and I will outline ways to address it. This includes recent work on how to transfer models to low-resource languages and language variants [5, 6], the use of incidental (or fortuitous) learning signals such as genre for dependency parsing [2], and learning beyond a single ground truth [1, 3, 4].

Biography. Barbara Plank is Professor in the Computer Science Department at ITU (IT University of Copenhagen). She is also the Head of the Master in Data Science Program. She received her PhD in Computational Linguistics from the University of Groningen. Her research interests focus on Natural Language Processing, in particular transfer learning and adaptation, learning from beyond the text, and in general learning under limited supervision and from fortuitous data sources. She has (co-)organised several workshops and international conferences, among them the PEOPLES workshop (since 2016) and the first European NLP Summit (EurNLP 2019). Barbara was general chair of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa 2019) and workshop chair for ACL in 2019. Barbara is a member of the advisory board of the European Association for Computational Linguistics (EACL) and vice-president of the Northern European Association for Language Technology (NEALT).

References

[1] Tommaso Fornaciari, Alexandra Uma, Silviu Paun, Barbara Plank, Dirk Hovy, and Massimo Poesio. Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2591–2597, 2021.

[2] Max Müller-Eberstein, Rob van der Goot, and Barbara Plank. Genre as Weak Supervision for Cross-lingual Dependency Parsing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4786–4802, 2021.

[3] Barbara Plank. What to do about non-standard (or non-canonical) language in NLP. In Proceedings of KONVENS 2016, Ruhr-University Bochum. Bochumer Linguistische Arbeitsberichte, 2016.

[4] Barbara Plank, Dirk Hovy, and Anders Søgaard. Learning part-of-speech taggers with inter-annotator agreement loss. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 742–751, 2014.

[5] Barbara Plank, Kristian Nørgaard Jensen, and Rob van der Goot. DaN+: Danish nested named entities and lexical normalization. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6649–6662, 2020.
[6] Rob van der Goot, Ibrahim Sharaf, Aizhan Imankulova, Ahmet Üstün, Marija Stepanović, Alan Ramponi, Siti Oryza Khairunnisa, Mamoru Komachi, and Barbara Plank. From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2479–2497, 2021.