<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Normalization for Swiss German</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pius von Däniken</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuela Hürlimann</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Cieliebak</string-name>
          <email>ciel@zhaw.ch</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Artificial Intelligence, Zurich University of Applied Sciences (ZHAW)</institution>
          ,
          <addr-line>Winterthur</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <conference>
        <conf-loc>Lugano, Switzerland</conf-loc>
      </conference>
      <abstract>
        <p>Written Swiss German is not standardized: it varies across authors and their dialects, and its use is almost exclusively confined to communication on social media or via text messaging. Corpora of written Swiss German therefore contain many distinct surface forms for the same word, which can make their analysis challenging, so it is desirable to normalize them to a single common surface form. We collected Swiss German utterances from social media, and two annotators mapped every token to a corresponding form in Standard German. The task is to build models that can perform such a mapping automatically. This is different from translation, since the resulting normalized utterance will in general not be grammatically correct Standard German: word order is preserved. A similar effort has previously been undertaken for text messages by the SMS4Science project, and there is a recent related shared task on lexical normalization of other languages at the WNUT 2021 workshop. During the shared task session, we presented the shared task dataset and how it was created, and gave an overview of the annotation tool. Since there were no participants this year, we presented the results of a naive baseline on the task dataset.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
  </body>
  <back>
    <ref-list />
  </back>
</article>