=Paper= {{Paper |id=Vol-3361/ws3 |storemode=property |title=GSWNORM 2022 - Shared Task on Text Normalization for Swiss German (Abstract) |pdfUrl=https://ceur-ws.org/Vol-3361/workshop3.pdf |volume=Vol-3361 |authors=Pius von Däniken,Manuela Hürlimann,Mark Cieliebak |dblpUrl=https://dblp.org/rec/conf/swisstext/DanikenHC22 }} ==GSWNORM 2022 - Shared Task on Text Normalization for Swiss German (Abstract)== https://ceur-ws.org/Vol-3361/workshop3.pdf
GSWNORM2022 - Shared Task on Text Normalization for
Swiss German
Pius von Däniken1,∗, Manuela Hürlimann1 and Mark Cieliebak2
1 Centre for Artificial Intelligence, Zurich University of Applied Sciences (ZHAW), Winterthur, Switzerland


Abstract
Written Swiss German is not standardized: spelling varies across authors and dialects, and its use is almost exclusively confined to social media and text messaging. Corpora therefore contain many distinct surface forms for the same word, which makes their analysis challenging, so it is desirable to normalize these variants to a single common surface form. We collected Swiss German utterances from social media, and two annotators mapped every token to a corresponding form in Standard German. The task is to build models that perform this mapping automatically. This differs from translation: because word order is preserved, the normalized utterance will in general not be grammatically correct Standard German. A similar effort was previously undertaken for text messages by the SMS4Science project, and a recent related shared task on lexical normalization of other languages was held at the WNUT 2021 workshop. During the shared task session, we presented the shared task dataset and how it was created, and gave an overview of the annotation tool. Since there were no participants this year, we presented the results of a naive baseline on the task dataset.
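The abstract does not say how the naive baseline works. As an illustration of the token-level mapping only, the sketch below shows one common simple approach to lexical normalization: replace each token with the Standard German form it was most often annotated with, and leave unseen tokens unchanged. The function names `train_lookup` and `normalize` and the example tokens are hypothetical and are not taken from the shared task dataset or from the authors' system.

```python
from collections import Counter, defaultdict


def train_lookup(pairs):
    """Build a token -> most frequent normalized form table.

    `pairs` is an iterable of (source_token, normalized_token) tuples,
    e.g. flattened from token-aligned annotations (assumed format).
    """
    counts = defaultdict(Counter)
    for source, target in pairs:
        # Lowercase the source so spelling variants differing only in case share counts.
        counts[source.lower()][target] += 1
    return {source: c.most_common(1)[0][0] for source, c in counts.items()}


def normalize(tokens, lookup):
    """Map each token independently; unseen tokens are copied unchanged.

    Word order is preserved, so the output need not be grammatical
    Standard German.
    """
    return [lookup.get(token.lower(), token) for token in tokens]


# Illustrative, made-up training pairs (not from the GSWNORM data).
example_pairs = [
    ("ha", "habe"), ("ha", "habe"), ("ha", "hat"),
    ("ziit", "Zeit"), ("zyt", "Zeit"),
]
example_lookup = train_lookup(example_pairs)
example_output = normalize(["Ha", "kei", "Ziit"], example_lookup)
```

Because every token is mapped independently, a lookup of this kind cannot resolve forms whose normalization depends on context, which is one reason it would only serve as a lower bound.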




SwissText 2022: Swiss Text Analytics Conference, June 08–10, 2022,
Lugano, Switzerland
∗ Corresponding author.
vode@zhaw.ch (P. v. Däniken); hueu@zhaw.ch (M. Hürlimann); ciel@zhaw.ch (M. Cieliebak)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073