Parsing Approaches for Swiss German

Noëmi Aepli                     Simon Clematide
University of Zurich            University of Zurich
noemi.aepli@uzh.ch              simon.clematide@cl.uzh.ch

Abstract

This paper presents different approaches towards universal dependency parsing for Swiss German. Dealing with dialects is a challenging task in Natural Language Processing because of the huge linguistic variability, which is partly due to the lack of standard spelling rules. Building a statistical parser requires expensive resources which are only available for a few dozen high-resourced languages. In order to overcome the low-resource problem for dialects, we exploit approaches to cross-lingual learning. We apply different cross-lingual parsing strategies to Swiss German, making use of Standard German resources. The methods applied are annotation projection and model transfer. The results show around 60% Labelled Attachment Score for all approaches and provide a first substantial step towards Swiss German dependency parsing. The resources are available for further research on NLP applications for Swiss German dialects.

1 Introduction

Swiss German is a dialect continuum of the Alemannic dialect group, comprising numerous varieties used in the German-speaking part of Switzerland. [1] Unlike other dialect situations, the Swiss German dialects are deeply rooted in the Swiss culture and enjoy a high reputation, i.e. dialect speakers are not considered less educated, as is the case in other countries. On the basis of their high acceptance in the Swiss culture and with the introduction of digital communication, Swiss German has spread across all kinds of communication forms and social media. Despite being oral languages, the dialects are increasingly used in written contexts, and writers spell as they please.

For Natural Language Processing (NLP), low-resourced languages are challenging, particularly in cases like Swiss German where no orthographic rules are followed. Compiling NLP resources such as syntactically annotated text corpora (treebanks) from scratch is a laborious and expensive process. Thus, in such cases, cross-lingual approaches offer a perspective to get started with automatic processing of the respective language. Such approaches are especially promising if a closely related resource-rich language is available, which is the case for Swiss German.

The Universal Dependencies (UD) project aims at developing and setting a standard for cross-linguistically consistently annotated treebanks in order to facilitate multilingual parsing research. We support this idea by adopting the current UD standard as much as possible.

The information about which word of a sentence depends on which other one is important in order to correctly understand the meaning of the sentence. Thus, it is needed for numerous NLP applications like information extraction or grammar checking. The task of identifying these dependencies is done by a dependency parser (see Figure 1 for a Swiss German example in UD).

In this paper, we apply two different cross-lingual dependency parsing strategies, namely annotation projection as a lexicalised approach and model transfer as a delexicalised approach. We manually create a gold standard in order to evaluate and compare the different strategies. Furthermore, we build and evaluate a silver standard treebank which, compared to manually annotating from scratch, accelerates the creation of a larger training set for a monolingual Swiss German parser.

Figure 1: Universal dependency parse trees for the sentence: We want to be perfect, but we are not. Top: gold standard, bottom: system.

In: Mark Cieliebak, Don Tuggener and Fernando Benites (eds.): Proceedings of the 3rd Swiss Text Analytics Conference (SwissText 2018), Winterthur, Switzerland, June 2018
[1] Swiss Standard German, one of the four official languages of Switzerland, is not to be confused with the Swiss German dialects.
The next section presents related work on NLP for Swiss German and introduces the two main approaches to cross-lingual parsing. In Sections 3 and 4 we present our data and methods. Section 5 shows and discusses our results.

2 Related Work

Even though there have been several projects involving Swiss German (Hollenstein and Aepli, 2014; Zampieri et al., 2017; Hollenstein and Aepli, 2015; Samardzic et al., 2016; Samardžić et al., 2015; Scherrer, 2007; Baumgartner, 2016; Dürscheid and Stark, 2011; Stark et al., 2014; Scherrer and Owen, 2010; Scherrer, 2013, 2012), resources for NLP applications are still rare. As so often for dialects, even data for Swiss German is sparse. Therefore, the approach is to use tools and data of related resource-rich languages and apply transfer methods.

2.1 Universal Dependencies

Research in dependency parsing has increased significantly since collections of dependency treebanks have become available, in particular through the CoNLL shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007a; Zeman et al., 2017), which have provided many data sets. In order to facilitate cross-lingual research on syntactic structure and to standardise best practices, Universal POS (UPOS) tags (Petrov et al., 2012) as well as Universal Dependencies (Nivre et al., 2016) have been introduced. The annotation scheme is originally based on Stanford dependencies (de Marneffe et al., 2006; de Marneffe and Manning, 2008; de Marneffe et al., 2014). McDonald et al. (2013) present the first collection of six treebanks with homogeneous syntactic dependency annotation, which has continually been expanded since.

2.2 Cross-lingual Dependency Parsing

There are two main approaches to cross-lingual syntactic dependency parsing. The first is delexicalised model transfer, whose goal is to abstract away from language-specific parameters, i.e. to train delexicalised parsers. The idea is based on universal features and model parameters that can be transferred between related languages. Hence, this method assumes a common feature representation across languages. The advantage of the model transfer approach is that no parallel data is needed. Zeman and Resnik (2008) train a basic delexicalised parser relying on part-of-speech (POS) tags only. McDonald et al. (2013), Petrov et al. (2012) and Naseem et al. (2010) rely on universal features, while Täckström et al. (2013) adapt model parameters to the target language in order to transfer syntactic dependency parses cross-linguistically.

The main idea of the second approach, the lexicalised annotation projection method, is the mapping of labels across languages using parallel sentences and automatic alignment. It includes projection heuristics and usually post-projection rules. The main drawback of this approach is that it relies on sentence-aligned parallel corpora. In order to deal with this restriction, treebank translation has emerged, where the training data is automatically translated with a machine translation system. The central point of this method is the alignment along which the annotations are mapped from one language to the other. Automatic word alignment has already been used by Yarowsky et al. (2001), Aepli et al. (2014) and Snyder et al. (2008) for improving resources and tools for POS tagging in supervised and unsupervised learning respectively. Hwa et al. (2005), Tiedemann (2014) and Tiedemann (2015) use annotation projection approaches for parsing, and Tiedemann et al. (2014) as well as Rosa et al. (2017) additionally use machine translation instead of relying on parallel corpora.
For Swiss German, treebank translation is not viable because of sparse data and the lack of a machine translation system for Swiss German. Hence, in this paper we apply annotation projection as a lexicalised approach and model transfer as a delexicalised approach.

3 Materials

3.1 Standard German Data

We use the German Universal Dependency treebank [2] consisting of 13,814 sentences. It is annotated according to the UD guidelines [3] and contains Universal POS (UPOS) tags (Petrov et al., 2012). The treebank comes in CoNLL-U format, but as some tools cannot handle it, we convert it to CoNLL-X. This includes one major tokenization change concerning the Stuttgart-Tübingen-TagSet (STTS) (Schiller et al., 1999) POS tag APPRART. In CoNLL-U, prepositions with fused articles are split into two syntactic words. We undo this split, merge the information into one token and adapt the dependency relations correspondingly.

3.2 Swiss German Data

Annotation projection requires a parallel corpus. The AGORA citizen linguistics project [4] crowdsourced Standard German translations of 6,197 Swiss German sentences via the web site dindialaekt.ch. The sentences are taken from the NOAH corpus (Hollenstein and Aepli, 2014); additionally, sentences from novels in Bernese and St Gallen dialect were added to better represent syntactic word order differences. By the end of November 2017, the citizen linguists had produced 41,670 translations. We aggregated and cleaned the data into a parallel GSW/DE corpus of 26,015 sentences. In particular, we filtered translations that differed too much in length or Levenshtein edit distance [5] from the Swiss German source sentence.

[2] https://github.com/UniversalDependencies/UD_German
[3] http://universaldependencies.org/guidelines.html
[4] https://www.linguistik.uzh.ch/de/forschung/agora.html
[5] The Levenshtein distance (Levenshtein, 1966) measures the difference between two sequences of characters. Hence, the minimal edit distance between two words is the minimum number of characters to be changed (i.e. inserted, deleted or substituted) in order to make them equal.

4 Methods

We apply two classical parsing approaches presented in Section 2: model transfer with a delexicalised parser and annotation projection with crowdsourced parallel data. Within both approaches we test two parsing frameworks: the MaltParser (Nivre et al., 2007b) and the more recent UDPipe (Straka and Straková, 2017). Both parsers are provided with tokenised input.

Figure 2: Workflow of the model transfer.

4.1 Model Transfer Approach

The delexicalised model transfer approach is straightforward, working on the basis of POS tags only. For the training, the words in the Standard German corpus are replaced by their POS tags. Accordingly, at parsing time, the Swiss German words are replaced by their POS tags before parsing and re-inserted afterwards.
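The replace-and-restore step can be sketched in a few lines; the CoNLL-style column layout (FORM in the second column, fine-grained POS tag in the fifth) and the toy tokens are illustrative, not taken from the actual pipeline:

```python
from typing import List, Tuple

def delexicalise(rows: List[List[str]], pos_col: int = 4) -> Tuple[List[List[str]], List[str]]:
    """Replace each token's word form (column 2) by its POS tag; keep the forms."""
    forms = [r[1] for r in rows]
    delex = [r[:1] + [r[pos_col]] + r[2:] for r in rows]
    return delex, forms

def relexicalise(rows: List[List[str]], forms: List[str]) -> List[List[str]]:
    """Restore the original word forms after parsing."""
    return [r[:1] + [form] + r[2:] for r, form in zip(rows, forms)]

# toy CoNLL-X style rows (ID, FORM, LEMMA, CPOSTAG, POSTAG); invented example
sent = [["1", "mir", "_", "PRON", "PPER"],
        ["2", "wei", "_", "VERB", "VVFIN"]]
delex, forms = delexicalise(sent)
assert [r[1] for r in delex] == ["PPER", "VVFIN"]
assert relexicalise(delex, forms) == sent
```

Since the parser only ever sees the POS column in the FORM slot, the same model can be applied to any language for which compatible tags are available.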
4.1.1 POS tagging

Part-of-speech tagging is an important step prior to parsing because the syntactic structure builds upon the POS information. Obviously, when training delexicalised parsers, this step is crucial, as the tags are the only information available to the parser.

For POS tagging Swiss German sentences, we used the Wapiti (Lavergne et al., 2010) model trained on Release 2.2 of the NOAH corpus, where the average accuracy in 10-fold cross-validation is 92.25%.

The CoNLL format includes UPOS tags in addition to the fine-grained language-specific POS tags (STTS in the case of German and Swiss German). We used the mapping provided by the UD project in order to infer the UPOS tags from the given STTS tags.

4.2 Annotation Projection Approach

Annotation projection is not only more complex in processing compared to model transfer but also needs more resources. Most importantly, annotation projection requires a word-aligned parallel corpus. Starting from the crowdsourced sentences, which are sentence-aligned, it is the task of a word aligner to compute the most probable word alignments, i.e. the information about which word of the (Swiss German) source sentence corresponds to which word of the target sentence, i.e. the translation. There are many tools for this, as it is a basic step in machine translation systems as well. We tested three of them: GIZA++ (Och and Ney, 2003), FastAlign (Dyer et al., 2013) and the Monolingual Greedy Aligner (MGA) (Rosa et al., 2012).

The idea of the annotation projection process is to use the tool (here: parser) of a resource-rich language on that language (here: German) and then project the generated information (here: universal dependency structures) along the word alignment to the target language (here: Swiss German). In practice, this means we train the parser on the Standard German treebank (see Section 3.1) and parse the Standard German translations of the Swiss German original sentences. Then we project the resulting parse structure along the word alignments from the German words to the corresponding Swiss German words.

Figure 3: Workflow of the annotation projection.

4.2.1 Transfer of the Annotation

The transfer is the core component of annotation projection. The parse of the Standard German translation is projected along the word alignment to its Swiss German correspondent. The input consists of the Standard German parse and the alignment between the Standard German sentence and its Swiss German version (GSW:DE). Algorithm 1 describes the projection process.

    Data: DE parse & alignment GSW:DE
    Result: DE parse transferred to GSW
    for word alignment in sentence do
        if 1:1 alignment then
            transfer parse of DE
        else if 1:0 alignment (i.e. no DE word aligned) then
            attach GSW word to root with POS tag ADV and dependency label advmod
        else (1:n alignment, i.e. several DE words aligned)
            transfer parse of aligned DE word with smallest edit (Levenshtein) distance
    end
    Algorithm 1: Transfer of parses.

The case of a 1:1 alignment, where exactly one German word is aligned to the Swiss German word, is easy; the only thing to do is to project the dependency of the German word to the Swiss German word. If, however, several German words are aligned to one Swiss German word (1:n), the algorithm has to decide which parse to transfer. [6] In order to take this decision, the algorithm computes the Levenshtein distances (Levenshtein, 1966) between the Swiss German word and every aligned German word and takes the one with the smallest edit distance. The most challenging case is when no German word is aligned to the Swiss German token. A simple baseline approach attaches the corresponding Swiss German word as an adverbial modifier to the root of the sentence.

The decision to treat every unaligned Swiss German word as an adverb is taken on the basis of the frequency distribution of POS tags; ADV is the second most frequent POS tag (after NN) in the Swiss German data. However, taking the word itself into consideration, some more sophisticated rules can be elaborated.

[6] Note that the case of a 1:n alignment between a GSW word and a DE multiword expression is not covered in this approach.
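A runnable sketch of Algorithm 1 follows, under simplifying assumptions that are not taken from the original implementation: the alignment maps each Swiss German token index to a list of aligned German token indices, and a parse is a dict from token index to a (head, label) pair.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def project(gsw_tokens, de_tokens, de_parse, alignment):
    """Transfer (head, label) pairs from the DE parse to the GSW sentence."""
    gsw_parse = {}
    for i, gsw_word in enumerate(gsw_tokens):
        aligned = alignment.get(i, [])
        if len(aligned) == 1:          # 1:1 -> copy the dependency directly
            gsw_parse[i] = de_parse[aligned[0]]
        elif not aligned:              # 1:0 -> baseline: attach to root as advmod
            gsw_parse[i] = (0, "advmod")
        else:                          # 1:n -> take the most similar DE word
            best = min(aligned, key=lambda j: levenshtein(gsw_word, de_tokens[j]))
            gsw_parse[i] = de_parse[best]
    return gsw_parse
```

Note that after this step the head indices still refer to German token positions; renumbering and root repair (Algorithm 2) are applied afterwards.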
Considering the differences between Standard German and Swiss German as described by Hollenstein and Aepli (2014), we can expect some words like infinitive particles (PTKINF) (e.g. go) or the past participle gsi (been) to remain unaligned: the former because these words do not exist in Standard German, the latter because the Standard German simple past tense is expressed by the perfect tense in Swiss German, typically resulting in a "spare" past participle in the alignment. Furthermore, there are unaligned articles because Swiss German requires articles in front of proper names. Punctuation, including the apostrophe, is also a source of errors which can easily be corrected. The application of these more elaborate rules has an impact of around 2 points on the evaluation scores.

Algorithm 1 transfers the German parses as they are; as a consequence, the numbering of the token IDs is mixed up. Correcting the token IDs to be in ascending order (from 1 to the length of the sentence) requires the corresponding adjustment of the head references. Furthermore, one needs to make sure that there is exactly one root in a sentence.

    Data: transferred DE parse to GSW words
    Result: valid GSW parse
    for sentence in parse do
        if DE root was not projected to GSW parse then
            take 1st VERB as root, else 1st NOUN
        else if head of a projected word was not projected to GSW parse then
            attach it to the root
    end
    Algorithm 2: Correction of transferred parses.

Algorithm 2 goes through every sentence of the input file and first makes sure that there is one root for the sentence. If the root of the Standard German parse has not been transferred to the Swiss German sentence (missing word alignment), the first verb (UPOS VERB) is taken as the root, and if there is no VERB in the sentence, the first NOUN is considered the root.

4.3 Optimisation

We tested two approaches for optimisation: preprocessing of the training set and postprocessing rules to be applied after parsing.

4.3.1 Preprocessing

One frequent mistake, mostly observed in the delexicalised approach, is the assignment of passive dependency labels instead of their active counterparts. The passive construction in Standard German is built with the auxiliary werden, which can, however, also be used in non-passive constructions. The combination of VA* and a perfect participle (VVPP) is very frequent in Swiss German; however, it is usually not a passive construction but rather a perfect tense. Therefore, a simple but effective solution is the introduction of a new "set" of POS tags in the German UD training set: VWFIN, VWINF and VWPP for finite verbs, infinitives and participles of the verb werden respectively. This means that all occurrences of the lemma werden as an auxiliary (i.e. UPOS: AUX and STTS: VA{INF|PP}) are replaced by VW{INF|PP}. In this way, the system learns to discriminate between the usage of werden as an auxiliary versus its usage as a full verb and, most of all, it learns to differentiate between the auxiliary werden and the other auxiliaries haben (to have) and sein (to be). Hence, the number of wrongly assigned passive dependency labels decreased, which leads to an improvement of around 2.5 to 3.5 points as presented in Section 5.

4.3.2 Postprocessing

Some of the errors can easily be corrected with simple rules in a postprocessing step. One example is a frequent error caused by a remnant of the 1st UD version which is handled differently in UD version 2. The two labels oblique nominal (obl) and nominal modifier (nmod) are confused because the latter was used to modify both nominals and predicates in UD v1. In UD v2, however, obl is used for a nominal functioning as an oblique argument, while nmod is used for nominal dependents of another noun (phrase) only. This means that if the head is a verb, adjective or adverb, the dependency label has to be obl. If, instead, the head is a noun, pronoun, name or number, the dependency label is nmod.
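The renumbering and root repair of Algorithm 2 can be sketched as follows, with an illustrative token representation as dicts (the field names and tie-breaking details are assumptions, not the authors' actual code):

```python
def repair(sentence):
    """Renumber token IDs 1..n, remap head references, enforce a single root."""
    order = sorted(sentence, key=lambda t: t["id"])
    new_id = {t["id"]: i + 1 for i, t in enumerate(order)}
    for i, t in enumerate(order):
        t["id"] = i + 1
        t["head"] = new_id.get(t["head"], 0)   # unprojected heads become 0
    # pick a single root: an existing root if any, else 1st VERB, else 1st NOUN
    candidates = ([t for t in order if t["head"] == 0]
                  or [t for t in order if t["upos"] == "VERB"]
                  or [t for t in order if t["upos"] == "NOUN"]
                  or order)
    root = candidates[0]
    root["head"], root["deprel"] = 0, "root"
    # attach every other headless token (missing alignment) to that root
    for t in order:
        if t["head"] == 0 and t is not root:
            t["head"] = root["id"]
    return order
```

The key point is that the old-to-new ID mapping is computed before any ID is overwritten, so head references can be remapped consistently.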
5 Results & Discussion

This section presents the different settings and combinations of the aforementioned resources, approaches and tools. For the evaluation, we manually created a gold standard consisting of 100 Swiss German sentences taken from the resources presented in Section 3.2. We evaluated the approaches according to Labelled Attachment Score (LAS) and Unlabelled Attachment Score (UAS) [7], not excluding punctuation. The results we present here are macro accuracy scores, that is, the scores are computed separately for each sentence and then averaged [8]. Note that there is a mismatch in the actual annotation of punctuation between the Standard German UD treebank v2 and the official guidelines we were applying. This difference in the punctuation dependencies has an effect on the scores, i.e. it lowers the scores presented here. Furthermore, note that the test set containing 100 gold standard sentences is small and therefore these results have to be taken with a grain of salt.

[7] UAS is the percentage of tokens with the correct syntactic head; LAS is the percentage of tokens assigned the correct syntactic head as well as the correct dependency label.
[8] Macro accuracy scores as opposed to word-based micro scores, where the true positives are summed up over the whole treebank and divided by the total number of words.

5.1 German Parser Accuracy

In order to put the results into context, we checked the performance of the parsers on the German UD v2 treebank using its split of training and test set. In this setting, we left all the available information for the parser to use, including morphology and lemmas. The APPRART splitting is undone for the CoNLL-X MaltParser input, but not for UDPipe, which takes CoNLL-U as input format (and performs worse with the MaltParser CoNLL-X input). MaltParser reaches a LAS of 79.71%, UDPipe 70.31% respectively.

5.2 Direct Cross-lingual Parsing

As a comparison to the main approaches, we applied Standard German parsers directly to Swiss German. This means we used the training set of the German UD treebank to train the MaltParser (using MaltOptimizer to get the best hyperparameter settings) and UDPipe. Before training, we removed the morphology and lemma information because this information is not available in the Swiss German test set and therefore the parsers cannot rely on it. Furthermore, for the MaltParser we converted the training set from CoNLL-U to CoNLL-X format because MaltOptimizer cannot handle the former. Testing the MaltParser model on the gold standard with POS tags automatically assigned by Wapiti results in a LAS of 55.28%. UDPipe only reaches 21.19% LAS; one reason for this low accuracy could be that UDPipe relies on word embedding information (Straka and Straková, 2017), which results in a low recall when applying a model trained on German to Swiss German.

5.3 Delexicalised Model Transfer

Instead of giving the parser the Standard German words as input as in the direct cross-lingual approach, in the delexicalised approach we provide the parser with POS information only. This means the words are replaced by STTS POS tags while all the other columns stay the same. Given the small evaluation set and a negligible difference in the results, the two parsers' performance can be considered the same: ∼57% LAS for both when trained on the preprocessed training set, i.e. differentiating the auxiliary werden vs. the auxiliaries haben (to have) and sein (to be) (see Section 4.3.1).

5.4 Annotation Projection

The results for the annotation projection approach vary substantially depending on the combination of aligner and parser. Starting from 46.45% LAS (MaltParser + FastAlign), the combination of UDPipe and the Monolingual Greedy Aligner scores best in this approach with 53.39% LAS. This score is reached with the baseline transfer rules where unaligned words are simply attached to the root as adverbs. Applying the more elaborate transfer rules (Section 4.2.1) results in an improvement of 2.09 points to 55.65% LAS. The preprocessing step does not improve the results in this approach. These results show that the Monolingual Greedy Aligner performs best in the task of DE/GSW alignment. MGA takes character-based word similarity into account, which intuitively makes sense, as the information about similar letters is valuable when dealing with closely related languages such as Standard German and Swiss German.

5.5 Postprocessing

The postprocessing rules do not show a huge impact on the parsing results; the nmod/obl confusions, for example, are still present. The reason for this is that the parser assigned wrong heads to many of the words and therefore the rule to correct the nmod/obl confusions does not apply. The LAS scores improve by 1.62 points for the cross-lingual MaltParser and 2.07 points for the delexicalised model transfer and annotation projection UDPipe approaches respectively, reaching nearly 60% LAS accuracy.
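The macro-averaged attachment scores used in this evaluation can be computed as follows; representing each sentence as a list of (head, label) pairs is an illustrative simplification:

```python
def macro_scores(gold_sents, sys_sents):
    """Per-sentence LAS/UAS, averaged over sentences (macro accuracy)."""
    las, uas = [], []
    for gold, sys in zip(gold_sents, sys_sents):
        n = len(gold)
        uas.append(sum(g[0] == s[0] for g, s in zip(gold, sys)) / n)
        las.append(sum(g == s for g, s in zip(gold, sys)) / n)
    return sum(las) / len(las), sum(uas) / len(uas)

# two invented sentences as (head, label) pairs per token
gold = [[(2, "nsubj"), (0, "root")], [(0, "root"), (1, "obl")]]
sys_ = [[(2, "nsubj"), (0, "root")], [(0, "root"), (1, "nmod")]]
las, uas = macro_scores(gold, sys_)
assert (las, uas) == (0.75, 1.0)
```

The toy example also shows why an nmod/obl confusion hurts LAS but leaves UAS untouched: the head is correct, only the label differs.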
5.6 Discussion

Table 1 shows the best results including the corresponding setting for every approach. The best LAS results of all the applied approaches are very close; hence there is no clear answer to the question of which approach works best. Annotation projection is the most laborious among the three and as such not the first option to choose. Furthermore, the transfer of the annotation is strongly dependent on the performance of the aligner, which in turn benefits from big parallel corpora to be trained on. However, such big parallel corpora do not yet exist for Swiss German dialects.

Contrary to our expectations, training specific models for different dialects does not have a huge impact on the results. The word ordering of the St Gallen dialect is closer to the Standard German word ordering, while Bernese dialect speakers often change the order of the verbs. Due to these differences, we expected the model transfer approach to perform worse on the Bern dialect than annotation projection, where the word order changes should be handled by the aligner. Looking specifically at Bernese sentences with "switched" word order (e.g. ha aafo gränne ('I started to cry'), gfunge hei gha ('have found'), het übercho ('have gotten')), there is no significant difference between the two approaches in our test set.

5.6.1 Swiss German Variability

The results presented here are not perfect and certainly require further improvement in order for a system to be used in real-life applications. Compared with the Standard German parser accuracy, which reaches almost 80% LAS on the German UD v2 with standard settings of the parsers, there is room for improvement. However, these numbers have to be set in relation to the data we worked with. Even though we could make use of Swiss German novels and crowdsourced data, it is still a small data set. Furthermore, the enormous spelling variability in the Swiss German dialects poses a serious challenge for all tools. Statistical tools work best if the observed events are frequent. However, they do not work well with sparse data consisting of a large number of hapax legomena, i.e. word forms which appear only once. Figure 4 shows the frequencies (on the y-axis) of type frequencies (x) in a Swiss German text collection [9] consisting of 6,155 sentences with 105,692 tokens and 20,882 unique token types. 14,099 types appear only once (i.e. hapax legomena), 2,804 appear twice (i.e. hapax dislegomena), 19,874 less than 10 times and 20,767 less than 100 times.

Figure 4: Frequencies of type frequencies (x) in a Swiss German text.

5.7 Silver Treebank Parsing Model

Following the direct cross-lingual parsing approach, we automatically parse 6,155 Swiss German sentences [9] in order to create a silver treebank. A silver standard treebank, as opposed to a gold standard treebank, which is assumed to be correctly annotated, is automatically annotated and may therefore contain errors. Then, we use this silver treebank to train a monolingual Swiss German parser and hence create a first monolingual Swiss German dependency parsing model. The advantage of using a silver treebank is the fact that parsing becomes a monolingual task. However, this comes at the price of a faulty training set, which is not the best resource to build a parser.

Table 1: Comparison of the best score of every approach.

    Approach               Setting                                           LAS     UAS
    Annotation Projection  UDPipe + MGA + Postprocessing                     57.73   66.57
    Model Transfer         UDPipe + Pre- and postprocessing                  60.64   72.48
    Direct Cross-lingual   MaltParser (+ Wapiti) + Pre- and postprocessing   59.78   70.80

Interestingly, the MaltParser trained on the silver treebank reaches the same performance as the direct cross-lingual parsing approach itself, which was used to generate the silver treebank: LAS 57.10%. Given that 6,000 sentences do not constitute a large training set for a statistical parser, a parser could probably profit from additional related Standard German material. However, combining the two training sets, i.e. the German Universal Dependency treebank and the silver treebank, gives slightly worse results (LAS 55.46%).

[9] NOAH corpus plus 396 sentences from novels by Pedro Lenz and Renato Kaiser, excluding gold standard sentences.
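The type frequency profile discussed in Section 5.6.1 (hapax legomena, hapax dislegomena, etc.) amounts to two nested counts, one of tokens per type and one of types per frequency. The mini corpus below is invented for illustration:

```python
from collections import Counter

def type_frequency_profile(sentences):
    """Token frequency per type, then how many types share each frequency."""
    type_freq = Counter(tok for sent in sentences for tok in sent)
    freq_of_freq = Counter(type_freq.values())
    return type_freq, freq_of_freq

# invented mini corpus: "si" occurs twice, four types are hapax legomena
tf, fof = type_frequency_profile([["si", "isch", "gsi"], ["si", "het", "gseit"]])
assert fof[1] == 4 and fof[2] == 1
```

On real data, `fof[1]` directly yields the hapax count reported for Figure 4.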
5.8 Future Work

There are several opportunities for further improvement. Concerning the annotation projection approach, the crucial alignment information needs to be improved, for example by ensembling over the results of different word aligners. In cases where the alignment does not work, adding further transfer and postprocessing rules would be important. In addition, a spelling normalisation strategy can help to deal with the data sparseness imposed by the phonetic and orthographic variability of the Swiss German dialects. Moreover, the outputs of the three parsing approaches could be ensembled, e.g. via majority vote as suggested for alignment above, to get rid of the weaknesses of each approach. Furthermore, the silver treebank created could be manually corrected in order to generate a treebank which can be used as a training set for a monolingual dependency parser for Swiss German. Finally, once the data sparseness for Swiss German varieties is mitigated, modern neural methods are promising, as shown for example in the work by Ammar et al. (2016). Ammar et al. train one multilingual model that can be used to parse sentences in several languages. In order to do so, they use many resources, including a bilingual dictionary for adding cross-lingual lexical information and a monolingual corpus for training word embeddings. Such approaches need a big amount of data of the language to be parsed, which is still not available for Swiss German.

6 Conclusion

In this work, we experimented with a variety of cross-lingual approaches for parsing texts written in Swiss German. Languages with a non-standardised orthography are a demanding task for statistically driven systems. Swiss German dialects feature challenging Natural Language Processing (NLP) problems with their lack of orthographic spelling rules and a huge pronunciation variety. This situation leads to a high degree of data sparseness and, with it, a lack of resources and tools for NLP.

We tested a lexicalised annotation projection method as well as a delexicalised model transfer method. The annotation projection method requires parallel sentences in both the resource-rich and the low-resourced language, while the delexicalised model transfer approach only requires a monolingual treebank of a closely related resource-rich language.

The evaluation on a manually annotated gold standard consisting of 100 sentences shows a 60% Labelled Attachment Score (LAS) with negligible differences between the different parsing approaches. However, the annotation projection approach is more complex than model transfer due to the transfer rules and the crucial word alignment process.

This work provides a first substantial step towards closing a big gap in Natural Language Processing tools for Swiss German and provides data [10] to work on further improvements.

Acknowledgments

We thank the AGORA project "Citizen Linguistics" for making their translation data available and in particular all our volunteer translators. We also thank Renato Kaiser and Pedro Lenz for their permission to use their novels in our experiments.

[10] https://github.com/noe-eva/SwissGermanUD

References

Noëmi Aepli, Tanja Samardžić, and Ruprecht von Waldenfels. 2014. Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote. In First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). Dublin.

Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah A. Smith. 2016. Many languages, one parser. TACL 4:431–444.
Reto Baumgartner. 2016. Morphological analysis and lemmatization for Swiss German using weighted transducers. In Stefanie Dipper, Friedrich Neubarth, and Heike Zinsmeister, editors, Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016). Bochum, Germany.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL, pages 149–164.

Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford dependencies: A cross-linguistic typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014).

Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In 5th International Conference on Language Resources and Evaluation (LREC 2006).

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In COLING Workshop on Cross-framework and Cross-domain Parser Evaluation.

Christa Dürscheid and Elisabeth Stark. 2011. SMS4science: An international corpus-based texting project and the specific challenges for multilingual Switzerland. Digital Discourse: Language in the New Media, pages 299–320.

Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM model 2. In HLT-NAACL, pages 644–648.

Nora Hollenstein and Noëmi Aepli. 2014. Compilation of a Swiss German dialect corpus and its application to PoS tagging. In Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects. Dublin, Ireland, pages 85–94. http://www.aclweb.org/anthology/W14-5310.

Nora Hollenstein and Noëmi Aepli. 2015. A resource for natural language processing of Swiss German dialects. In Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology. Duisburg-Essen, Germany, pages 108–109.

Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering 11(03):311–325.

Thomas Lavergne, Olivier Cappé, and François Yvon. 2010. Practical very large scale CRFs. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL). Uppsala, Sweden, pages 504–513.

V. I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10:707.

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal dependency annotation for multilingual parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 92–97.

Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson. 2010. Using universal linguistic knowledge to guide grammar induction. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1234–1244.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris, France.

Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007a. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007. Prague, Czech Republic, pages 915–932. http://www.aclweb.org/anthology/D/D07/D07-1096.

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007b. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering 13(2):95–135. https://doi.org/10.1017/S1351324906004505.

Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics 29(1):19–51.

Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 2089–2096.

Rudolf Rosa, Ondřej Dušek, David Mareček, and Martin Popel. 2012. Using parallel features in parsing of machine-translated sentences for correction of grammatical errors. In Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation. Jeju, Republic of Korea, pages 39–48. http://www.aclweb.org/anthology/W12-4205.

Rudolf Rosa, Daniel Zeman, David Mareček, and Zdeněk Žabokrtský. 2017. Slavic forest, Norwegian wood. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Valencia, Spain, pages 210–219. http://www.aclweb.org/anthology/W17-1226.

Tanja Samardžić, Yves Scherrer, and Elvira Glaser. 2015.

forschung/ressourcen/lexika/TagSets/stts-1999.pdf.

Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay. 2008. Unsupervised multilingual learning for POS tagging. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing. Honolulu, Hawaii, pages 1041–1050. http://www.aclweb.org/anthology/D08-1109.

Elisabeth Stark, Simone Ueberwasser, and Anne Göhrig. 2014. Corpus "What's up, Switzerland?". www.whatsup-switzerland.ch.

Milan Straka and Jana Straková. 2017. Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Vancouver, Canada, pages 88–99. http://www.aclweb.org/anthology/K/K17/K17-3009.pdf.

Oscar Täckström, Ryan McDonald, and Joakim Nivre.
Normalising orthographic and dialectal variants for the 2013. Target language adaptation of discriminative automatic processing of Swiss German. In Language transfer parsers. In Proceedings of the 2013 Con- and Technology Conference: Human Language Tech- ference of the North American Chapter of the Asso- nologies as a Challenge for Computer Science and Lin- ciation for Computational Linguistics: Human Lan- guistics. Poznan, Poland, pages 294–298. guage Technologies. Atlanta, Georgia, pages 1061– Tanja Samardzic, Yves Scherrer, and Elvira Glaser. 2016. 1071. http://www.aclweb.org/anthology/N13-1126. ArchiMob – a corpus of spoken Swiss German. In Pro- Jörg Tiedemann. 2014. Rediscovering annotation projec- ceedings of the Tenth International Conference on Lan- tion for cross-lingual parser induction. In Proceedings guage Resources and Evaluation (LREC 2016). Paris, of COLING 2014, the 25th International Conference France. on Computational Linguistics: Technical Papers. pages 1854–1864. Yves Scherrer. 2007. Adaptive string distance mea- sures for bilingual dialect lexicon induction. In Jörg Tiedemann. 2015. Cross-lingual dependency parsing Proceedings of the ACL 2007 Student Research with universal dependencies and predicted PoS labels. Workshop. Prague, Czech Republic, pages 55–60. In Proceedings of the Third International Conference http://www.aclweb.org/anthology/P/P07/P07-3010. on Dependency Linguistics (Depling 2015). pages 340– 349. Yves Scherrer. 2012. Machine translation into multiple di- alects: The example of Swiss German. In 7th SIDG Jörg Tiedemann, Željko Agić, and Joakim Nivre. Congress - Dialect 2.0. 2014. Treebank translation for cross-lingual parser induction. In Proceedings of the Eighteenth Con- Yves Scherrer. 2013. Continuous variation in com- ference on Computational Natural Language putational morphology - the example of Swiss Ger- Learning. Ann Arbor, Michigan, pages 130–140. man. 
In TheoreticAl and Computational MOrphol- http://www.aclweb.org/anthology/W14-1614. ogy: New Trends and Synergies (TACMO). 19th In- ternational Congress of Linguists, Genève, Suisse. David Yarowsky, Grace Ngai, and Richard Wicentowski. http://hal.inria.fr/hal-00851251. 2001. Inducing multilingual text analysis tools via ro- bust projection across aligned corpora. In Proceedings Yves Scherrer and Rambow Owen. 2010. Natural Lan- of the first international conference on Human language guage Processing for the Swiss German Dialect Area. technology research. pages 1–8. In Proceedings of the Conference on Natural Language Processing (KONVENS). Saarbrücken, Germany, pages Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, 93–102. Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, and Noëmi Aepli. 2017. Findings of the Var- Anne Schiller, Simone Teufel, Christine Stöckert, Dial evaluation campaign 2017. In Proceedings of the and Christine Thielen. 1999. Guidelines für Fourth Workshop on NLP for Similar Languages, Va- das Tagging deutscher Textkorpora mit STTS. rieties and Dialects (VarDial). Valencia, Spain, pages http://www.ims.uni-stuttgart.de/ 1–15. http://www.aclweb.org/anthology/W17-1201. 10 15 D. Zeman and Philip Resnik. 2008. Cross-language parser adaptation between related languages. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages. pages 35–42. Daniel Zeman, Martin Popel, Milan Straka, Jan Ha- jic, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Fran- cis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinkova, Jan Hajic jr., Jaroslava Hlavacova, Václava Kettnerová, Zdenka Uresova, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. 
Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hi- roshi Kanayama, Valeria dePaiva, Kira Droganova, Héctor Martı́nez Alonso, Çağr Çöltekin, Umut Suluba- cak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Bur- chardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, An- tonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonca, Tatiana Lando, Rattima Nitis- aroj, and Josie Li. 2017. Conll 2017 shared task: Multilingual parsing from raw text to universal de- pendencies. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Vancouver, Canada, pages 1–19. http://www.aclweb.org/anthology/K/K17/K17- 3001.pdf. 11 16