Parsing Approaches for Swiss German

Noëmi Aepli                     Simon Clematide
University of Zurich            University of Zurich
noemi.aepli@uzh.ch              simon.clematide@cl.uzh.ch

Abstract

This paper presents different approaches towards universal dependency parsing for Swiss German. Dealing with dialects is a challenging task in Natural Language Processing because of the huge linguistic variability, which is partly due to the lack of standard spelling rules. Building a statistical parser requires expensive resources which are only available for a few dozen high-resourced languages. In order to overcome the low-resource problem for dialects, we exploit approaches to cross-lingual learning. We apply different cross-lingual parsing strategies to Swiss German, making use of Standard German resources. The methods applied are annotation projection and model transfer. The results show around 60% Labelled Attachment Score for all approaches and provide a first substantial step towards Swiss German dependency parsing. The resources are available for further research on NLP applications for Swiss German dialects.

1 Introduction

Swiss German is a dialect continuum of the Alemannic dialect group, comprising numerous varieties used in the German-speaking part of Switzerland. [1] Unlike other dialect situations, the Swiss German dialects are deeply rooted in the Swiss culture and enjoy a high reputation, i.e. dialect speakers are not considered less educated, as is the case in other countries. On the basis of their high acceptance in the Swiss culture and with the introduction of digital communication, Swiss German has spread across all kinds of communication forms and social media. Despite being oral languages, the dialects are increasingly used in written contexts, and writers spell as they please.

For Natural Language Processing (NLP), low-resourced languages are challenging, particularly in cases like Swiss German where no orthographic rules are followed. Compiling NLP resources such as syntactically annotated text corpora (treebanks) from scratch is a laborious and expensive process. Thus, in such cases, cross-lingual approaches offer a perspective to get started with automatic processing of the respective language. Such approaches are especially promising if a closely related resource-rich language is available, which is the case for Swiss German.

The Universal Dependencies (UD) project aims at developing and setting a standard for cross-linguistically consistently annotated treebanks in order to facilitate multilingual parsing research. We support this idea by adopting the current UD standard as much as possible.

The information about which word of a sentence depends on which other one is important in order to correctly understand the meaning of the sentence. Thus, it is needed for numerous NLP applications like information extraction or grammar checking. The task of identifying these dependencies is done by a dependency parser (see Figure 1 for a Swiss German example in UD).

In this paper, we apply two different cross-lingual dependency parsing strategies, namely annotation projection as a lexicalised approach and model transfer as a delexicalised approach. We manually create a gold standard in order to evaluate and compare the different strategies. Furthermore, we build and evaluate a silver standard treebank which, compared to manually annotating from scratch, accelerates the creation of a larger training set for a monolingual Swiss German parser.

Figure 1: Universal dependency parse trees for the sentence: We want to be perfect, but we are not. Top: gold standard, bottom: system.

In: Mark Cieliebak, Don Tuggener and Fernando Benites (eds.): Proceedings of the 3rd Swiss Text Analytics Conference (SwissText 2018), Winterthur, Switzerland, June 2018
[1] Swiss Standard German, one of the four official languages of Switzerland, is not to be confused with the Swiss German dialects.
The next section presents related work on NLP for Swiss German and introduces the two main approaches to cross-lingual parsing. In Sections 3 and 4 we present our data and methods. Section 5 shows and discusses our results.

2 Related Work

Even though there have been several projects involving Swiss German (Hollenstein and Aepli, 2014; Zampieri et al., 2017; Hollenstein and Aepli, 2015; Samardzic et al., 2016; Samardžić et al., 2015; Scherrer, 2007; Baumgartner, 2016; Dürscheid and Stark, 2011; Stark et al., 2014; Scherrer and Owen, 2010; Scherrer, 2013, 2012), resources for NLP applications are still rare. As so often for dialects, even data for Swiss German is sparse. Therefore, the approach is to use tools and data of related resource-rich languages and apply transfer methods.

2.1 Universal Dependencies

Research in dependency parsing has increased significantly since collections of dependency treebanks have become available, in particular through the CoNLL shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007a; Zeman et al., 2017), which have provided many data sets. In order to facilitate cross-lingual research on syntactic structure and to standardise best practices, Universal POS (UPOS) tags (Petrov et al., 2012) as well as Universal Dependencies (Nivre et al., 2016) have been introduced. The annotation scheme is originally based on Stanford dependencies (de Marneffe et al., 2006; de Marneffe and Manning, 2008; de Marneffe et al., 2014). McDonald et al. (2013) present the first collection of six treebanks with homogeneous syntactic dependency annotation, which has continually been expanded since.

2.2 Cross-lingual Dependency Parsing

There are two main approaches to cross-lingual syntactic dependency parsing. The first is delexicalised model transfer, whose goal is to abstract away from language-specific parameters, i.e. to train delexicalised parsers. The idea is based on universal features and model parameters that can be transferred between related languages. Hence, this method assumes a common feature representation across languages. The advantage of the model transfer approach is that no parallel data is needed. Zeman and Resnik (2008) train a basic delexicalised parser relying on part-of-speech (POS) tags only. McDonald et al. (2013), Petrov et al. (2012) and Naseem et al. (2010) rely on universal features, while Täckström et al. (2013) adapt model parameters to the target language in order to transfer syntactic dependency parses cross-linguistically.

The main idea of the second approach, the lexicalised annotation projection method, is the mapping of labels across languages using parallel sentences and automatic alignment. It includes projection heuristics and usually post-projection rules. The main drawback of this approach is that it relies on sentence-aligned parallel corpora. In order to deal with this restriction, treebank translation has emerged, where the training data is automatically translated with a machine translation system. The central point of this method is the alignment along which the annotations are mapped from one language to the other. Automatic word alignment has already been used by Yarowsky et al. (2001), Aepli et al. (2014) and Snyder et al. (2008) for improving resources and tools for POS tagging in supervised and unsupervised learning respectively. Hwa et al. (2005), Tiedemann (2014) and Tiedemann (2015) use annotation projection approaches for parsing, and Tiedemann et al. (2014) as well as Rosa et al. (2017) additionally use machine translation instead of relying on parallel corpora.
For Swiss German, treebank translation is not viable because of sparse data and the lack of a machine translation system for Swiss German. Hence, in this paper we apply annotation projection as a lexicalised approach and model transfer as a delexicalised approach.

3 Materials

3.1 Standard German Data

We use the German Universal Dependency treebank [2] consisting of 13,814 sentences. It is annotated according to the UD guidelines [3] and contains Universal POS (UPOS) tags (Petrov et al., 2012). The treebank comes in CoNLL-U format, but as some tools cannot handle it, we convert it to CoNLL-X. This includes one major tokenization change concerning the Stuttgart-Tübingen-TagSet (STTS) (Schiller et al., 1999) POS tag APPRART. In CoNLL-U, prepositions with fused articles are split into two syntactic words. We undo this split, merge the information into one token and adapt the dependency relations correspondingly.

3.2 Swiss German Data

Annotation projection requires a parallel corpus. The AGORA citizen linguistics project [4] crowdsourced Standard German translations of 6,197 Swiss German sentences via the web site dindialaekt.ch. The sentences are taken from the NOAH corpus (Hollenstein and Aepli, 2014); additionally, sentences from novels in Bernese and St Gallen dialect were added to better represent syntactic word order differences. By the end of November 2017, the citizen linguists had produced 41,670 translations. We aggregated and cleaned the data into a parallel GSW/DE corpus of 26,015 sentences. In particular, we filtered translations that differed too much in length or Levenshtein edit distance [5] from the Swiss German source sentence.

[2] https://github.com/UniversalDependencies/UD_German
[3] http://universaldependencies.org/guidelines.html
[4] https://www.linguistik.uzh.ch/de/forschung/agora.html
[5] The Levenshtein distance (Levenshtein, 1966) measures the difference between two sequences of characters. Hence, the minimal edit distance between two words is the minimum number of characters to be changed (i.e. inserted, deleted or substituted) in order to make them equal.

4 Methods

We apply two classical parsing approaches presented in Section 2: model transfer with a delexicalised parser and annotation projection with crowdsourced parallel data. Within both approaches we test two parsing frameworks: the MaltParser (Nivre et al., 2007b) and the more recent UDPipe (Straka and Straková, 2017). Both parsers are provided with tokenised input.

Figure 2: Workflow of the model transfer.

4.1 Model Transfer Approach

The delexicalised model transfer approach is straightforward, working on the basis of POS tags only. For the training, the words in the Standard German corpus are replaced by their POS tags. Accordingly, at parsing time, the Swiss German words are replaced by their POS tags before parsing and re-inserted afterwards.
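The replace-and-restore step can be sketched in a few lines; the CoNLL-style column layout (FORM in the second column, fine-grained POS tag in the fifth) and the toy tokens are illustrative, not taken from the actual pipeline:

```python
from typing import List, Tuple

def delexicalise(rows: List[List[str]], pos_col: int = 4) -> Tuple[List[List[str]], List[str]]:
    """Replace each token's word form (column 2) by its POS tag; keep the forms."""
    forms = [r[1] for r in rows]
    delex = [r[:1] + [r[pos_col]] + r[2:] for r in rows]
    return delex, forms

def relexicalise(rows: List[List[str]], forms: List[str]) -> List[List[str]]:
    """Restore the original word forms after parsing."""
    return [r[:1] + [form] + r[2:] for r, form in zip(rows, forms)]

# toy CoNLL-X style rows (ID, FORM, LEMMA, CPOSTAG, POSTAG); invented example
sent = [["1", "mir", "_", "PRON", "PPER"],
        ["2", "wei", "_", "VERB", "VVFIN"]]
delex, forms = delexicalise(sent)
assert [r[1] for r in delex] == ["PPER", "VVFIN"]
assert relexicalise(delex, forms) == sent
```

Since the parser only ever sees the POS column in the FORM slot, the same model can be applied to any language for which compatible tags are available.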
4.1.1 POS tagging

Part-of-speech tagging is an important step prior to parsing because the syntactic structure builds upon the POS information. Obviously, when training delexicalised parsers, this step is crucial, as the tags are the only information available to the parser.

For POS tagging Swiss German sentences, we used the Wapiti (Lavergne et al., 2010) model trained on Release 2.2 of the NOAH corpus, where the average accuracy in 10-fold cross-validation is 92.25%.

The CoNLL format includes UPOS tags in addition to the fine-grained language-specific POS tags (STTS in the case of German and Swiss German). We used the mapping provided by the UD project in order to infer the UPOS tags from the given STTS tags.

4.2 Annotation Projection Approach

Annotation projection is not only more complex in processing compared to model transfer but also needs more resources. Most importantly, annotation projection requires a word-aligned parallel corpus. Starting from the crowdsourced sentences, which are sentence-aligned, it is the task of a word aligner to compute the most probable word alignments, i.e. the information about which word of the (Swiss German) source sentence corresponds to which word of the target sentence, i.e. the translation. There are many tools for this, as it is a basic step in machine translation systems as well. We tested three of them: GIZA++ (Och and Ney, 2003), FastAlign (Dyer et al., 2013) and the Monolingual Greedy Aligner (MGA) (Rosa et al., 2012).

The idea of the annotation projection process is to use the tool (here: parser) of a resource-rich language on that language (here: German) and then project the generated information (here: universal dependency structures) along the word alignment to the target language (here: Swiss German). In practice, this means we train the parser on the Standard German treebank (see Section 3.1) and parse the Standard German translations of the Swiss German original sentences. Then we project the resulting parse structure along the word alignments from the German words to the corresponding Swiss German words.

Figure 3: Workflow of the annotation projection.

4.2.1 Transfer of the Annotation

The transfer is the core component of annotation projection. The parse of the Standard German translation is projected along the word alignment to its Swiss German correspondent. The input consists of the Standard German parse and the alignment between the Standard German sentence and its Swiss German version (GSW:DE). Algorithm 1 describes the projection process.

    Data: DE parse & alignment GSW:DE
    Result: DE parse transferred to GSW
    for word alignment in sentence do
        if 1:1 alignment then
            transfer parse of DE
        else if 1:0 alignment (i.e. no DE word aligned) then
            attach GSW word to root with POS tag ADV and dependency label advmod
        else (1:n alignment, i.e. several DE words aligned)
            transfer parse of aligned DE word with smallest edit (Levenshtein) distance
    end
    Algorithm 1: Transfer of parses.

The case of a 1:1 alignment, where exactly one German word is aligned to the Swiss German word, is easy; the only thing to do is to project the dependency of the German word to the Swiss German word. If, however, several German words are aligned to one Swiss German word (1:n), the algorithm has to decide which parse to transfer. [6] In order to take this decision, the algorithm computes the Levenshtein distances (Levenshtein, 1966) between the Swiss German word and every aligned German word and takes the one with the smallest edit distance. The most challenging case is when no German word is aligned to the Swiss German token. A simple baseline approach attaches the corresponding Swiss German word as an adverbial modifier to the root of the sentence.

The decision to treat every unaligned Swiss German word as an adverb is taken on the basis of the frequency distribution of POS tags; ADV is the second most frequent POS tag (after NN) in the Swiss German data. However, taking the word itself into consideration, some more sophisticated rules can be elaborated.

[6] Note that the case of a 1:n alignment between a GSW word and a DE multiword expression is not covered in this approach.
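A runnable sketch of Algorithm 1 follows, under simplifying assumptions that are not taken from the original implementation: the alignment maps each Swiss German token index to a list of aligned German token indices, and a parse is a dict from token index to a (head, label) pair.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def project(gsw_tokens, de_tokens, de_parse, alignment):
    """Transfer (head, label) pairs from the DE parse to the GSW sentence."""
    gsw_parse = {}
    for i, gsw_word in enumerate(gsw_tokens):
        aligned = alignment.get(i, [])
        if len(aligned) == 1:          # 1:1 -> copy the dependency directly
            gsw_parse[i] = de_parse[aligned[0]]
        elif not aligned:              # 1:0 -> baseline: attach to root as advmod
            gsw_parse[i] = (0, "advmod")
        else:                          # 1:n -> take the most similar DE word
            best = min(aligned, key=lambda j: levenshtein(gsw_word, de_tokens[j]))
            gsw_parse[i] = de_parse[best]
    return gsw_parse
```

Note that after this step the head indices still refer to German token positions; renumbering and root repair (Algorithm 2) are applied afterwards.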
Considering the differences between Standard German and Swiss German as described by Hollenstein and Aepli (2014), we can expect some words like infinitive particles (PTKINF) (e.g. go) or the past participle gsi (been) to remain unaligned: the former because these words do not exist in Standard German, the latter because the Standard German simple past tense is expressed by the perfect tense in Swiss German, typically resulting in a "spare" past participle in the alignment. Furthermore, there are unaligned articles because Swiss German requires articles in front of proper names. Punctuation, including the apostrophe, is also a source of errors which can easily be corrected. The application of these more elaborate rules has an impact of around 2 points on the evaluation scores.

Algorithm 1 transfers the German parses as they are; as a consequence, the numbering of the token IDs is mixed up. Correcting the token IDs to be in ascending order (from 1 to the length of the sentence) requires the corresponding adjustment of the head references. Furthermore, one needs to make sure that there is exactly one root in a sentence.

    Data: transferred DE parse to GSW words
    Result: valid GSW parse
    for sentence in parse do
        if DE root was not projected to GSW parse then
            take 1st VERB as root, else 1st NOUN
        else if head of a projected word was not projected to GSW parse then
            attach it to the root
    end
    Algorithm 2: Correction of transferred parses.

Algorithm 2 goes through every sentence of the input file and first makes sure that there is one root for the sentence. If the root of the Standard German parse has not been transferred to the Swiss German sentence (missing word alignment), the first verb (UPOS VERB) is taken as the root, and if there is no VERB in the sentence, the first NOUN is considered the root.

4.3 Optimisation

We tested two approaches for optimisation: preprocessing of the training set and postprocessing rules to be applied after parsing.

4.3.1 Preprocessing

One frequent mistake, mostly observed in the delexicalised approach, is the assignment of passive dependency labels instead of their active counterparts. The passive construction in Standard German is built with the auxiliary werden, which can, however, also be used in non-passive constructions. The combination of VA* and a perfect participle (VVPP) is very frequent in Swiss German; however, it is usually not a passive construction but rather a perfect tense. Therefore, a simple but effective solution is the introduction of a new "set" of POS tags in the German UD training set: VWFIN, VWINF and VWPP for finite verbs, infinitives and participles of the verb werden respectively. This means that all occurrences of the lemma werden as an auxiliary (i.e. UPOS: AUX and STTS: VA{INF|PP}) are replaced by VW{INF|PP}. In this way, the system learns to discriminate between the usage of werden as an auxiliary versus its usage as a full verb and, most of all, it learns to differentiate between the auxiliary werden and the other auxiliaries haben (to have) and sein (to be). Hence, the number of wrongly assigned passive dependency labels decreased, which leads to an improvement of around 2.5 to 3.5 points as presented in Section 5.

4.3.2 Postprocessing

Some of the errors can easily be corrected with simple rules in a postprocessing step. One example is a frequent error caused by a remnant of the 1st UD version which is handled differently in UD version 2. The two labels oblique nominal (obl) and nominal modifier (nmod) are confused because the latter was used to modify both nominals and predicates in UD v1. In UD v2, however, obl is used for a nominal functioning as an oblique argument, while nmod is used for nominal dependents of another noun (phrase) only. This means that if the head is a verb, adjective or adverb, the dependency label has to be obl. If, instead, the head is a noun, pronoun, name or number, the dependency label is nmod.
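The renumbering and root repair of Algorithm 2 can be sketched as follows, with an illustrative token representation as dicts (the field names and tie-breaking details are assumptions, not the authors' actual code):

```python
def repair(sentence):
    """Renumber token IDs 1..n, remap head references, enforce a single root."""
    order = sorted(sentence, key=lambda t: t["id"])
    new_id = {t["id"]: i + 1 for i, t in enumerate(order)}
    for i, t in enumerate(order):
        t["id"] = i + 1
        t["head"] = new_id.get(t["head"], 0)   # unprojected heads become 0
    # pick a single root: an existing root if any, else 1st VERB, else 1st NOUN
    candidates = ([t for t in order if t["head"] == 0]
                  or [t for t in order if t["upos"] == "VERB"]
                  or [t for t in order if t["upos"] == "NOUN"]
                  or order)
    root = candidates[0]
    root["head"], root["deprel"] = 0, "root"
    # attach every other headless token (missing alignment) to that root
    for t in order:
        if t["head"] == 0 and t is not root:
            t["head"] = root["id"]
    return order
```

The key point is that the old-to-new ID mapping is computed before any ID is overwritten, so head references can be remapped consistently.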
5 Results & Discussion

This section presents the different settings and combinations of the aforementioned resources, approaches and tools. For the evaluation, we manually created a gold standard consisting of 100 Swiss German sentences taken from the resources presented in Section 3.2. We evaluated the approaches according to Labelled Attachment Score (LAS) and Unlabelled Attachment Score (UAS) [7], not excluding punctuation. The results we present here are macro accuracy scores, that is, the scores are computed separately for each sentence and then averaged [8]. Note that there is a mismatch in the actual annotation of punctuation between the Standard German UD treebank v2 and the official guidelines we were applying. This difference in the punctuation dependencies has an effect on the scores, i.e. it lowers the scores presented here. Furthermore, note that the test set containing 100 gold standard sentences is small and therefore these results have to be taken with a grain of salt.

[7] UAS is the percentage of tokens with the correct syntactic head; LAS is the percentage of tokens assigned the correct syntactic head as well as the correct dependency label.
[8] Macro accuracy scores as opposed to word-based micro scores, where the true positives are summed up over the whole treebank and divided by the total number of words.

5.1 German Parser Accuracy

In order to put the results into context, we checked the performance of the parsers on the German UD v2 treebank using its split of training and test set. In this setting, we left all the available information for the parser to use, including morphology and lemmas. The APPRART splitting is undone for the CoNLL-X MaltParser input, but not for UDPipe, which takes CoNLL-U as input format (and performs worse with the MaltParser CoNLL-X input). MaltParser reaches a LAS of 79.71%, UDPipe 70.31% respectively.

5.2 Direct Cross-lingual Parsing

As a comparison to the main approaches, we applied Standard German parsers directly to Swiss German. This means we used the training set of the German UD treebank to train the MaltParser (using MaltOptimizer to get the best hyperparameter settings) and UDPipe. Before training, we removed the morphology and lemma information because this information is not available in the Swiss German test set and therefore the parsers cannot rely on it. Furthermore, for the MaltParser we converted the training set from CoNLL-U to CoNLL-X format because MaltOptimizer cannot handle the former. Testing the MaltParser model on the gold standard with POS tags automatically assigned by Wapiti results in a LAS of 55.28%. UDPipe only reaches 21.19% LAS; one reason for this low accuracy could be that UDPipe relies on word embedding information (Straka and Straková, 2017), which results in a low recall when applying a model trained on German to Swiss German.

5.3 Delexicalised Model Transfer

Instead of giving the parser the Standard German words as input as in the direct cross-lingual approach, in the delexicalised approach we provide the parser with POS information only. This means the words are replaced by STTS POS tags while all the other columns stay the same. Given the small evaluation set and a negligible difference in the results, the two parsers' performance can be considered the same: ∼57% LAS for both when trained on the preprocessed training set, i.e. differentiating the auxiliary werden vs. the auxiliaries haben (to have) and sein (to be) (see Section 4.3.1).

5.4 Annotation Projection

The results for the annotation projection approach vary substantially depending on the combination of aligner and parser. Starting from 46.45% LAS (MaltParser + FastAlign), the combination of UDPipe and the Monolingual Greedy Aligner scores best in this approach with 53.39% LAS. This score is reached with the baseline transfer rules where unaligned words are simply attached to the root as adverbs. Applying the more elaborate transfer rules (Section 4.2.1) results in an improvement of 2.09 points to 55.65% LAS. The preprocessing step does not improve the results in this approach. These results show that the Monolingual Greedy Aligner performs best in the task of DE/GSW alignment. MGA takes character-based word similarity into account, which intuitively makes sense, as the information about similar letters is valuable when dealing with closely related languages such as Standard German and Swiss German.

5.5 Postprocessing

The postprocessing rules do not show a huge impact on the parsing results; the nmod/obl confusions, for example, are still present. The reason for this is that the parser assigned wrong heads to many of the words and therefore the rule to correct the nmod/obl confusions does not apply. The LAS scores improve by 1.62 points for the cross-lingual MaltParser and 2.07 points for the delexicalised model transfer and annotation projection UDPipe approaches respectively, reaching nearly 60% LAS accuracy.
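The macro-averaged attachment scores used in this evaluation can be computed as follows; representing each sentence as a list of (head, label) pairs is an illustrative simplification:

```python
def macro_scores(gold_sents, sys_sents):
    """Per-sentence LAS/UAS, averaged over sentences (macro accuracy)."""
    las, uas = [], []
    for gold, sys in zip(gold_sents, sys_sents):
        n = len(gold)
        uas.append(sum(g[0] == s[0] for g, s in zip(gold, sys)) / n)
        las.append(sum(g == s for g, s in zip(gold, sys)) / n)
    return sum(las) / len(las), sum(uas) / len(uas)

# two invented sentences as (head, label) pairs per token
gold = [[(2, "nsubj"), (0, "root")], [(0, "root"), (1, "obl")]]
sys_ = [[(2, "nsubj"), (0, "root")], [(0, "root"), (1, "nmod")]]
las, uas = macro_scores(gold, sys_)
assert (las, uas) == (0.75, 1.0)
```

The toy example also shows why an nmod/obl confusion hurts LAS but leaves UAS untouched: the head is correct, only the label differs.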
5.6 Discussion

Table 1 shows the best results including the corresponding setting for every approach. The best LAS results of all the applied approaches are very close; hence there is no clear answer to the question of which approach works best. Annotation projection is the most laborious among the three and as such not the first option to choose. Furthermore, the transfer of the annotation is strongly dependent on the performance of the aligner, which in turn benefits from big parallel corpora to be trained on. However, such big parallel corpora do not yet exist for Swiss German dialects.

Contrary to our expectations, training specific models for different dialects does not have a huge impact on the results. The word ordering of the St Gallen dialect is closer to the Standard German word ordering, while Bernese dialect speakers often change the order of the verbs. Due to these differences, we expected the model transfer approach to perform worse on the Bern dialect than annotation projection, where the word order changes should be handled by the aligner. Looking specifically at Bernese sentences with "switched" word order (e.g. ha aafo gränne ('I started to cry'), gfunge hei gha ('have found'), het übercho ('have gotten')), there is no significant difference between the two approaches in our test set.

5.6.1 Swiss German Variability

The results presented here are not perfect and certainly require further improvement in order for a system to be used in real-life applications. Compared with the Standard German parser accuracy, which reaches almost 80% LAS on the German UD v2 with standard settings of the parsers, there is room for improvement. However, these numbers have to be set in relation to the data we worked with. Even though we could make use of Swiss German novels and crowdsourced data, it is still a small data set. Furthermore, the enormous spelling variability in the Swiss German dialects poses a serious challenge for all tools. Statistical tools work best if the observed events are frequent. However, they do not work well with sparse data consisting of a large number of hapax legomena, i.e. word forms which appear only once. Figure 4 shows the frequencies (on the y-axis) of type frequencies (x) in a Swiss German text collection [9] consisting of 6,155 sentences with 105,692 tokens and 20,882 unique token types. 14,099 types appear only once (i.e. hapax legomena), 2,804 appear twice (i.e. hapax dislegomena), 19,874 less than 10 times and 20,767 less than 100 times.

Figure 4: Frequencies of type frequencies (x) in a Swiss German text.

5.7 Silver Treebank Parsing Model

Following the direct cross-lingual parsing approach, we automatically parse 6,155 Swiss German sentences [9] in order to create a silver treebank. A silver standard treebank, as opposed to a gold standard treebank, which is assumed to be correctly annotated, is automatically annotated and may therefore contain errors. Then, we use this silver treebank to train a monolingual Swiss German parser and hence create a first monolingual Swiss German dependency parsing model. The advantage of using a silver treebank is the fact that parsing becomes a monolingual task. However, this comes at the price of a faulty training set, which is not the best resource to build a parser.

Table 1: Comparison of the best score of every approach.

    Approach               Setting                                           LAS     UAS
    Annotation Projection  UDPipe + MGA + Postprocessing                     57.73   66.57
    Model Transfer         UDPipe + Pre- and postprocessing                  60.64   72.48
    Direct Cross-lingual   MaltParser (+ Wapiti) + Pre- and postprocessing   59.78   70.80

Interestingly, the MaltParser trained on the silver treebank reaches the same performance as the direct cross-lingual parsing approach itself, which was used to generate the silver treebank: LAS 57.10%. Given that 6,000 sentences do not constitute a large training set for a statistical parser, a parser could probably profit from additional related Standard German material. However, combining the two training sets, i.e. the German Universal Dependency treebank and the silver treebank, gives slightly worse results (LAS 55.46%).

[9] NOAH corpus plus 396 sentences from novels by Pedro Lenz and Renato Kaiser, excluding gold standard sentences.
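The type frequency profile discussed in Section 5.6.1 (hapax legomena, hapax dislegomena, etc.) amounts to two nested counts, one of tokens per type and one of types per frequency. The mini corpus below is invented for illustration:

```python
from collections import Counter

def type_frequency_profile(sentences):
    """Token frequency per type, then how many types share each frequency."""
    type_freq = Counter(tok for sent in sentences for tok in sent)
    freq_of_freq = Counter(type_freq.values())
    return type_freq, freq_of_freq

# invented mini corpus: "si" occurs twice, four types are hapax legomena
tf, fof = type_frequency_profile([["si", "isch", "gsi"], ["si", "het", "gseit"]])
assert fof[1] == 4 and fof[2] == 1
```

On real data, `fof[1]` directly yields the hapax count reported for Figure 4.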
5.8 Future Work

There are several opportunities for further improvement. Concerning the annotation projection approach, the crucial alignment information needs to be improved, for example by ensembling over the results of different word aligners. In cases where the alignment does not work, adding further transfer and postprocessing rules would be important. In addition, a spelling normalisation strategy can help to deal with the data sparseness imposed by the phonetic and orthographic variability of the Swiss German dialects. Moreover, the outputs of the three parsing approaches could be ensembled, e.g. via majority vote as suggested for alignment above, to get rid of the weaknesses of each approach. Furthermore, the silver treebank created could be manually corrected in order to generate a treebank which can be used as a training set for a monolingual dependency parser for Swiss German. Finally, once the data sparseness for Swiss German varieties is mitigated, modern neural methods are promising, as shown for example in the work by Ammar et al. (2016). Ammar et al. train one multilingual model that can be used to parse sentences in several languages. In order to do so, they use many resources, including a bilingual dictionary for adding cross-lingual lexical information and a monolingual corpus for training word embeddings. Such approaches need a big amount of data of the language to be parsed, which is still not available for Swiss German.

6 Conclusion

In this work, we experimented with a variety of cross-lingual approaches for parsing texts written in Swiss German. Languages with a non-standardised orthography are a demanding task for statistically driven systems. Swiss German dialects feature challenging Natural Language Processing (NLP) problems with their lack of orthographic spelling rules and a huge pronunciation variety. This situation leads to a high degree of data sparseness and, with it, a lack of resources and tools for NLP.

We tested a lexicalised annotation projection method as well as a delexicalised model transfer method. The annotation projection method requires parallel sentences in both the resource-rich and the low-resourced language, while the delexicalised model transfer approach only requires a monolingual treebank of a closely related resource-rich language.

The evaluation on a manually annotated gold standard consisting of 100 sentences shows a 60% Labelled Attachment Score (LAS) with negligible differences between the different parsing approaches. However, the annotation projection approach is more complex than model transfer due to the transfer rules and the crucial word alignment process.

This work provides a first substantial step towards closing a big gap in Natural Language Processing tools for Swiss German and provides data [10] to work on further improvements.

Acknowledgments

We thank the AGORA project "Citizen Linguistics" for making their translation data available and in particular all our volunteer translators. We also thank Renato Kaiser and Pedro Lenz for their permission to use their novels in our experiments.

[10] https://github.com/noe-eva/SwissGermanUD

References

Noëmi Aepli, Tanja Samardžić, and Ruprecht von Waldenfels. 2014. Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote. In First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). Dublin.

Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah A. Smith. 2016. Many languages, one parser. TACL 4:431–444.
Reto Baumgartner. 2016. Morphological analysis and lemmatization for Swiss German using weighted transducers. In Stefanie Dipper, Friedrich Neubarth, and Heike Zinsmeister, editors, Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016). Bochum, Germany.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL, pages 149–164.

Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford dependencies: A cross-linguistic typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014).

Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In 5th International Conference on Language Resources and Evaluation (LREC 2006).

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In COLING Workshop on Cross-framework and Cross-domain Parser Evaluation.

Christa Dürscheid and Elisabeth Stark. 2011. SMS4science: An international corpus-based texting project and the specific challenges for multilingual Switzerland. Digital Discourse: Language in the New Media, pages 299–320.

Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM model 2. In HLT-NAACL, pages 644–648.

Nora Hollenstein and Noëmi Aepli. 2014. Compilation of a Swiss German dialect corpus and its application to PoS tagging. In Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects. Dublin, Ireland, pages 85–94. http://www.aclweb.org/anthology/W14-5310.

Nora Hollenstein and Noëmi Aepli. 2015. A resource for natural language processing of Swiss German dialects. In Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology. Duisburg-Essen, Germany, pages 108–109.

Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering 11(03):311–325.

Thomas Lavergne, Olivier Cappé, and François Yvon. 2010. Practical very large scale CRFs. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL). Uppsala, Sweden, pages 504–513.

V. I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10:707.

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal dependency annotation for multilingual parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 92–97.

Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson. 2010. Using universal linguistic knowledge to guide grammar induction. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1234–1244.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris, France.

Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007a. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007. Prague, Czech Republic, pages 915–932. http://www.aclweb.org/anthology/D/D07/D07-1096.

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007b. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering 13(2):95–135. https://doi.org/10.1017/S1351324906004505.

Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics 29(1):19–51.

Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 2089–2096.

Rudolf Rosa, Ondřej Dušek, David Mareček, and Martin Popel. 2012. Using parallel features in parsing of machine-translated sentences for correction of grammatical errors. In Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation. Jeju, Republic of Korea, pages 39–48. http://www.aclweb.org/anthology/W12-4205.

Rudolf Rosa, Daniel Zeman, David Mareček, and Zdeněk Žabokrtský. 2017. Slavic forest, Norwegian wood. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Valencia, Spain, pages 210–219. http://www.aclweb.org/anthology/W17-1226.

Tanja Samardžić, Yves Scherrer, and Elvira Glaser. 2015.

forschung/ressourcen/lexika/TagSets/stts-1999.pdf.

Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay. 2008. Unsupervised multilingual learning for POS tagging. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing. Honolulu, Hawaii, pages 1041–1050. http://www.aclweb.org/anthology/D08-1109.

Elisabeth Stark, Simone Ueberwasser, and Anne Göhrig. 2014. Corpus "What's up, Switzerland?". www.whatsup-switzerland.ch.

Milan Straka and Jana Straková. 2017. Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Vancouver, Canada, pages 88–99. http://www.aclweb.org/anthology/K/K17/K17-3009.pdf.

Oscar Täckström, Ryan McDonald, and Joakim Nivre.
Normalising orthographic and dialectal variants for the 2013. Target language adaptation of discriminative automatic processing of Swiss German. In Language transfer parsers. In Proceedings of the 2013 Con- and Technology Conference: Human Language Tech- ference of the North American Chapter of the Asso- nologies as a Challenge for Computer Science and Lin- ciation for Computational Linguistics: Human Lan- guistics. Poznan, Poland, pages 294–298. guage Technologies. Atlanta, Georgia, pages 1061– Tanja Samardzic, Yves Scherrer, and Elvira Glaser. 2016. 1071. http://www.aclweb.org/anthology/N13-1126. ArchiMob – a corpus of spoken Swiss German. In Pro- Jörg Tiedemann. 2014. Rediscovering annotation projec- ceedings of the Tenth International Conference on Lan- tion for cross-lingual parser induction. In Proceedings guage Resources and Evaluation (LREC 2016). Paris, of COLING 2014, the 25th International Conference France. on Computational Linguistics: Technical Papers. pages 1854–1864. Yves Scherrer. 2007. Adaptive string distance mea- sures for bilingual dialect lexicon induction. In Jörg Tiedemann. 2015. Cross-lingual dependency parsing Proceedings of the ACL 2007 Student Research with universal dependencies and predicted PoS labels. Workshop. Prague, Czech Republic, pages 55–60. In Proceedings of the Third International Conference http://www.aclweb.org/anthology/P/P07/P07-3010. on Dependency Linguistics (Depling 2015). pages 340– 349. Yves Scherrer. 2012. Machine translation into multiple di- alects: The example of Swiss German. In 7th SIDG Jörg Tiedemann, Željko Agić, and Joakim Nivre. Congress - Dialect 2.0. 2014. Treebank translation for cross-lingual parser induction. In Proceedings of the Eighteenth Con- Yves Scherrer. 2013. Continuous variation in com- ference on Computational Natural Language putational morphology - the example of Swiss Ger- Learning. Ann Arbor, Michigan, pages 130–140. man. 
In TheoreticAl and Computational MOrphol- http://www.aclweb.org/anthology/W14-1614. ogy: New Trends and Synergies (TACMO). 19th In- ternational Congress of Linguists, Genève, Suisse. David Yarowsky, Grace Ngai, and Richard Wicentowski. http://hal.inria.fr/hal-00851251. 2001. Inducing multilingual text analysis tools via ro- bust projection across aligned corpora. In Proceedings Yves Scherrer and Rambow Owen. 2010. Natural Lan- of the first international conference on Human language guage Processing for the Swiss German Dialect Area. technology research. pages 1–8. In Proceedings of the Conference on Natural Language Processing (KONVENS). Saarbrücken, Germany, pages Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, 93–102. Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, and Noëmi Aepli. 2017. Findings of the Var- Anne Schiller, Simone Teufel, Christine Stöckert, Dial evaluation campaign 2017. In Proceedings of the and Christine Thielen. 1999. Guidelines für Fourth Workshop on NLP for Similar Languages, Va- das Tagging deutscher Textkorpora mit STTS. rieties and Dialects (VarDial). Valencia, Spain, pages http://www.ims.uni-stuttgart.de/ 1–15. http://www.aclweb.org/anthology/W17-1201. 10 15 D. Zeman and Philip Resnik. 2008. Cross-language parser adaptation between related languages. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages. pages 35–42. Daniel Zeman, Martin Popel, Milan Straka, Jan Ha- jic, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Fran- cis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinkova, Jan Hajic jr., Jaroslava Hlavacova, Václava Kettnerová, Zdenka Uresova, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. 
Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hi- roshi Kanayama, Valeria dePaiva, Kira Droganova, Héctor Martı́nez Alonso, Çağr Çöltekin, Umut Suluba- cak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Bur- chardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, An- tonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonca, Tatiana Lando, Rattima Nitis- aroj, and Josie Li. 2017. Conll 2017 shared task: Multilingual parsing from raw text to universal de- pendencies. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Vancouver, Canada, pages 1–19. http://www.aclweb.org/anthology/K/K17/K17- 3001.pdf. 11 16