<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Evaluating Neural Sequence Models for Splitting (Swiss) German Compounds</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Don</forename><surname>Tuggener</surname></persName>
							<email>don.tuggener@zhaw.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Zurich University of Applied Sciences</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Evaluating Neural Sequence Models for Splitting (Swiss) German Compounds</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DBB9BAE6366BEF00151402520CE353F3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:39+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper evaluates unsupervised and supervised neural sequence models for the task of splitting (Swiss) German compound words. The models are compared to a state-of-the-art approach based on character ngrams and a simple heuristic that accesses a dictionary. We find that the neural models do not outperform the baselines on the German data, but excel when applied to out-of-domain data, i.e. splitting Swiss German compounds. We release our code and data, namely the first annotated data set of Swiss German compounds.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Splitting compound words is an important task when setting up pipelines for Natural Language Processing of the (Swiss) German language. (Swiss) German features a long tail regarding word frequencies because compound words are not orthographically separated by whitespace as they are in e.g. English (e.g. Autobahnraststätte vs. highway service area). Thus, when mapping words to lexical resources such as word nets or embeddings, it is likely that some compounds are not represented in the resource.</p><p>Neural sequence models capture characteristics of word or character sequences (ngrams) in a latent representation and are thus hypothetically well-suited for compound splitting. In this paper, we evaluate several neural sequence models on the task of (Swiss) German compound splitting. The models are compared to an unsupervised character ngram-based approach and a baseline that uses a dictionary. Furthermore, we present the first gold standard for splitting Swiss German compounds and evaluate the models on this newly available resource. Swiss German dialects have no official spelling; they are closely related to Standard German but differ due to historical language change. We therefore use the Swiss German compounds as a sort of out-of-domain test set for the models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Related Work</head><p>Several methods for automatic splitting of word compounds exist. One approach is to use a dictionary to perform a full morphological analysis <ref type="bibr" target="#b14">(Schmid et al., 2004)</ref>. Others apply corpora statistics <ref type="bibr" target="#b10">(Koehn and Knight, 2003)</ref> and combine them with linguistic heuristics <ref type="bibr" target="#b18">(Weller-Di Marco, 2017)</ref>. There are both supervised <ref type="bibr" target="#b0">(Alfonseca et al., 2008;</ref><ref type="bibr" target="#b8">Henrich and Hinrichs, 2011)</ref> and unsupervised approaches <ref type="bibr" target="#b11">(Macherey et al., 2011;</ref><ref type="bibr" target="#b17">Tuggener, 2016;</ref><ref type="bibr">Ziering and van der Plas, 2016)</ref> to the task. <ref type="bibr" target="#b13">Riedl and Biemann (2016)</ref> and <ref type="bibr">Ziering et al. (2016)</ref> explore distributional semantics on the premise that the constituents of a compound are semantically similar to the compound. Other work researches the semantic relation between the constituents of the compounds (Schulte im <ref type="bibr" target="#b15">Walde et al., 2016)</ref>. While most approaches focus on compounds in a single language, there exists work that explores the task in a multilingual context <ref type="bibr" target="#b0">(Alfonseca et al., 2008;</ref><ref type="bibr" target="#b11">Macherey et al., 2011)</ref>.</p><p>While there exists work on morphological segmentation <ref type="bibr" target="#b9">(Kann et al., 2016)</ref>, to the best of our knowledge, there is no prior work on using neural sequence models to identify head constituents of compounds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Neural sequence models for compound splitting</head><p>We first introduce the neural sequence models that we apply in the experiments and describe how we adapt them to the task of (Swiss) German compound splitting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Unsupervised Recurrent Neural Network</head><p>The first model we explore is a character-based language model using a recurrent neural network (RNN). Language models aim to predict the next token in a sequence given the sequence history. Unlike ngram-based language models <ref type="bibr">(e.g. Heafield et al., 2013)</ref>, RNN-based models feature a hidden state that is updated after consuming a token (a character or a word). Based on the hidden state, a probability distribution over the token vocabulary is calculated using the softmax function, and the most likely token is selected when generating sequences. During training, the difference between the predicted probability distribution and the given next token constitutes the loss that is backpropagated to update the model parameters (i.e. weights). Such RNN-based language models have been shown to outperform ngram-based models in terms of perplexity on general language modelling tasks <ref type="bibr">(Mikolov et al., 2010, inter alia)</ref>.</p><p>To employ RNNs for unsupervised compound splitting, we exploit an implementation detail that is commonly used when working with such approaches: Before training an RNN on a given token sequence, a special END token is appended to the sequence. This token is inserted into the vocabulary of the model. When using the trained RNN to generate a sequence, the END token serves as a stopping criterion: if the END token is the most likely next token, generation terminates and the sequence is considered complete.</p><p>We adapt this idea and monitor the probability of emitting the END token while consuming a compound word character by character. That is, in a character sequence x, we consider the position x_i with the highest probability of emitting the END token as the split position:</p><formula xml:id="formula_0">argmax_i p(END | x_0 … x_i)<label>(1)</label></formula><p>Additionally, we are interested in positions in the sequence where the probability of generating the next given character is low, based on the hypothesis that such positions are indicators for a suitable split position:</p><formula xml:id="formula_1">argmin_i p(x_{i+1} | x_0 … x_i)<label>(2)</label></formula><p>To combine the two features and determine the best split position, we sum the END token probability and the inverse of the probability of the next character at each position in the sequence x of length n and take the position with the highest score:</p><formula xml:id="formula_2">argmax_i p(END | x_0 … x_i) + (1 − p(x_{i+1} | x_0 … x_i))<label>(3)</label></formula><p>During initial experiments, we found that this approach did indeed yield correct boundaries of free morphemes in German compounds, but struggled to find the correct boundary when compounds consist of more than two such free morphemes. For example, for the compound Autobahnraststätte (highway service area) with four free morphemes (Auto, Bahn, Rast, Stätte), the approach identified Autobahnrast as the body and Stätte as the head, instead of Autobahn and Raststätte. We hypothesized that one flaw of the approach is that when determining the best split position i in a sequence x, only the character sequence up to position i, i.e. x_0 … x_i, is considered, while the remaining string, x_{i+1} … x_n, is neglected. Therefore, we introduced a backward-looking RNN that consumes the sequence x in reverse order, which yields a bi-directional RNN (biRNN). We calculate the same two features for the backward-looking RNN as for the forward-looking RNN and sum the scores of the forward- and backward-looking RNN at each position i in the sequence to determine the best position for a split.</p><p>Furthermore, we noted that the approach fails at placing boundaries correctly if the compounds contain so-called Fugen elements (linking elements), as in e.g.
Installation-s-Anweisung (installation instruction), where a Fugen-S glues together the free morphemes Installation and Anweisung. The approach places the split after the first free morpheme (Installation) and thus attaches the Fugen-S to the head (sanweisung), yielding an incorrect split. Therefore, we applied a regular expression (capturing bound morphemes that often occur before compound boundaries, e.g. -ions, -täts, -keits) that aims to remove the Fugen-S heuristically before identifying a splitting position (e.g. Installationsanweisung → Installationanweisung).</p><p>For this approach, we only need a collection of German words to train the character-based RNNs, as it is unsupervised with respect to the compound splitting task.</p></div>
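The scoring of Eq. (3) together with the Fugen-S heuristic can be sketched as follows. This is a minimal illustration, not the paper's implementation: the probability functions stand in for a trained forward character language model (a backward LM's scores would simply be added per position), and the regex covers only the bound morphemes named above.

```python
import re

# Heuristic Fugen-S removal before scoring; the listed endings are the
# examples given in the text (-ions, -täts, -keits) plus similar ones.
FUGEN_RE = re.compile(r"(ion|tät|keit)s")

def strip_fugen_s(word):
    return FUGEN_RE.sub(r"\1", word)

def best_split(word, p_end, p_next):
    """Pick the split position maximising Eq. (3).

    p_end(prefix)      -> P(END | prefix) under the character LM
    p_next(prefix, ch) -> P(ch | prefix)
    Both are assumed to come from a trained forward character LM.
    """
    best_i, best_score = None, float("-inf")
    for i in range(1, len(word) - 1):  # split strictly inside the word
        prefix = word[:i]
        # END probability plus inverse next-character probability
        score = p_end(prefix) + (1.0 - p_next(prefix, word[i]))
        if score > best_score:
            best_i, best_score = i, score
    return best_i
```

With toy probabilities that peak after "auto", `best_split("autobahn", ...)` returns position 4, i.e. the boundary between Auto and Bahn.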
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Supervised Recurrent Neural Network</head><p>A natural extension to the neural character-based language model is to add supervision. To determine the split in the unsupervised model, we summed probabilities that we deemed relevant but did not train the RNN itself on the task of compound splitting. In the supervised approach, we use the hidden states of the trained character-based biRNN language model, obtained while consuming a German compound word character by character, as features to train a binary classifier for the splitting decision. That is, at each position in the sequence, we concatenate the hidden states of the forward and backward RNN to create a feature vector. This vector is then fed to a fully connected layer, as shown in Figure <ref type="figure" target="#fig_1">1</ref>. To determine the split, we take the position in the sequence that has the highest probability according to the binary classifier. During training, the split position is known and used to calculate and backpropagate the loss.</p></div>
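A minimal PyTorch sketch of this architecture, assuming illustrative layer sizes (the paper's best configuration uses a 2-layer biLSTM with hidden size 512 and a 3-layer MLP); the training loop with the known split positions is omitted:

```python
import torch
import torch.nn as nn

class SplitClassifier(nn.Module):
    """Character biLSTM whose per-position hidden states feed an MLP
    that scores each position as a potential split point. Sizes here
    are illustrative, not the paper's configuration."""
    def __init__(self, vocab_size, emb_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        # forward and backward states are concatenated -> 2 * hidden
        self.mlp = nn.Sequential(nn.Linear(2 * hidden, 64),
                                 nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, char_ids):                 # (batch, seq_len)
        states, _ = self.bilstm(self.emb(char_ids))
        return self.mlp(states).squeeze(-1)      # one logit per position

model = SplitClassifier(vocab_size=30)
logits = model(torch.randint(0, 30, (1, 10)))    # a dummy 10-character word
split_pos = logits.argmax(dim=-1)                # predicted split position
```

At prediction time, the position with the highest logit is taken as the split, mirroring the argmax over classifier probabilities described above.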
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Sequence-to-sequence with attention</head><p>An alternative supervised neural model is the sequence-to-sequence (seq2seq) model <ref type="bibr" target="#b16">(Sutskever et al., 2014)</ref>. The model is applied to transform sequences into other sequences, e.g. in Machine Translation <ref type="bibr" target="#b2">(Cho et al., 2014)</ref>. It consists of an encoder component that encodes the input sequence into a latent representation, and a decoder that generates the desired output sequence based on the encoded input.</p><p>The initial version of the seq2seq model encodes the whole input sequence into one vector (i.e. the last hidden state of the encoder) on which the generation of the output is based. To address this limitation, <ref type="bibr" target="#b1">Bahdanau et al. (2014)</ref> introduced an attention mechanism that lets the decoder peek at all hidden states of the input and assigns each a weight representing its importance for generating the correct output token at each step during generation.</p><p>We apply this model to the compound splitting task by using the compounds as input and the head noun as the desired output. <ref type="foot" target="#foot_0">3</ref> We hypothesize that the attention mechanism is helpful in our case, because not all characters in the input sequence are equally important when identifying the split boundary. The attention mechanism enables the decoder to focus on different character groups, which, ideally, represent (groups of) relevant free morphemes. We thus apply the seq2seq model with attention.</p></div>
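The attention step can be illustrated with a small additive (Bahdanau-style) scoring function. The weight matrices below are random stand-ins for learned parameters, and the exact attention variant used in the paper is not specified beyond the reference to Bahdanau et al. (2014):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(dec_state, enc_states, W_d, W_e, v):
    """Additive attention: score every encoder position against the
    current decoder state, normalise with softmax, and return the
    weighted sum of encoder states plus the weights themselves."""
    # enc_states: (seq_len, d_enc); dec_state: (d_dec,)
    scores = np.tanh(enc_states @ W_e.T + dec_state @ W_d.T) @ v
    weights = softmax(scores)          # one weight per input character
    return weights @ enc_states, weights

# Illustrative dimensions: 6 input characters, 4-dim encoder states,
# 3-dim decoder state, 5-dim attention space.
rng = np.random.default_rng(0)
enc = rng.normal(size=(6, 4))
dec = rng.normal(size=3)
W_d, W_e, v = (rng.normal(size=(5, 3)), rng.normal(size=(5, 4)),
               rng.normal(size=5))
context, weights = attention_context(dec, enc, W_d, W_e, v)
```

The weights form a distribution over input characters, which is how the decoder can concentrate on the character group that marks the head morpheme.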
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments</head><p>Having outlined our models, we now describe our data and baselines, followed by the evaluation results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Data</head><p>We use the dataset discussed in <ref type="bibr" target="#b8">Henrich and Hinrichs (2011)</ref>, i.e. a list of 75 000 German compounds and their head nouns extracted from GermaNet <ref type="bibr" target="#b7">(Henrich and Hinrichs, 2010)</ref>, a German wordnet. <ref type="foot" target="#foot_1">4</ref> Henrich and Hinrichs (2011) used this resource to evaluate several approaches to compound splitting. They also included non-compounds in their evaluation; however, these are not provided in the resource. Therefore, we only compare the ability of our approaches to determine the correct split positions in known compounds (corresponding to the task in section 7.2 in Henrich and Hinrichs (2011)).<ref type="foot" target="#foot_2">5</ref> Furthermore, we remove compounds from the list that contain a hyphen, since splitting them at the hyphen is straightforward. We randomly split the remaining 73 133 compounds into 80% train and 20% test data.</p><p>Additionally, we created the first gold standard of Swiss German compounds and their head nouns. We extracted the 150 longest Swiss German words from the SB-CH corpus <ref type="bibr" target="#b5">(Grubenmann et al., 2018)</ref>, which consists of Swiss German texts gathered from different sources (e.g. social media messages, business reports). We then manually annotated their (recursive) head nouns. For example, for the compound Wältuntergangskatastrophefilm (apocalypse catastrophe movie), we extract the heads Katastrophefilm and Film and consider both as correct heads in evaluation. We use this data as a kind of out-of-domain test set in the evaluation.</p></div>
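The data preparation described above (dropping hyphenated compounds, then an 80/20 random split) can be sketched as:

```python
import random

def prepare_splits(compounds, test_frac=0.2, seed=0):
    """Sketch of the data preparation: discard hyphenated compounds
    (splitting at the hyphen is trivial), then shuffle and split.
    `compounds` is a list of (compound, head) pairs; the seed is an
    illustrative choice, not the paper's."""
    data = [(c, h) for c, h in compounds if "-" not in c]
    rng = random.Random(seed)
    rng.shuffle(data)
    n_test = int(len(data) * test_frac)
    return data[n_test:], data[:n_test]   # train, test
```

Applied to the GermaNet list, this yields the 80/20 train/test partition used in the experiments.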
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Baselines</head><p>We compare the neural models to two baselines:</p><p>Dictionary-based: The first baseline uses a dictionary and matches its words to the end of the compounds in the test set. If a word from the dictionary is found to be a substring at the end of a compound in the test set, it is taken as the head noun of that compound. Clearly, the order in which the dictionary is traversed matters, because the method stops after finding the first match. We experimented with sorting the dictionary by longest to shortest words and vice versa. Also, we included a subroutine to check if the found body (the remaining word after removing the head noun from the compound; potentially removing the Fugen-S) is also in the dictionary and favored those splits which have both body and head in the dictionary. The algorithm is outlined in Algorithm 1.</p></div>
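A compact sketch of Algorithm 1 under these assumptions: longest-first traversal, and a deliberately simplified Fugen-S check that just strips a trailing s from the body (the paper's subroutine is not spelled out in full):

```python
def dict_split(compound, dictionary):
    """Dictionary baseline sketch: try the longest dictionary words
    first as the compound's head; prefer a split whose body (minus a
    possible Fugen-S) is also a known word, else fall back to the
    first suffix match."""
    words = sorted(dictionary, key=len, reverse=True)
    fallback = None
    for head in words:
        if compound.endswith(head) and len(head) < len(compound):
            body = compound[:-len(head)]
            # favour splits where the body is also in the dictionary,
            # possibly after removing a Fugen-S (simplified check)
            if body in dictionary or body.rstrip("s") in dictionary:
                return body, head
            if fallback is None:
                fallback = (body, head)
    return fallback
```

For example, with a dictionary containing Installation and Anweisung, the compound Installationsanweisung is split into the body Installations and the head Anweisung, because the body minus its Fugen-S is a known word.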
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CharSplit:</head><p>The dictionary-based method is prone to fail when the head noun of a compound was never seen in the training corpus. CharSplit, proposed in <ref type="bibr" target="#b17">Tuggener (2016)</ref>, alleviates this by basing the splits on character ngrams rather than words. CharSplit calculates the probabilities of ngrams (length 2 to 20) occurring at the beginning, middle, and end of words in an unlabeled corpus and derives a splitting score at each character position in a (compound) word. <ref type="bibr" target="#b17">Tuggener (2016)</ref> found that this method outperforms several other splitters in the task of identifying correct splitting boundaries on the GermaNet data, achieving 95% accuracy.</p></div>
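A toy version of CharSplit's idea, with illustrative ngram lengths (2-4 instead of the tool's 2-20) and a deliberately simplified score that only counts word-initial and word-final ngrams; the real tool combines several ngram statistics:

```python
from collections import Counter

def ngram_position_counts(corpus_words, n_min=2, n_max=4):
    """Count how often each ngram starts or ends a word in an
    unlabeled corpus (simplified CharSplit statistics)."""
    starts, ends = Counter(), Counter()
    for w in corpus_words:
        for n in range(n_min, min(n_max, len(w)) + 1):
            starts[w[:n]] += 1
            ends[w[-n:]] += 1
    return starts, ends

def split_score(word, i, starts, ends, n=3):
    """Score position i: how word-final the preceding ngram looks
    plus how word-initial the following ngram looks."""
    return ends[word[max(0, i - n):i]] + starts[word[i:i + n]]

def charsplit(word, starts, ends):
    # best-scoring interior position becomes the split
    return max(range(2, len(word) - 1),
               key=lambda i: split_score(word, i, starts, ends))
```

On a toy corpus containing auto, bahn, and bahnhof, the split for autobahn falls between auto and bahn because "uto" often ends words and "bah" often starts them.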
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Evaluation</head><p>Next, we evaluate the models regarding their ability to identify correct splitting boundaries on the German and Swiss German data. Since we only evaluate on known compounds, we measure accuracy, i.e. the percentage of compounds for which the methods find the correct split. Results are given in Table <ref type="table" target="#tab_0">1</ref>.</p><p>The first striking observation is that the dictionary-based method outperforms all other approaches on the GermaNet data. The reason for the high accuracy lies in the overlap of the words in the train and test set. While there are no direct duplicates in the sets, almost all head nouns (97%) that need to be identified in the test set are included in the train set as constituents of a compound. For example, the test set contains the instance Fruchtnektar (fruit nectar) → Frucht → Nektar and the train set contains Bananennektar (banana nectar) → Banane → Nektar. As we see from the evaluation on the SB-CH data, the approach clearly fails when this overlap diminishes.</p><p>The CharSplit baseline performs 2 accuracy points below the dictionary method on the GermaNet data, but achieves better results on the Swiss German compounds. Relying on ngram representations of the German training data is an advantage when moving to the related, but not identical, domain of Swiss German compounds.</p><p>For the unsupervised biLSTM, we found that training for more than 1 epoch did not improve results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>However, increasing the model size in terms of layers and hidden state size benefited accuracy to a certain extent, and removing the Fugen-S is vital. Also, we found that a vanilla biRNN fares poorly compared to a biLSTM. The results show that for the German compounds, the model falls behind the baselines by a considerable margin. However, on the Swiss German compounds, it leads to a large improvement over the baselines (+20 accuracy points).</p><p>For the combination of the unsupervised biLSTM coupled with the supervised MLP, we took the best performing biLSTM model (2 layers, 512 hidden size) to create the inputs for the MLP. We experimented with different numbers of layers, hidden state sizes, and epochs for the MLP and report results of the best performing configuration. On the German data, this approach is on par with the dictionary baseline. It also improves performance on the Swiss German compounds by +11 accuracy points compared to only using the unsupervised biLSTM, similar to the gains on the German data.</p><p>The seq2seq model was trained independently from the other models. It outperforms the unsupervised biLSTM on the German data, but not on the out-of-domain Swiss German test set. It falls behind the other supervised approach on both test sets, but features some other interesting properties discussed in the next section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Output analysis</head><p>In this section, we qualitatively compare the outputs and other properties of the different approaches and discuss the (dis)advantages of each.</p><p>In general, we hardly ever encountered system outputs that put splits at seemingly random positions within the compounds. The two main errors we observed relate to the Fugen-S and to choosing the wrong split position for compounds consisting of more than two free morphemes.</p><p>Dictionary-lookup: As this method relies on a predefined list of known words, it can only split compounds whose head noun is contained in the dictionary. Hence, a main error cause is unknown head nouns, which especially affects performance on the Swiss German data. The method is robust against the Fugen-S, since it looks for known words at the compound end and hence never attaches a Fugen-S to a found head noun. However, it is the only approach that provides no means to distinguish compounds from non-compounds (i.e. it offers no confidence measure for a found split). The word Fahrzeug (vehicle), e.g., includes the head noun Zeug (thing), but Zeug is not a hypernym of Fahrzeug, so the word cannot be split without straying from its meaning; the dictionary approach would nevertheless perform this split. Since we only evaluate the splitting methods on known compounds, this disadvantage does not affect the accuracy of the approach in the evaluation. In conclusion, this method works well when train and test data feature a largely overlapping vocabulary and the approach is coupled with a method to distinguish compounds from non-compounds.</p><p>CharSplit: This approach often fails when encountering the Fugen-S, i.e. it frequently attaches it to the head noun if the regular expression is not able to remove it before the split.
However, this model improves performance on out-of-domain data compared to the dictionary approach, as it relies on an ngram representation of the training data. This enables it to handle vocabulary differences between the training and testing domains better than the dictionary baseline. As an unsupervised approach, it only requires an unannotated corpus to calculate the ngram probability distributions.</p><p>Unsupervised biLSTM: Similar to CharSplit, this approach also often fails by attaching the Fugen-S to the body instead of the head when removing the Fugen-S with the regular expression fails. In that regard, the unsupervised biLSTM output is similar to CharSplit. However, the latent representation of the character sequences (compared to the ngram representation in CharSplit, which relies directly on the surface forms) allows it to generalize better to the out-of-domain data than CharSplit, yielding better splitting accuracy.</p><p>Unsupervised biLSTM + MLP: The combination of the unsupervised biLSTM and the supervised MLP yielded the best overall results in our experiments. As one of the supervised approaches, it does not seem to struggle with occurrences of the Fugen-S, and most errors stem from splitting compounds with multiple free morphemes at the wrong free morpheme boundary.</p><p>seq2seq: As the second supervised model, this model also does not struggle with removing Fugen elements. The main error cause is thus splitting compounds with multiple free morphemes at the incorrect free morpheme boundary; e.g. for the compound Parkleitsystem (parking guiding system), it generates the head System instead of Leitsystem. A unique error source for this model is that it often generates spelling errors in the (otherwise) correctly identified heads, e.g. for Neukonstruktion (new construction), it produces konstroktion as the head.
Clearly, generating the head noun instead of just finding its starting character seems to overcomplicate the task in our setting and unnecessarily impacts performance. When moving to the out-of-domain data, the model shows a bigger performance drop than the biLSTM coupled with the MLP. One possible reason could be that the model relies more heavily on full words during testing, i.e. the input representation is constructed over the full compound, and the attention mechanism does not focus strongly enough on the relevant character sequences. A seq2seq model that replaces the input representation with convolution operations is presented in <ref type="bibr" target="#b4">Gehring et al. (2017)</ref>. An interesting experiment would be to evaluate whether character convolutions represent important character sequences better than the biLSTM. Finally, we noted that when training the model to generate both the body and head constituents given the compound, it often produces the correct lemmatization of the body by removing Fugen elements from it (-es, -en, -s, -n; e.g. bundesforschungsminister → bund, forschungsminister or landesverwaltung → land, verwaltung). It thus seems to be the appropriate candidate for a neural model if identifying the lemmatized body of a compound is a goal of splitting compounds.</p><p>Output combination: To gain insight into how complementary the outputs of the different models are, we calculated the percentage of compounds that are split identically by all models, which is 76.63%. This suggests that there is a substantial difference in the outputs. We also calculated the upper-bound splitting performance for ensembling the models. To do so, we regarded a compound as split correctly if at least one of the outputs contained the correct split. This upper bound achieves an accuracy of 99.56% on the GermaNet data, which indicates that the model outputs are sufficiently different to be combined in an ensemble system.</p></div>
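The two ensemble statistics described above (all-models agreement and the oracle upper bound) can be computed as follows; for simplicity, this sketch assumes one gold head per compound rather than the recursive heads used for the Swiss German data:

```python
def agreement_and_oracle(outputs, gold):
    """`outputs` maps model name -> list of predicted heads (aligned
    with `gold`, the list of correct heads). Returns the fraction of
    compounds split identically by all models and the oracle accuracy
    (at least one model correct)."""
    n = len(gold)
    preds = list(outputs.values())
    identical = sum(len({p[i] for p in preds}) == 1 for i in range(n)) / n
    oracle = sum(any(p[i] == gold[i] for p in preds) for i in range(n)) / n
    return identical, oracle
```

For two models that agree on one of two compounds but jointly cover both gold heads, this yields 50% agreement and a 100% oracle accuracy.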
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>We evaluated three common neural sequence models for the task of splitting (Swiss) German compounds and compared them to ngram- and dictionary-based baselines. We found that for in-domain data (German compounds), the neural sequence models were not able to outperform the baselines, but that for out-of-domain data (Swiss German), they achieved vastly better accuracy in identifying splitting positions. We hypothesize that the latent vector representations of character sequences in the neural models allow them to process similar character sequences in a similar way. Thus, when models trained on German data are applied to Swiss German data, they can bridge the differences between the languages: slightly modified but corresponding character sequences lead to similar hidden representations, which in turn yield similar outputs (i.e. splitting positions), where ngram- or dictionary-based approaches, which rely directly on surface forms, fail. Future work in the direction of <ref type="bibr" target="#b0">Alfonseca et al. (2008)</ref> will have to determine whether approaches based on neural sequence models are applicable to other language groups with similar spelling.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>In: Mark Cieliebak, Don Tuggener and Fernando Benites (eds.): Proceedings of the 3rd Swiss Text Analytics Conference (SwissText 2018), Winterthur, Switzerland, June 2018. 1 https://github.engineering.zhaw.ch/tuge/neural_compound_splitter 2 Swiss German subsumes the Alemannic dialects spoken in Switzerland.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Supervised RNN-MLP architecture, shown for the compound Türschloss (door lock), where the split position is at the character s.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Algorithm 1</head><label>1</label><figDesc>Dictionary-based compound splitting 1: Create dictionary D from train set 2: Sort D based on word length 3: procedure SPLIT(compound) … 7: if body ∈ D then 8: break</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Models, parameters, and splitting accuracy for different approaches. Trained and evaluated on German compounds (GermaNet). Best results per category are in bold, best overall underlined.</figDesc><table><row><cell>Dictionary</cell><cell>check long words first</cell><cell>94.07</cell></row><row><cell>Dictionary</cell><cell>check short words first</cell><cell>58.35</cell></row><row><cell>Dictionary</cell><cell>+favor known words as body</cell><cell>95.18</cell></row><row><cell>CharSplit</cell><cell>include Fugen-S</cell><cell>90.65</cell></row><row><cell>CharSplit</cell><cell>remove Fugen-S</cell><cell>93.26</cell></row><row><cell>Unsup. biRNN</cell><cell>1 layer, 32 hidden size, 1 epoch</cell><cell>67.34</cell></row><row><cell>Unsup. biLSTM</cell><cell>1 layer, 32 hidden size, 1 epoch</cell><cell>78.13</cell></row><row><cell>Unsup. biLSTM</cell><cell>1 layer, 32 hidden size, 1 epoch, keep Fugen-S</cell><cell>71.51</cell></row><row><cell>Unsup. biLSTM</cell><cell>2 layers, 512 hidden size, 1 epoch</cell><cell>87.04</cell></row><row><cell>Unsup. biLSTM+MLP</cell><cell>MLP: 3 layers, 128-64-16 hidden size, 2 epochs</cell><cell>95.10</cell></row><row><cell>Seq2seq+attention</cell><cell>1 layer, 128 hidden size, 3 epochs</cell><cell>92.40</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Splitting accuracy for different approaches evaluated on Swiss German compounds (SB-CH), using the best performing models trained on GermaNet.</figDesc><table><row><cell>Dictionary</cell><cell>20.67</cell></row><row><cell>CharSplit</cell><cell>36.67</cell></row><row><cell>Unsup. biLSTM</cell><cell>57.33</cell></row><row><cell>Unsup. biLSTM+MLP</cell><cell>68.67</cell></row><row><cell>Seq2seq+attention</cell><cell>52.00</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">We also experimented with generating both the body and the head constituents, but obtained slightly better results with generating the head only.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">http://www.sfs.uni-tuebingen.de/lsd/compounds.shtml, we use version v12.0 (2017)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">Clearly, an end-to-end system for compound splitting needs to be able to identify whether a given word constitutes a compound. Unfortunately, we are not able to evaluate our approaches in this regard here. Another resource containing non-compounds is discussed in Escartín (2014), but it does not seem to be available.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Decompounding query keywords from compounding languages</title>
		<author>
			<persName><forename type="first">Enrique</forename><surname>Alfonseca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Slaven</forename><surname>Bilac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefan</forename><surname>Pharies</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers. Association for Computational Linguistics</title>
				<meeting>the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers. Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="253" to="256" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Neural machine translation by jointly learning to align and translate</title>
		<author>
			<persName><forename type="first">Dzmitry</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kyunghyun</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1409.0473</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Learning phrase representations using RNN encoder-decoder for statistical machine translation</title>
		<author>
			<persName><forename type="first">Kyunghyun</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bart</forename><surname>Van Merriënboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Çağlar</forename><surname>Gülçehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dzmitry</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fethi</forename><surname>Bougares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Holger</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics</title>
				<meeting>the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics<address><addrLine>Doha, Qatar</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1724" to="1734" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Chasing the perfect splitter: A comparison of different compound splitting tools</title>
		<author>
			<persName><forename type="first">Carla</forename><surname>Parra Escartín</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">LREC</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="3340" to="3347" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Convolutional sequence to sequence learning</title>
		<author>
			<persName><forename type="first">Jonas</forename><surname>Gehring</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Auli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Grangier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Denis</forename><surname>Yarats</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yann</forename><forename type="middle">N</forename><surname>Dauphin</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>ArXiv e-prints</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">SB-CH: A Swiss German corpus with sentiment annotations</title>
		<author>
			<persName><forename type="first">Ralf</forename><surname>Grubenmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Don</forename><surname>Tuggener</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pius</forename><surname>Von Däniken</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jan</forename><surname>Deriu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Cieliebak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th Language Resources and Evaluation Conference (LREC)</title>
				<meeting>the 11th Language Resources and Evaluation Conference (LREC)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Scalable modified Kneser-Ney language model estimation</title>
		<author>
			<persName><forename type="first">Kenneth</forename><surname>Heafield</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ivan</forename><surname>Pouzyrevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jonathan</forename><forename type="middle">H</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Philipp</forename><surname>Koehn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 51st Annual Meeting of the Association for Computational Linguistics<address><addrLine>Sofia, Bulgaria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="690" to="696" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">GernEdiT-the GermaNet editing tool</title>
		<author>
			<persName><forename type="first">Verena</forename><surname>Henrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Erhard</forename><surname>Hinrichs</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL 2010 System Demonstrations</title>
				<meeting>the ACL 2010 System Demonstrations</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="19" to="24" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Determining immediate constituents of compounds in GermaNet</title>
		<author>
			<persName><forename type="first">Verena</forename><surname>Henrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Erhard</forename><surname>Hinrichs</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference Recent Advances in Natural Language Processing</title>
				<meeting>the International Conference Recent Advances in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="420" to="426" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Neural morphological analysis: Encodingdecoding canonical segments</title>
		<author>
			<persName><forename type="first">Katharina</forename><surname>Kann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ryan</forename><surname>Cotterell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hinrich</forename><surname>Schütze</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2016 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="961" to="967" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Empirical methods for compound splitting</title>
		<author>
			<persName><forename type="first">Philipp</forename><surname>Koehn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Knight</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics-Volume 1</title>
				<meeting>the tenth conference on European chapter of the Association for Computational Linguistics-Volume 1</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="187" to="193" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Language-independent compound splitting with morphological operations</title>
		<author>
			<persName><forename type="first">Klaus</forename><surname>Macherey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">M</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Talbot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ashok</forename><forename type="middle">C</forename><surname>Popat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Franz</forename><surname>Och</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics</title>
				<meeting>the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1395" to="1404" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Recurrent neural network based language model</title>
		<author>
			<persName><forename type="first">Tomáš</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Karafiát</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lukáš</forename><surname>Burget</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jan</forename><surname>Černocký</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sanjeev</forename><surname>Khudanpur</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Eleventh Annual Conference of the International Speech Communication Association</title>
				<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Unsupervised compound splitting with distributional semantics rivals supervised methods</title>
		<author>
			<persName><forename type="first">Martin</forename><surname>Riedl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chris</forename><surname>Biemann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="617" to="622" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">SMOR: A German computational morphology covering derivation, composition and inflection</title>
		<author>
			<persName><forename type="first">Helmut</forename><surname>Schmid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arne</forename><surname>Fitschen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ulrich</forename><surname>Heid</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">LREC</title>
				<meeting><address><addrLine>Lisbon, Portugal</addrLine></address></meeting>
				<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="1263" to="1266" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The role of modifier and head properties in predicting the compositionality of English and German noun-noun compounds: A vector-space perspective</title>
		<author>
			<persName><forename type="first">Sabine</forename><surname>Schulte im Walde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anna</forename><surname>Hätty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefan</forename><surname>Bott</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics</title>
				<meeting>the Fifth Joint Conference on Lexical and Computational Semantics</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="148" to="158" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Sequence to sequence learning with neural networks</title>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oriol</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Quoc</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="3104" to="3112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Incremental Coreference Resolution for German</title>
		<author>
			<persName><forename type="first">Don</forename><surname>Tuggener</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
		<respStmt>
			<orgName>University of Zurich</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Simple compound splitting for German</title>
		<author>
			<persName><forename type="first">Marion</forename><surname>Weller-Di Marco</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th Workshop on Multiword Expressions</title>
				<meeting>the 13th Workshop on Multiword Expressions<address><addrLine>MWE</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="161" to="166" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Top a splitter: Using distributional semantics for improving compound splitting</title>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Ziering</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefan</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lonneke</forename><surname>Van Der Plas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th Workshop on Multiword Expressions</title>
				<meeting>the 12th Workshop on Multiword Expressions</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="50" to="55" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Towards unsupervised and language-independent compound splitting using inflectional morphological transformations</title>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Ziering</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lonneke</forename><surname>Van Der Plas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="644" to="653" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
