<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Simple Data Augmentation for Multilingual NLU in Task Oriented Dialogue Systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Samuel</forename><surname>Louvan</surname></persName>
							<email>slouvan@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Trento</orgName>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bernardo</forename><surname>Magnini</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Simple Data Augmentation for Multilingual NLU in Task Oriented Dialogue Systems</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">12248AC51F5ADECED063A69E62A5B111</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Data augmentation has shown potential in alleviating data scarcity for Natural Language Understanding (e.g. slot filling and intent classification) in task-oriented dialogue systems. As prior work has mostly been conducted on English datasets, we focus on five different languages and consider a setting where limited data are available. We investigate the effectiveness of non-gradient based augmentation methods, involving simple text span substitutions and syntactic manipulations. Our experiments show that (i) augmentation is effective in all cases, particularly for slot filling; and (ii) it is beneficial for a joint intent-slot model based on multilingual BERT, both in limited data settings and when full training data is used.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Natural Language Understanding (NLU) in task-oriented dialogue systems is responsible for parsing user utterances to extract the intent of the user and the arguments of the intent (i.e. slots) into a semantic representation, typically a semantic frame <ref type="bibr" target="#b19">(Tur and De Mori, 2011)</ref>. For example, the utterance "Play Jeff Pilson on Youtube" has the intent PLAYMUSIC and "Youtube" as the value for the slot SERVICE. As more skills are added to the dialogue system, the NLU model frequently needs to be updated to scale to new domains and languages, a situation which typically becomes problematic when labeled data are limited (data scarcity).</p><p>One way to combat data scarcity is through data augmentation (DA) techniques performing label-preserving operations to produce auxiliary training data. Recently, DA has shown potential in tasks such as machine translation <ref type="bibr" target="#b4">(Fadaee et al., 2017)</ref>, constituency and dependency parsing <ref type="bibr" target="#b16">(Şahin and Steedman, 2018;</ref><ref type="bibr" target="#b21">Vania et al., 2019)</ref>, and text classification <ref type="bibr" target="#b23">(Wei and Zou, 2019;</ref><ref type="bibr" target="#b10">Kumar et al., 2020)</ref>. As for slot filling (SF) and intent classification (IC), a number of DA methods have been proposed to generate synthetic utterances using sequence-to-sequence models <ref type="bibr" target="#b8">(Hou et al., 2018;</ref><ref type="bibr" target="#b25">Zhao et al., 2019</ref><ref type="bibr">), Conditional Variational Auto Encoder (Yoo et al., 2019)</ref>, or pre-trained NLG models <ref type="bibr" target="#b14">(Peng et al., 2020)</ref>. 
To date, most of the DA methods have been evaluated on English, and it is not clear whether the same findings apply to other languages.</p><p>In this paper, we study the effectiveness of DA on several non-English datasets for NLU in task-oriented dialogue systems. We experiment with existing lightweight, non-gradient based DA methods from <ref type="bibr" target="#b11">Louvan and Magnini (2020)</ref> that produce varied slot values through substitution and manipulate sentence structure by leveraging syntactic information from a dependency parser. We evaluate the DA methods on NLU datasets from five languages: Italian, Hindi, Turkish, Spanish, and Thai. The contributions of our paper are as follows: 1. We assess the applicability of DA methods for NLU in task-oriented dialogue systems in five languages. 2. We demonstrate that simple DA can improve performance on all languages despite their different characteristics. 3. We show that a large pre-trained multilingual BERT (M-BERT) <ref type="bibr" target="#b3">(Devlin et al., 2019)</ref> can still benefit from DA, in particular for slot filling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Slot Filling and Intent Classification</head><p>The NLU component of a task-oriented dialogue system is responsible for parsing a user utterance into a semantic representation, such as a semantic frame. Given an input utterance of n tokens, x = (x_1, x_2, ..., x_n), the system needs to assign a particular intent y_intent to the whole utterance x and the corresponding slots mentioned in the utterance, y_slot = (y_slot_1, y_slot_2, ..., y_slot_n). In practice, IC is typically modeled as text classification and SF as a sequence tagging problem. As an example, for the utterance "Play Jeff Pilson on Youtube", y_intent is PLAYMUSIC, as the intent of the user is to ask the system to play a song by a musician, and y_slot = (O, B-ARTIST, I-ARTIST, O, B-SERVICE), in which the artist is "Jeff Pilson" and the service is "Youtube". Slot labels are in BIO format: B indicates the start of a slot span, I the inside of a span, while O denotes that the word does not belong to any slot. Recent approaches for SF and IC are based on neural network methods that model SF and IC jointly <ref type="bibr" target="#b6">(Goo et al., 2018;</ref><ref type="bibr" target="#b1">Chen et al., 2019)</ref> by sharing model parameters between the two tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Data Augmentation (DA) Methods</head><p>DA aims to perform semantically preserving transformations on the training data D to produce auxiliary data D′. The union of D and D′ is then used to train a particular NLU model. For each utterance in D, we produce N augmented utterances by applying a specific augmentation operation. We adopt a subset of existing augmentation methods from <ref type="bibr" target="#b11">Louvan and Magnini (2020)</ref>, which have shown promising results on English datasets. We describe the augmentation operations in the following sections.</p></div>
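As a concrete illustration of the BIO scheme, the sketch below decodes a tag sequence into (slot, value) pairs for the running example; the function name and implementation are ours, not from the paper's code.

```python
# Minimal sketch (not the paper's code): decode BIO slot tags into
# (label, value) pairs for the example "Play Jeff Pilson on Youtube".
def bio_to_slots(tokens, tags):
    """Collect contiguous B-/I- spans into (slot label, slot value) pairs."""
    slots, span, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if span:                      # close the previous span
                slots.append((label, " ".join(span)))
            span, label = [tok], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            span.append(tok)              # continue the current span
        else:
            if span:                      # an O tag closes the open span
                slots.append((label, " ".join(span)))
            span, label = [], None
    if span:                              # close a span ending the utterance
        slots.append((label, " ".join(span)))
    return slots

tokens = ["Play", "Jeff", "Pilson", "on", "Youtube"]
tags = ["O", "B-ARTIST", "I-ARTIST", "O", "B-SERVICE"]
print(bio_to_slots(tokens, tags))
# [('ARTIST', 'Jeff Pilson'), ('SERVICE', 'Youtube')]
```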
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Slot Substitution (SLOT-SUB)</head><p>SLOT-SUB (Figure <ref type="figure" target="#fig_0">1</ref> left) performs augmentation by substituting a particular text span (slot-value pair) in an utterance with a different text span that is semantically consistent, i.e., the slot label is the same. For example, in the utterance "Quali film animati stanno proiettando al cinema più vicino", one of the spans that can be substituted is the slot-value pair (più vicino, SPATIAL RELATION). Then, we collect other spans in D in which the slot values are different, but the slot label is the same. For instance, we find the substitute candidates SP = {("distanza a piedi", SPATIAL RELATION), ("lontano", SPATIAL RELATION), ("nel quartiere", SPATIAL RELATION), . . . }, and then we sample one span to replace the original span in the utterance.</p></div>
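A minimal sketch of the SLOT-SUB idea, under simplifying assumptions of our own: each example is a (tokens, BIO tags) pair and only the first slot span is substituted. The helper names are illustrative, not the authors' implementation.

```python
import random

# Sketch of SLOT-SUB (our simplification, not the authors' code): swap a slot
# span for another span with the same label collected from the training data D.
def collect_spans(dataset):
    """Map slot label -> list of (span tokens, span tags) seen in the data."""
    by_label = {}
    for tokens, tags in dataset:
        i = 0
        while i < len(tags):
            if tags[i].startswith("B-"):
                label, j = tags[i][2:], i + 1
                while j < len(tags) and tags[j] == "I-" + label:
                    j += 1
                by_label.setdefault(label, []).append((tokens[i:j], tags[i:j]))
                i = j
            else:
                i += 1
    return by_label

def slot_sub(tokens, tags, by_label, rng=random):
    """Replace the first slot span with a sampled span of the same label."""
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            label, j = tag[2:], i + 1
            while j < len(tags) and tags[j] == "I-" + label:
                j += 1
            # candidates: same label, different value
            candidates = [s for s in by_label.get(label, []) if s[0] != tokens[i:j]]
            if not candidates:
                return tokens, tags       # nothing to substitute with
            new_toks, new_tags = rng.choice(candidates)
            return (tokens[:i] + list(new_toks) + tokens[j:],
                    tags[:i] + list(new_tags) + tags[j:])
    return tokens, tags
```

For the Italian example above, the span "più vicino" (SPATIAL RELATION) would be swapped for another SPATIAL RELATION span such as "nel quartiere" drawn from D.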
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">CROP and ROTATE</head><p>In order to produce sentence variations, we apply the crop and rotate operations proposed in Şahin and Steedman (2018), which manipulate the sentence structure through its dependency parse tree. The goal of CROP (Figure <ref type="figure" target="#fig_0">1</ref> middle) is to simplify the sentence so that it focuses on a particular fragment (e.g. subject/object) by removing other fragments in the sentence. CROP uses the dependency tree to identify the fragment and then removes it and its children from the dependency tree. The ROTATE (Figure <ref type="figure" target="#fig_0">1</ref> right) operation is performed by moving a particular fragment (including subject/object) around the root of the tree, typically the verb in the sentence. For each operation, all possible combinations are generated, and one of them is picked randomly as the augmented sentence. Both CROP and ROTATE rely on the universal dependency labels <ref type="bibr" target="#b12">(Nivre et al., 2017)</ref> to identify relevant fragments, such as NSUBJ (nominal subject), DOBJ (direct object), OBJ (object), IOBJ (indirect object).</p></div>
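The CROP operation can be sketched on a hand-built dependency tree (we do not call a parser here; the paper uses Stanza). The toy tree, its dependency analysis, and the helper names are our own assumptions.

```python
# Toy sketch of CROP (our illustration): keep the root plus one labelled
# fragment (e.g. the object) and its whole subtree, dropping everything else.
# tree maps 1-indexed token position -> (head position, deprel); head 0 = root.
def descendants(tree, idx):
    """All positions in the subtree rooted at idx (including idx)."""
    out, frontier = {idx}, [idx]
    while frontier:
        h = frontier.pop()
        for i, (head, _) in tree.items():
            if head == h and i not in out:
                out.add(i)
                frontier.append(i)
    return out

def crop_to_fragment(tokens, tree, keep_rel):
    """Keep the root and the keep_rel fragment(s), in original word order."""
    root = next(i for i, (h, _) in tree.items() if h == 0)
    keep = {root}
    for i, (h, rel) in tree.items():
        if h == root and rel == keep_rel:
            keep |= descendants(tree, i)
    return [tokens[i - 1] for i in sorted(keep)]

tokens = ["Play", "Jeff", "Pilson", "on", "Youtube"]
# assumed analysis: "Play" is the root, "Pilson" its obj, "Youtube" its obl
tree = {1: (0, "root"), 2: (3, "compound"), 3: (1, "obj"),
        4: (5, "case"), 5: (1, "obl")}
print(crop_to_fragment(tokens, tree, "obj"))
# ['Play', 'Jeff', 'Pilson']
```

ROTATE would instead reorder such root-attached fragments around the root rather than delete them.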
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>Our primary goal is to verify the effectiveness of data augmentation on Italian, Hindi, Turkish, Spanish and Thai NLU datasets with limited labeled data. To this end, we compare the performance of a baseline NLU model trained on the original training data (D) with an NLU model that incorporates the augmented data as additional training instances (D + D′). To simulate the limited labeled data situation, we randomly sample 10% of the training data for each dataset.</p><p>Baseline and Data Augmentation (DA) Methods. We use the state-of-the-art BERT-based joint intent and slot filling model <ref type="bibr" target="#b1">(Chen et al., 2019)</ref> as the baseline model. We leverage the pre-trained multilingual BERT (M-BERT), which is trained on 104 languages. During training, M-BERT is fine-tuned on the slot filling and intent classification tasks. Given a sentence representation x = ([CLS] t_1 t_2 ... t_L), we use the hidden state h_[CLS] to predict the intent, and h_t_i to predict the slot label of token t_i. As for DA methods, in addition to the methods described in Section 3, we add one configuration, COMBINE, which combines the results of SLOT-SUB and ROTATE, as ROTATE obtains better results than CROP on the development set.</p><p>Settings. The model is trained with the BertAdam optimizer for 30 epochs with early stopping. The learning rate is set to 10^-5 and the batch size is 16. All the hyperparameters are listed in Appendix A. For SLOT-SUB, the number of augmentations per sentence N is tuned on the development set. To produce the dependency tree, we parse the sentence using Stanza <ref type="bibr" target="#b14">(Qi et al., 2020)</ref>. For both CROP and ROTATE we follow the default hyperparameters from Şahin and Steedman (2018). We did not experiment with Thai for CROP and ROTATE, as Thai is not supported by Stanza. 
The number of augmented sentences (D′) for each method is listed in Table <ref type="table" target="#tab_0">1</ref>. As evaluation metrics, we use the standard CoNLL script to compute the F1 score for slot filling and accuracy for intent classification.</p><p>Datasets. For Italian, we use the data from <ref type="bibr" target="#b0">Bellomaria et al. (2019)</ref>, translated from the English SNIPS dataset <ref type="bibr" target="#b2">(Coucke et al., 2018)</ref>. SNIPS has been widely used for evaluating NLU models and consists of utterances in multiple domains. For Hindi and Turkish, we use the ATIS dataset from <ref type="bibr" target="#b20">Upadhyay et al. (2018)</ref>, derived from <ref type="bibr" target="#b7">Hemphill et al. (1990)</ref>. ATIS is a well-known NLU dataset in the flight domain. For Spanish and Thai, we use the FB dataset from <ref type="bibr" target="#b17">Schuster et al. (2019)</ref>, which contains utterances in the alarm, weather, and reminder domains. The overall statistics of the datasets are shown in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
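The span-level slot F1 can be sketched as follows; this is a simplified stand-in for the standard CoNLL evaluation script, with helper names of our own, and it ignores some edge cases the official script handles.

```python
# Simplified sketch of CoNLL-style slot F1 (not the official script): a
# predicted span counts only if its boundaries and label exactly match gold.
def bio_to_spans(tags):
    """Extract (start, end, label) spans from a BIO tag sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel closes last span
        if start is not None and not tag.startswith("I-"):
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

def slot_f1(gold_seqs, pred_seqs):
    """Micro-averaged span F1 over a corpus of tag sequences."""
    tp = fp = fn = 0
    for g, p in zip(gold_seqs, pred_seqs):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)
        fp += len(ps - gs)
        fn += len(gs - ps)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```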
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>The overall results reported in Table <ref type="table" target="#tab_1">2</ref> show that applying DA improves performance on slot filling and intent classification across all languages. In particular, for SF, the SLOT-SUB method yields the best results, while for IC, ROTATE obtains better performance than CROP in most cases. These results are consistent with the findings of Louvan and Magnini (2020) on the English datasets, where SLOT-SUB improves SF and CROP or ROTATE improve IC. We think ROTATE beats CROP on IC because CROP may change the intent of the original sentence: intents typically depend on the occurrence of specific slots, so when the cropped part is a slot value, the sentence's overall semantics may change.</p><p>We can see that languages with different typological features (e.g. subject/verb/object ordering)<ref type="foot" target="#foot_1">1</ref> benefit from the ROTATE operation for IC. This result suggests that augmentation can produce useful noise (regularization) that helps the model alleviate overfitting when labeled data is limited. COMBINE still helps the performance of both SF and IC, although the improvements are not as high as when only one augmentation method is applied. The language that benefits the most from COMBINE is Turkish. We hypothesize that, as Turkish has a more flexible word order than the other languages, it benefits the most when ROTATE is performed.</p><p>Performance on varying data sizes. To better understand the effectiveness of SLOT-SUB, we perform further analysis on different training data sizes (see Figure <ref type="figure" target="#fig_1">2</ref>). Overall, we observe that as we increase the training size, the benefit of SLOT-SUB decreases for all datasets. 
For some datasets, namely ATIS-HI and FB-ES, SLOT-SUB can cause a performance drop for larger data sizes, although it is reasonably small (less than 1 F1 point). FB-TH consistently benefits from SLOT-SUB even when the full training data is used. The training data size up to which the improvement is significant varies across datasets<ref type="foot" target="#foot_2">2</ref>. For SNIPS-IT, the improvement is clear for all training data sizes, and it is statistically significant up to a training data size of 80%. For ATIS-HI, improvements are significant up to a data size of 40%. For the FB datasets, improvements are significant only up to a training data size of 10%. Overall, we can see that SLOT-SUB is effective when data is scarce (5%, 10%), while it is still relatively robust for larger data sizes on all datasets.</p><p>Performance on different numbers of augmentations per utterance (N). We examine the effect of a larger number of augmentations per utterance (N) on model performance, specifically for SF (see Figure <ref type="figure" target="#fig_2">3</ref>). For FB-ES, similarly to the results in Table <ref type="table" target="#tab_1">2</ref>, increasing N does not affect performance. For the other datasets, increasing N brings performance improvements. For ATIS-HI, SNIPS-IT, and FB-TH the trend is that, as we increase N, performance goes up and then plateaus. For ATIS-TR, changing N does not really affect the gain, as the performance trend is quite steady across numbers of augmentations. For most values of N in each dataset (except FB-ES), the difference between the model using SLOT-SUB and the model without it is significant<ref type="foot" target="#foot_3">3</ref>.</p></div>
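Significance is assessed with a Wilcoxon signed-rank test over runs (Table 2). As a self-contained illustration of the underlying idea, the sketch below uses a paired sign-flip permutation test instead (`scipy.stats.wilcoxon` would be the direct route), on invented F1 scores for ten runs, not the paper's numbers.

```python
import random

# Paired sign-flip permutation test (a stand-in for the Wilcoxon signed-rank
# test used in the paper). Under the null, each per-run difference is equally
# likely to be positive or negative, so we compare the observed total
# difference against randomly sign-flipped totals.
def paired_permutation_p(a, b, n_resamples=10_000, seed=0):
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_resamples):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            hits += 1
    return hits / n_resamples

# Made-up slot F1 scores for 10 runs (illustrative only).
baseline = [78.1, 78.4, 77.9, 78.6, 78.2, 78.0, 78.5, 78.3, 77.8, 78.7]
slot_sub = [81.8, 82.1, 81.6, 82.3, 81.9, 81.7, 82.2, 82.0, 81.5, 82.4]
p = paired_permutation_p(slot_sub, baseline)
print(f"p = {p:.4f}")   # small p: the gain is unlikely under chance sign flips
```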
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Related Work</head><p>Data augmentation methods proposed in NLP aim to automatically produce additional training data, ranging from simple word substitution <ref type="bibr" target="#b23">(Wei and Zou, 2019)</ref> to more complex methods that aim at semantically preserving sentence generation <ref type="bibr" target="#b8">(Hou et al., 2018;</ref><ref type="bibr" target="#b5">Gao et al., 2020)</ref>. In the context of slot filling and intent classification, recent augmentation methods typically apply deep learning models to produce augmented utterances. <ref type="bibr" target="#b8">Hou et al. (2018)</ref> propose a two-stage method consisting of delexicalized utterance generation and slot value realization. Their method is based on a sequence-to-sequence model <ref type="bibr" target="#b18">(Sutskever et al., 2014)</ref> that produces a paraphrase of an utterance with its slot value placeholders (delexicalized) for a given intent. For slot value lexicalization, they use the slot values in the training data that occur in similar contexts. <ref type="bibr" target="#b25">Zhao et al. (2019)</ref> train a sequence-to-sequence model on training instances that consist of pairs of atomic templates of dialogue acts and their sentence realizations. Yoo et al. (2019) propose extending the Variational Auto Encoder (VAE) (Kingma and Welling, 2014) into a Conditional VAE (CVAE) to generate synthetic utterances; the CVAE controls the utterance generation by conditioning on the intent and slot labels during model training. Recent work from <ref type="bibr" target="#b14">Peng et al. (2020)</ref> makes use of a Transformer <ref type="bibr" target="#b22">(Vaswani et al., 2017)</ref> based pre-trained NLG model, namely GPT-2 <ref type="bibr" target="#b15">(Radford et al., 2019)</ref>, fine-tuned on slot filling datasets to produce synthetic utterances. We consider these deep learning based approaches heavyweight, as they often require several stages in the augmentation process, namely generating augmentation candidates, then ranking and filtering the candidates before producing the final augmented data. Consequently, the computation time of these approaches is generally higher, as separate training is required for the augmentation model in addition to the joint SF-IC model. 
Recent work from Louvan and Magnini (2020) applies a set of lightweight methods, most of which do not require model training. These augmentation methods focus on varying the slot values through substitution mechanisms and varying the sentence structure through dependency tree manipulation. While the methods are relatively simple, they obtain results competitive with deep learning based approaches on the standard English slot filling benchmarks, namely the ATIS <ref type="bibr" target="#b7">(Hemphill et al., 1990)</ref>, SNIPS <ref type="bibr" target="#b2">(Coucke et al., 2018)</ref>, and FB <ref type="bibr" target="#b17">(Schuster et al., 2019)</ref> datasets.</p><p>Existing methods mostly evaluate their approaches on English datasets, and little work has been done on other languages. Our work focuses on investigating the effect of data augmentation on five non-English languages. We apply a subset of the lightweight augmentation methods from Louvan and Magnini (2020) that do not require separate model training to produce augmented data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusion</head><p>We evaluate the effectiveness of data augmentation for slot filling and intent classification in five typologically diverse languages. Our results show that by applying simple augmentation, namely slot value substitutions and dependency tree manipulations, we can obtain substantial improvements in most cases when only a small amount of training data is available. We also show that a large pre-trained multilingual BERT benefits from data augmentation.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Augmentation operations performed on an utterance, "Quali film animati stanno proiettando al cinema più vicino" ("Which animated films are showing at the nearest cinema"). The utterance is taken from the Italian SNIPS dataset.</figDesc><graphic coords="2,72.01,62.81,453.52,158.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2:</head><label>2</label><figDesc>Figure 2: Improvement (ΔF1) obtained by SLOT-SUB (SS) on different training data sizes. Positive numbers mean that the model with SS yields a gain.</figDesc><graphic coords="4,307.28,62.81,218.27,145.51" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3:</head><label>3</label><figDesc>Figure 3: Gain (ΔF1) obtained by SLOT-SUB (SS) for various numbers of augmented sentences (N). Positive numbers mean that the model with SS yields a gain.</figDesc><graphic coords="4,307.28,535.50,218.27,120.46" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Statistics on the datasets. #train indicates our limited training data setup (10% of full training data). D is produced by tuning the number of augmentations per utterance (N ) on the dev set.</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>#Label</cell><cell></cell><cell cols="4">#Utterances (D)</cell><cell></cell><cell>#Augmented Utterances (D )</cell></row><row><cell cols="2">Dataset</cell><cell cols="5">Language #slot #intent #train</cell><cell cols="2">#dev</cell><cell cols="3">#test #SLOT-SUB #CROP #ROTATE</cell></row><row><cell cols="3">SNIPS-IT Italian</cell><cell></cell><cell>39</cell><cell>7</cell><cell>574</cell><cell></cell><cell>700</cell><cell>698</cell><cell></cell><cell>5,404</cell><cell>1,431</cell><cell>1,889</cell></row><row><cell cols="2">ATIS-HI</cell><cell>Hindi</cell><cell></cell><cell>73</cell><cell>17</cell><cell>176</cell><cell></cell><cell>440</cell><cell>893</cell><cell></cell><cell>1,286</cell><cell>460</cell><cell>472</cell></row><row><cell cols="2">ATIS-TR</cell><cell>Turkish</cell><cell></cell><cell>70</cell><cell>17</cell><cell>99</cell><cell></cell><cell>248</cell><cell>715</cell><cell></cell><cell>144</cell><cell>161</cell><cell>194</cell></row><row><cell cols="2">FB-ES</cell><cell cols="2">Spanish</cell><cell>11</cell><cell>12</cell><cell cols="4">361 1,983 3,043</cell><cell></cell><cell>1,455</cell><cell>769</cell><cell>1,028</cell></row><row><cell cols="2">FB-TH</cell><cell>Thai</cell><cell></cell><cell>8</cell><cell>10</cell><cell cols="4">215 1,235 1,692</cell><cell></cell><cell>781</cell><cell>-</cell><cell>-</cell></row><row><cell>Model</cell><cell>DA</cell><cell></cell><cell cols="2">SNIPS-IT</cell><cell cols="2">ATIS-HI</cell><cell></cell><cell></cell><cell 
cols="2">ATIS-TR</cell><cell>FB-ES</cell><cell>FB-TH</cell></row><row><cell></cell><cell></cell><cell></cell><cell>Slot</cell><cell>Intent</cell><cell>Slot</cell><cell cols="2">Intent</cell><cell></cell><cell>Slot</cell><cell>Intent</cell><cell>Slot</cell><cell>Intent</cell><cell>Slot</cell><cell>Intent</cell></row><row><cell cols="2">M-BERT None</cell><cell></cell><cell>78.25</cell><cell>94.99</cell><cell>69.57</cell><cell cols="2">86.57</cell><cell cols="2">64.36</cell><cell>78.98</cell><cell>84.13</cell><cell>97.68</cell><cell>56.06</cell><cell>89.80</cell></row><row><cell></cell><cell cols="5">SLOT-SUB 81.97  † 94.93 72.44  †</cell><cell cols="2">87.29</cell><cell cols="2">66.60  †</cell><cell>79.85</cell><cell>84.27</cell><cell>97.72</cell><cell>59.68  † 91.42  †</cell></row><row><cell></cell><cell>CROP</cell><cell></cell><cell cols="2">80.12  † 94.60</cell><cell>70.04</cell><cell cols="2">86.92</cell><cell cols="2">65.11</cell><cell>79.48</cell><cell>83.85 98.08  †</cell><cell>-</cell><cell>-</cell></row><row><cell></cell><cell cols="2">ROTATE</cell><cell cols="2">79.24  † 95.37</cell><cell>70.69</cell><cell cols="2">87.60  †</cell><cell cols="2">65.20</cell><cell>80.06</cell><cell>83.28</cell><cell>98.20  †</cell><cell>-</cell><cell>-</cell></row><row><cell></cell><cell cols="2">COMBINE</cell><cell cols="3">81.27  † 95.00 72.13  †</cell><cell cols="2">86.93</cell><cell cols="4">66.68  † 81.12  † 83.67</cell><cell>97.94</cell><cell>-</cell><cell>-</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Performance comparison of the baseline and augmentation methods on the test set. F1 score is used for slot filling and accuracy for intent classification. Scores are the average of 10 different runs. † indicates statistically significant improvement over the baseline (p-value &lt; 0.05 according to Wilcoxon signed rank test).</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>List of hyperparameters used for the BERT model and data augmentation methods Appendix B. Statistical Significance</figDesc><table><row><cell>Dataset</cell><cell>Nb Aug</cell><cell>p-value</cell></row><row><cell>ATIS-TR</cell><cell>2</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>5</cell><cell>0.01251531869</cell></row><row><cell></cell><cell>10</cell><cell>0.006910429808</cell></row><row><cell></cell><cell>20</cell><cell>0.5001842571</cell></row><row><cell></cell><cell>25</cell><cell>0.07961580146</cell></row><row><cell>ATIS-HI</cell><cell>2</cell><cell>0.1097446387</cell></row><row><cell></cell><cell>5</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>10</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>20</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>25</cell><cell>0.04311444678</cell></row><row><cell>SNIPS-IT</cell><cell>2</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>5</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>10</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>20</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>25</cell><cell>0.04311444678</cell></row><row><cell>FB-ES</cell><cell>2</cell><cell>0.0663160313</cell></row><row><cell></cell><cell>5</cell><cell>0.02831405495</cell></row><row><cell></cell><cell>10</cell><cell>0.09260069782</cell></row><row><cell></cell><cell>20</cell><cell>0.3452310718</cell></row><row><cell></cell><cell>25</cell><cell>0.07961580146</cell></row><row><cell>FB-TH</cell><cell>2</cell><cell>0.03665792867</cell></row><row><cell></cell><cell>5</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>10</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>20</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>25</cell><cell>0.04311444678</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5 :</head><label>5</label><figDesc>The p-values of statistical tests on the experiments on Figure3</figDesc><table><row><cell>Dataset</cell><cell>Training Size (%)</cell><cell>p-value</cell></row><row><cell>ATIS-HI</cell><cell>5</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>10</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>20</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>40</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>80</cell><cell>0.1380107376</cell></row><row><cell></cell><cell>100</cell><cell>0.2733216783</cell></row><row><cell>ATIS-TR</cell><cell>5</cell><cell>0.224915884</cell></row><row><cell></cell><cell>10</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>20</cell><cell>0.7150006547</cell></row><row><cell></cell><cell>40</cell><cell>0.1797124949</cell></row><row><cell></cell><cell>80</cell><cell>0.1797124949</cell></row><row><cell></cell><cell>100</cell><cell>0.1797124949</cell></row><row><cell>SNIPS-IT</cell><cell>5</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>10</cell><cell>0.005062032126</cell></row><row><cell></cell><cell>20</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>40</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>80</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>100</cell><cell>0.04311444678</cell></row><row><cell>FB-ES</cell><cell>5</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>10</cell><cell>0.02831405495</cell></row><row><cell></cell><cell>20</cell><cell>0.1797124949</cell></row><row><cell></cell><cell>40</cell><cell>0.1755543028</cell></row><row><cell></cell><cell>80</cell><cell>0.1380107376</cell></row><row><cell></cell><cell>100</cell><cell>0.1797124949</cell></row><row><cell>FB-TH</cell><cell>5</cell><cell>0.04311444678</cell></row><row><cell></cell><cell>10</cell><cell>0.005062032126</cell></row><row><c
ell></cell><cell>20</cell><cell>0.1797124949</cell></row><row><cell></cell><cell>40</cell><cell>0.1797124949</cell></row><row><cell></cell><cell>80</cell><cell>0.1797124949</cell></row><row><cell></cell><cell>100</cell><cell>0.10880943</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4:</head><label>4</label><figDesc>The p-values of statistical tests for the experiments in Figure 2.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1">Italian, Spanish, and Thai are SVO languages while Hindi and Turkish are SOV languages.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2">For more details of the p-value of the statistical tests please refer to Appendix B</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3">For more details of the p-value of the statistical tests please refer to Appendix B</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We thank Valentina Bellomaria for providing the Italian SNIPS dataset. We thank Clara Vania for the feedback on the early draft of the paper.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Almawaveslu: A new dataset for SLU in italian</title>
		<author>
			<persName><forename type="first">Valentina</forename><surname>Bellomaria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giuseppe</forename><surname>Castellucci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Favalli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Raniero</forename><surname>Romagnoli</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth Italian Conference on Computational Linguistics</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">Raffaella</forename><surname>Bernardi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Roberto</forename><surname>Navigli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Giovanni</forename><surname>Semeraro</surname></persName>
		</editor>
		<meeting>the Sixth Italian Conference on Computational Linguistics<address><addrLine>Bari, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-11-13">2019. November 13-15. 2019</date>
			<biblScope unit="volume">2481</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">Qian</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhu</forename><surname>Zhuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wen</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1902.10909</idno>
		<title level="m">BERT for joint intent classification and slot filling</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Snips voice platform: an embedded spoken language understanding system for privateby-design voice interfaces</title>
		<author>
			<persName><forename type="first">Alice</forename><surname>Coucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alaa</forename><surname>Saade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Adrien</forename><surname>Ball</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Théodore</forename><surname>Bluche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexandre</forename><surname>Caulier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Leroy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Clément</forename><surname>Doumouro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thibault</forename><surname>Gisselbrecht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Caltagirone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thibaut</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maël</forename><surname>Primet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joseph</forename><surname>Dureau</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.10190</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ming-Wei</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenton</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019-06">2019. June</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Data augmentation for low-resource neural machine translation</title>
		<author>
			<persName><forename type="first">Marzieh</forename><surname>Fadaee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arianna</forename><surname>Bisazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christof</forename><surname>Monz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017</title>
		<title level="s">Short Papers</title>
		<editor>
			<persName><forename type="first">Regina</forename><surname>Barzilay</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Min-Yen</forename><surname>Kan</surname></persName>
		</editor>
		<meeting>the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017<address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017-07-30">2017. July 30 -August 4</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="567" to="573" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Paraphrase augmented task-oriented dialog generation</title>
		<author>
			<persName><forename type="first">Silin</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yichi</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhijian</forename><surname>Ou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhou</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online</title>
				<editor>
			<persName><forename type="first">Dan</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Joyce</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Natalie</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Joel</forename><forename type="middle">R</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020-07-05">2020. July 5-10, 2020</date>
			<biblScope unit="page" from="639" to="649" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Slot-gated modeling for joint slot filling and intent prediction</title>
		<author>
			<persName><forename type="first">Chih-Wen</forename><surname>Goo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guang</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yun-Kai</forename><surname>Hsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chih-Li</forename><surname>Huo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tsung-Chieh</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Keng-Wei</forename><surname>Hsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yun-Nung</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="753" to="757" />
		</imprint>
	</monogr>
	<note>Short Papers</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The ATIS spoken language systems pilot corpus</title>
		<author>
			<persName><forename type="first">Charles</forename><forename type="middle">T</forename><surname>Hemphill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><forename type="middle">J</forename><surname>Godfrey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">George</forename><forename type="middle">R</forename><surname>Doddington</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania</title>
				<meeting><address><addrLine>Hidden Valley, Pennsylvania, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Morgan Kaufmann</publisher>
			<date type="published" when="1990-06-24">1990. June 24-27, 1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Sequence-to-sequence data augmentation for dialogue language understanding</title>
		<author>
			<persName><forename type="first">Yutai</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yijia</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wanxiang</forename><surname>Che</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ting</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th International Conference on Computational Linguistics</title>
				<meeting>the 27th International Conference on Computational Linguistics<address><addrLine>Santa Fe, New Mexico, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018-08">2018. August</date>
			<biblScope unit="page" from="1234" to="1245" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Auto-encoding variational Bayes</title>
		<author>
			<persName><forename type="first">Diederik</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Max</forename><surname>Welling</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2nd International Conference on Learning Representations, ICLR 2014</title>
		<title level="s">Conference Track Proceedings</title>
		<editor>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Yann</forename><surname>Lecun</surname></persName>
		</editor>
		<meeting><address><addrLine>Banff, AB, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014-04-14">2014. April 14-16, 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">Varun</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ashutosh</forename><surname>Choudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eunah</forename><surname>Cho</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2003.02245</idno>
		<title level="m">Data augmentation using pre-trained transformer models</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">Samuel</forename><surname>Louvan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bernardo</forename><surname>Magnini</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2009.03695" />
		<title level="m">Simple is better! lightweight data augmentation for low resource slot filling and intent classification</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
	<note>PACLIC 2020 - The 34th Pacific Asia Conference on Language, Information and Computation</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">Joakim</forename><surname>Nivre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Željko</forename><surname>Agić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lars</forename><surname>Ahrenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lene</forename><surname>Antonsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maria</forename><surname>Jesus Aranzabe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Masayuki</forename><surname>Asahara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luma</forename><surname>Ateyah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohammed</forename><surname>Attia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aitziber</forename><surname>Atutxa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liesbeth</forename><surname>Augustinus</surname></persName>
		</author>
		<title level="m">Universal Dependencies 2.1</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Data augmentation for spoken language understanding via pretrained models</title>
		<author>
			<persName><forename type="first">Baolin</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chenguang</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianfeng</forename><surname>Gao</surname></persName>
		</author>
		<idno>CoRR, abs/2004.13952</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Stanza: A python natural language processing toolkit for many human languages</title>
		<author>
			<persName><forename type="first">Peng</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yuhao</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yuhui</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><surname>Bolton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020-07">2020. July</date>
			<biblScope unit="page" from="101" to="108" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Language models are unsupervised multitask learners</title>
		<author>
			<persName><forename type="first">Alec</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rewon</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dario</forename><surname>Amodei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Data augmentation via dependency tree morphing for low-resource languages</title>
		<author>
			<persName><forename type="first">Gözde</forename><forename type="middle">Gül</forename><surname>Şahin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Steedman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2018 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018-10">2018. October-November</date>
			<biblScope unit="page" from="5004" to="5009" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Cross-lingual transfer learning for multilingual task oriented dialog</title>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sonal</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rushin</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mike</forename><surname>Lewis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019-06">2019. June</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="3795" to="3805" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Sequence to sequence learning with neural networks</title>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oriol</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Quoc</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014</title>
				<editor>
			<persName><forename type="first">Zoubin</forename><surname>Ghahramani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Max</forename><surname>Welling</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Corinna</forename><surname>Cortes</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Neil</forename><forename type="middle">D</forename><surname>Lawrence</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Kilian</forename><forename type="middle">Q</forename><surname>Weinberger</surname></persName>
		</editor>
		<meeting><address><addrLine>Montreal, Quebec, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014-12-08">2014. December 8-13 2014</date>
			<biblScope unit="page" from="3104" to="3112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Spoken language understanding: Systems for extracting semantic information from speech</title>
		<author>
			<persName><forename type="first">Gokhan</forename><surname>Tur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Renato</forename><forename type="middle">De</forename><surname>Mori</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>John Wiley &amp; Sons</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">(Almost) zero-shot cross-lingual spoken language understanding</title>
		<author>
			<persName><forename type="first">Shyam</forename><surname>Upadhyay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manaal</forename><surname>Faruqui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gokhan</forename><surname>Tür</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dilek</forename><surname>Hakkani-Tür</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Larry</forename><surname>Heck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018. 2018</date>
			<biblScope unit="page" from="6034" to="6038" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages</title>
		<author>
			<persName><forename type="first">Clara</forename><surname>Vania</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yova</forename><surname>Kementchedjhieva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anders</forename><surname>Søgaard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Adam</forename><surname>Lopez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019</title>
		<editor>
			<persName><forename type="first">Kentaro</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Jing</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Vincent</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Xiaojun</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-11-03">2019. November 3-7, 2019</date>
			<biblScope unit="page" from="1105" to="1116" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">Ashish</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Noam</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Niki</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jakob</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Llion</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aidan</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lukasz</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Illia</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017</title>
				<editor>
			<persName><forename type="first">Isabelle</forename><surname>Guyon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Ulrike</forename><surname>Von Luxburg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Samy</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Hanna</forename><forename type="middle">M</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Rob</forename><surname>Fergus</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">V N</forename><surname>Vishwanathan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Roman</forename><surname>Garnett</surname></persName>
		</editor>
		<meeting><address><addrLine>Long Beach, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017-12">2017. December 2017</date>
			<biblScope unit="page" from="5998" to="6008" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">EDA: easy data augmentation techniques for boosting performance on text classification tasks</title>
		<author>
			<persName><forename type="first">Jason</forename><forename type="middle">W</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Zou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019</title>
				<editor>
			<persName><forename type="first">Kentaro</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Jing</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Vincent</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Xiaojun</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-11-03">2019. November 3-7, 2019</date>
			<biblScope unit="page" from="6381" to="6387" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Data augmentation for spoken language understanding via joint variational generation</title>
		<author>
			<persName><forename type="first">Kang</forename><forename type="middle">Min</forename><surname>Yoo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Youhyun</forename><surname>Shin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sang-Goo</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019</title>
				<meeting><address><addrLine>Honolulu, Hawaii, USA</addrLine></address></meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2019-01-27">2019. January 27 -February 1, 2019</date>
			<biblScope unit="page" from="7402" to="7409" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Data augmentation with atomic templates for spoken language understanding</title>
		<author>
			<persName><forename type="first">Zijian</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Su</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019</title>
				<editor>
			<persName><forename type="first">Kentaro</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Jing</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Vincent</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Xiaojun</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019<address><addrLine>Hong Kong, China; November</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019. 3-7, 2019</date>
			<biblScope unit="page" from="3635" to="3641" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
