<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Simple Data Augmentation for Multilingual NLU in Task Oriented Dialogue Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samuel Louvan</string-name>
          <email>slouvan@fbk.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Magnini</string-name>
          <email>magnini@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trento</institution>,
          <addr-line>Fondazione Bruno Kessler</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Data augmentation has shown potential in alleviating data scarcity for Natural Language Understanding (e.g. slot filling and intent classification) in task-oriented dialogue systems. As prior work has mostly been evaluated on English datasets, we focus on five different languages and consider a setting where limited data are available. We investigate the effectiveness of non-gradient based augmentation methods, involving simple text span substitutions and syntactic manipulations. Our experiments show that (i) augmentation is effective in all cases, particularly for slot filling; and (ii) it is beneficial for a joint intent-slot model based on multilingual BERT, both in limited data settings and when full training data is used.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Natural Language Understanding (NLU) in
task-oriented dialogue systems is responsible for
parsing user utterances to extract the intent of the
user and the arguments of the intent (i.e. slots)
into a semantic representation, typically a
semantic frame
        <xref ref-type="bibr" rid="ref20">(Tur and De Mori, 2011)</xref>
        . For example,
the utterance “Play Jeff Pilson on Youtube” has the
intent PLAYMUSIC and “Youtube” as the value for the
slot SERVICE. As more skills are added to the
dialogue system, the NLU model frequently needs to
be updated to scale to new domains and languages,
a situation which typically becomes problematic
when labeled data are limited (data scarcity).
      </p>
      <p>
        One way to combat data scarcity is through
data augmentation (DA) techniques performing
label preserving operations to produce auxiliary
training data. Recently, DA has shown potential
in tasks such as machine translation
        <xref ref-type="bibr" rid="ref5">(Fadaee et
al., 2017)</xref>
        , constituency and dependency parsing
        <xref ref-type="bibr" rid="ref17 ref22 ref9">( S¸ahin and Steedman, 2018; Vania et al., 2019)</xref>
        ,
and text classification
        <xref ref-type="bibr" rid="ref1 ref11 ref18 ref2 ref22 ref24 ref25 ref26 ref4">(Wei and Zou, 2019;
Kumar et al., 2020)</xref>
        . As for slot filling (SF) and
intent classification (IC), a number of DA
methods have been proposed to generate synthetic
utterances using sequence to sequence models
        <xref ref-type="bibr" rid="ref26 ref9">(Hou
et al., 2018; Zhao et al., 2019)</xref>
        , Conditional
Variational Auto Encoder
        <xref ref-type="bibr" rid="ref25">(Yoo et al., 2019)</xref>
        , or
pretrained NLG models
        <xref ref-type="bibr" rid="ref14 ref15">(Peng et al., 2020)</xref>
        . To date,
most of the DA methods are evaluated on English
and it is not clear whether the same findings apply
to other languages.
      </p>
      <p>
        In this paper, we study the effectiveness of
DA on several non-English datasets for NLU in
task-oriented dialogue systems. We experiment
with existing lightweight, non-gradient based, DA
methods from Louvan and Magnini (2020), which
produce variations of slot values through substitution
and sentence structure manipulation by leveraging
syntactic information from a dependency parser.
We evaluate the DA methods on NLU datasets
from five languages: Italian, Hindi, Turkish,
Spanish, and Thai. The contributions of our paper are
as follows:
1. We assess the applicability of DA methods for
NLU in task-oriented dialogue systems in five
languages.
2. We demonstrate that simple DA can improve
performance on all languages despite their different
characteristics.
3. We show that a large pre-trained multilingual
BERT (M-BERT)
        <xref ref-type="bibr" rid="ref4">(Devlin et al., 2019)</xref>
        can still
benefit from DA, in particular for slot filling.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Slot Filling and Intent Classification</title>
      <p>The NLU component of a task-oriented dialogue
system is responsible for parsing a user utterance
into a semantic representation, such as a semantic
frame. The semantic frame conveys information,
namely the user intent and the corresponding
arguments of the intent. Extracting such information
involves the slot filling (SF) and intent classification
(IC) tasks.</p>
      <p>
        Given an input utterance of n tokens, x =
(x_1, x_2, ..., x_n), the system needs to assign a
particular intent y^{intent} to the whole utterance x and the
corresponding slots that are mentioned in the
utterance, y^{slot} = (y^{slot}_1, y^{slot}_2, ..., y^{slot}_n). In practice, IC
is typically modeled as text classification and SF
as a sequence tagging problem. As an example,
for the utterance “Play Jeff Pilson on Youtube”,
y^{intent} is PLAYMUSIC, as the intent of the user is
to ask the system to play a song from a musician
and y^{slot} = (O, B-ARTIST, I-ARTIST,
O, B-SERVICE), in which the artist is “Jeff
Pilson” and the service is “Youtube”. Slot labels
are in BIO format: B indicates the start of a slot
span, I the inside of a span, while O denotes that
the word does not belong to any slot. Recent
approaches for SF and IC are based on neural
network methods that model SF and IC jointly
        <xref ref-type="bibr" rid="ref2 ref7">(Goo
et al., 2018; Chen et al., 2019)</xref>
        by sharing model
parameters between the two tasks.
      </p>
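      <p>For illustration, the running example can be represented as token-level (token, BIO-tag) pairs plus an utterance-level intent label. The following Python snippet is a minimal sketch; the data layout is our own assumption for illustration, not a format prescribed by the datasets.</p>
      <preformat>
# "Play Jeff Pilson on Youtube" with BIO slot tags and an intent label.
utterance = [("Play", "O"),
             ("Jeff", "B-ARTIST"),
             ("Pilson", "I-ARTIST"),
             ("on", "O"),
             ("Youtube", "B-SERVICE")]
intent = "PLAYMUSIC"
</preformat>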
    </sec>
    <sec id="sec-3">
      <title>3 Data Augmentation (DA) Methods</title>
      <p>DA aims to perform semantically preserving
transformations on the training data D to produce
auxiliary data D′. The union of D and D′ is then
used to train a particular NLU model. For each
utterance in D, we produce N augmented
utterances by applying a specific augmentation
operation. We adopt a subset of existing augmentation
methods from Louvan and Magnini (2020), which
have shown promising results on English datasets.
We describe the augmentation operations in the
following sections.</p>
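      <p>Independently of the specific operation, the augmentation procedure is the same: apply a label-preserving operation N times per utterance and train on the union of D and D′. A minimal sketch in Python follows; the function name augment_dataset and the operation interface are our own illustrative assumptions.</p>
      <preformat>
def augment_dataset(dataset, operation, n_aug):
    """Produce n_aug augmented utterances per example using a
    label-preserving operation, and return the union D + D'."""
    augmented = []
    for example in dataset:
        for _ in range(n_aug):
            augmented.append(operation(example))
    return dataset + augmented
</preformat>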
      <sec id="sec-3-1">
        <title>3.1 Slot Substitution (SLOT-SUB)</title>
        <p>SLOT-SUB (Figure 1 left) performs augmentation
by substituting a particular text span (slot-value
pair) in an utterance with a different text span that
is semantically consistent, i.e., the slot label is the
same. For example, in the utterance “Quali film
animati stanno proiettando al cinema più vicino”
(“Which animated movies are showing at the nearest
cinema”), one of the spans that can be substituted is the
slot-value pair (“più vicino”, SPATIAL RELATION).
Then, we collect other spans in D in which the
slot values are different, but the slot label is the
same. For instance, we find the substitute
candidates SP′ = {(“distanza a piedi”, SPATIAL
RELATION), (“lontano”, SPATIAL RELATION), (“nel
quartiere”, SPATIAL RELATION), . . . }, and then
we sample one span to replace the original span in
the utterance.</p>
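        <p>A minimal sketch of SLOT-SUB in Python follows; this is our own illustrative implementation, not the authors' code, and examples are assumed to be lists of (token, BIO-tag) pairs as in Section 2.</p>
        <preformat>
import random
from collections import defaultdict

def collect_spans(dataset):
    """Index every slot span in D by its slot label."""
    spans = defaultdict(list)
    for example in dataset:
        tokens, label = [], None
        for token, tag in example + [("", "O")]:  # sentinel flushes the tail
            if tag.startswith("B-"):
                if tokens:
                    spans[label].append(tokens)
                tokens, label = [token], tag[2:]
            elif tag.startswith("I-") and tokens:
                tokens.append(token)
            else:
                if tokens:
                    spans[label].append(tokens)
                tokens, label = [], None
    return spans

def slot_sub(example, spans):
    """Replace each slot span with a sampled span of the same label.
    (The paper samples spans whose values differ from the original;
    this sketch samples uniformly for simplicity.)"""
    augmented, tokens, label = [], [], None

    def flush():
        if tokens:
            new = random.choice(spans[label])
            tags = ["B-" + label] + ["I-" + label] * (len(new) - 1)
            augmented.extend(zip(new, tags))

    for token, tag in example:
        if tag.startswith("B-"):
            flush()
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and tokens:
            tokens.append(token)
        else:
            flush()
            tokens, label = [], None
            augmented.append((token, tag))
    flush()
    return augmented
</preformat>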
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Crop and Rotate (CROP and ROTATE)</title>
        <p>In order to produce sentence variations, we apply
the crop and rotate operations proposed in Şahin
and Steedman (2018), which manipulate the
sentence structure through its dependency parse tree.
The goal of CROP (Figure 1 middle) is to simplify
the sentence so that it focuses on a particular
fragment (e.g. subject/object) by removing other
fragments in the sentence. CROP uses the dependency
tree to identify the fragment and then remove it
and its children from the dependency tree.
</p>
        <p>[Table 1: dataset statistics per language: #slot, #intent,
#train, #test, and the number of augmented sentences produced by
SLOT-SUB, CROP, and ROTATE.]</p>
        <p>
          The ROTATE (Figure 1 right) operation is
performed by moving a particular fragment
(e.g. subject/object) around the root of the tree,
typically the verb in the sentence. For each operation,
all possible combinations are generated, and one
of them is picked randomly as the augmented
sentence. Both CROP and ROTATE rely on the
universal dependency labels
          <xref ref-type="bibr" rid="ref13">(Nivre et al., 2017)</xref>
          to
identify relevant fragments, such as NSUBJ (nominal
subject), DOBJ (direct object), OBJ (object), IOBJ
(indirect object).
        </p>
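        <p>A simplified sketch of ROTATE in Python (an illustration under our own assumptions, not the original implementation): given a Stanza-style list of token dicts with 1-indexed heads (head 0 marks the root), it permutes the selected fragments around the remaining tokens.</p>
        <preformat>
import itertools
import random

FRAGMENT_DEPRELS = {"nsubj", "obj", "dobj", "iobj"}

def subtree(sent, head_id):
    """Token ids of the subtree rooted at head_id, itself included."""
    ids, changed = {head_id}, True
    while changed:
        changed = False
        for tok in sent:
            if tok["head"] in ids and tok["id"] not in ids:
                ids.add(tok["id"])
                changed = True
    return sorted(ids)

def rotate(sent):
    """Pick one random reordering of the flexible fragments."""
    root = next(tok["id"] for tok in sent if tok["head"] == 0)
    frags = [subtree(sent, tok["id"]) for tok in sent
             if tok["head"] == root and tok["deprel"] in FRAGMENT_DEPRELS]
    in_frag = {i for frag in frags for i in frag}
    rest = sorted(tok["id"] for tok in sent if tok["id"] not in in_frag)
    chunks = [rest] + frags
    order = random.choice(list(itertools.permutations(chunks)))
    text = {tok["id"]: tok["text"] for tok in sent}
    return " ".join(text[i] for chunk in order for i in chunk)
</preformat>
        <p>Token dicts in this form can be obtained, for instance, from a Stanza pipeline (e.g. stanza.Pipeline("it")), which is also the parser used in our experiments in Section 4.</p>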
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Experiments</title>
      <p>
        Our primary goal is to verify the effectiveness
of data augmentation on Italian, Hindi, Turkish,
Spanish and Thai NLU datasets with limited
labeled data. To this end, we compare the
performance of a baseline NLU model trained on
the original training data (D) with an NLU model
that incorporates the augmented data as additional
training instances (D + D′). To simulate the
limited labeled data situation, we randomly sample
10% of the training data for each dataset.
Baseline and Data Augmentation (DA)
Methods. We use the state-of-the-art BERT-based joint
intent slot filling model
        <xref ref-type="bibr" rid="ref2">(Chen et al., 2019)</xref>
        as
the baseline model. We leverage the pre-trained
multilingual BERT (M-BERT), which is trained
on 104 languages. During training, M-BERT is
fine-tuned on the slot filling and intent
classification tasks. Given a sentence representation
x = ([CLS], t_1, t_2, ..., t_L), we use the hidden state
h_{[CLS]} to predict the intent, and h_{t_i} to predict the
slot label of token t_i. As for DA methods, in addition to the
methods described in Section 3, we add one
configuration COMBINE, which combines the result
of SLOT-SUB and ROTATE, as ROTATE obtains
better results than CROP on the development set.
Settings. The model is trained with the
BertAdam optimizer for 30 epochs with early
stopping. The learning rate is set to 10^{-5} and
batch size is 16. All the hyperparameters are
listed in Appendix A. For SLOT-SUB the number
of augmentations per sentence N is tuned on the
development set. To produce the dependency
tree, we parse the sentence using Stanza
        <xref ref-type="bibr" rid="ref15">(Qi
et al., 2020)</xref>
        . For both CROP and ROTATE we
follow the default hyperparameters from Şahin
and Steedman (2018). We did not experiment
with Thai for CROP and ROTATE as Thai is not
supported by Stanza. The number of augmented
sentences (D′) for each method is listed in Table
1. As evaluation metrics, we use the standard
CoNLL script to compute the F1 score for slot filling,
and accuracy for intent classification.
      </p>
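      <p>For concreteness, here is a minimal sketch of the joint intent-slot architecture described above, written with the Hugging Face transformers library. It is a sketch in the spirit of Chen et al. (2019), not their exact implementation; class and variable names are our own.</p>
      <preformat>
import torch.nn as nn
from transformers import BertModel

class JointIntentSlotModel(nn.Module):
    def __init__(self, num_intents, num_slot_labels):
        super().__init__()
        self.bert = BertModel.from_pretrained(
            "bert-base-multilingual-cased")
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)
        self.slot_head = nn.Linear(hidden, num_slot_labels)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        intent_logits = self.intent_head(states[:, 0])  # h_{[CLS]}
        slot_logits = self.slot_head(states)            # h_{t_i} per token
        return intent_logits, slot_logits
</preformat>
      <p>Training can then minimize the sum of the intent classification loss and the token-level slot tagging loss on D + D′.</p>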
      <p>
        Datasets. For the Italian language, we use the
data from Bellomaria et al. (2019), translated from
the English SNIPS dataset
        <xref ref-type="bibr" rid="ref3">(Coucke et al., 2018)</xref>
        .
SNIPS has been widely used for evaluating NLU
models and consists of utterances in multiple
domains. As for Hindi and Turkish, we use the ATIS
dataset from Upadhyay et al. (2018), derived from
Hemphill et al. (1990). ATIS is a well-known
NLU dataset in the flight domain. As for Spanish and
Thai we use the FB dataset from Schuster et al.
(2019) that contains utterances in alarm, weather,
and reminder domains. The overall statistics of the
datasets are shown in Table 1.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5 Results</title>
      <p>The overall results reported in Table 2 show that
applying DA improves performance on slot filling
and intent classification across all languages. In
particular, for SF, the SLOT-SUB method yields
the best result, while for IC, ROTATE obtains
better performance compared to CROP in most
cases. These results are consistent with the findings
of Louvan and Magnini (2020) on English
datasets, where SLOT-SUB improves SF and CROP
or ROTATE improve IC. In general, ROTATE is
better than CROP for most cases on IC, and we think
this is because CROP may change the intent of the
original sentence. Intents typically depend on the
occurrence of specific slots, so when the cropped
part is a slot-value, it may change the sentence’s
overall semantics.</p>
      <p>We can see that languages with different
typological features (e.g. subject/verb/object
ordering)1 benefit from the ROTATE operation for IC. This
result suggests that augmentation can produce
useful noise (regularization) that helps the model
alleviate overfitting when labeled data is limited. When
we use COMBINE, it still helps the performance
of both SF and IC, although the improvements are
not as high as when only one of the augmentation
methods is applied. The language that benefits
the most from COMBINE is Turkish.
We hypothesize that, as Turkish has a more
flexible word order than the other languages, it benefits
the most when ROTATE is performed.</p>
      <sec id="sec-5-1">
        <title>Performance on varying data size</title>
        <p>To better understand the effectiveness of SLOT-SUB, we
perform further analysis on different training data
sizes (see Figure 2). Overall, we observe that as we
increase the training size, the benefit of SLOT-SUB
decreases for all datasets. For some datasets,
namely ATIS-HI and FB-ES, SLOT-SUB can cause
a performance drop for larger data sizes, although it
is reasonably small (less than 1 F1 point). FB-TH
consistently benefits from SLOT-SUB even when
the full training data is used. The training data size
up to which the improvement is significant varies across
datasets.2 For SNIPS-IT, the improvement is clear for
all training data sizes, and it is statistically
significant up to a training data size of 80%. For
ATIS-HI, improvements are significant up to a data
size of 40%. As for the FB datasets, improvements
are significant only up to a training data size of
10%. Overall, we can see that SLOT-SUB is
effective for cases where data is scarce (5%, 10%),
while it remains relatively robust for larger data sizes
on all datasets.</p>
        <p>1 Italian, Spanish, and Thai are SVO languages, while
Hindi and Turkish are SOV languages.</p>
        <p>2 For more details on the p-values of the statistical tests,
please refer to Appendix B.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Performance on different numbers of augmentations per utterance (N)</title>
        <p>We examine the effect
of a larger number of augmentations per
utterance (N) on model performance, specifically
for SF (see Figure 3). For FB-ES, similarly to the
results in Table 2, increasing N does not affect
performance. For the other datasets, increasing
N brings performance improvements. For
ATIS-HI, SNIPS-IT, and FB-TH the trend is that, as
we increase N, performance goes up and then plateaus.
For ATIS-TR, changing N does not substantially affect
the gains, as the performance
trend is quite steady across the number of
augmentations. For most values of N in each dataset
(except FB-ES), the difference between the
performance of the model using SLOT-SUB and the
model without SLOT-SUB is significant.3</p>
        <p>3 For more details on the p-values of the statistical tests,
please refer to Appendix B.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Related Work</title>
      <p>
        Data augmentation methods proposed in
NLP aim to automatically produce additional
training data, ranging from simple word substitution
        <xref ref-type="bibr" rid="ref24">(Wei and Zou, 2019)</xref>
        to more complex methods
that aim to generate semantically preserving
sentences
        <xref ref-type="bibr" rid="ref6 ref9">(Hou et al., 2018; Gao et al.,
2020)</xref>
        . In the context of slot filling and intent
classification, recent augmentation methods typically
apply deep learning models to produce augmented
utterances.
      </p>
      <p>
        Hou et al. (2018) propose a two-stage
method consisting of delexicalized utterance
generation and slot value realization. Their method
is based on a sequence-to-sequence model
        <xref ref-type="bibr" rid="ref19">(Sutskever et al., 2014)</xref>
        to produce a paraphrase
of an utterance with its slot values replaced by placeholders
(delexicalized) for a given intent. For slot
value lexicalization, they use slot values from
the training data that occur in similar contexts.
Zhao et al. (2019) train a sequence-to-sequence
model with training instances that consist of pairs
of atomic templates of dialogue acts and their
sentence realizations. Yoo et al. (2019) propose a
solution by extending the Variational Auto-Encoder
(VAE)
        <xref ref-type="bibr" rid="ref10">(Kingma and Welling, 2014)</xref>
        into a
Conditional VAE (CVAE) to generate synthetic
utterances. The CVAE controls the utterance
generation by conditioning on the intent and slot labels
during model training. Recent work from
        <xref ref-type="bibr" rid="ref15">Peng et
al. (2020)</xref>
        makes use of a Transformer
        <xref ref-type="bibr" rid="ref23">(Vaswani et
al., 2017)</xref>
        based pre-trained NLG model, namely GPT-2
        <xref ref-type="bibr" rid="ref16">(Radford et al., 2019)</xref>
        , and fine-tunes it on a slot
filling dataset to produce synthetic utterances. We
consider these deep learning based approaches as
heavyweight, as they often require several stages
in the augmentation process, namely generating
augmentation candidates, then ranking and filtering the
candidates before producing the final augmented
data. Consequently, the computation cost of these
approaches is generally higher, as
separate training is required for the augmentation
and joint SF-IC models. Recent work from
Louvan and Magnini (2020) applies a set of lightweight
methods in which most of the augmentation
methods do not require model training. The
augmentation methods focus on varying the slot values
through substitution mechanisms and varying
sentence structure through dependency tree
manipulation. While the methods are relatively simple,
they obtain results competitive with deep learning
based approaches on the standard English slot
filling benchmarks, namely ATIS
        <xref ref-type="bibr" rid="ref8">(Hemphill
et al., 1990)</xref>
        , SNIPS
        <xref ref-type="bibr" rid="ref3">(Coucke et al., 2018)</xref>
        , and FB
        <xref ref-type="bibr" rid="ref18">(Schuster et al., 2019)</xref>
        .
      </p>
      <p>Existing methods have mostly been evaluated
on English datasets, and little work has
been done on other languages. Our work focuses
on investigating the effect of data augmentation on
five non-English languages. We apply a subset of
lightweight augmentation methods from Louvan
and Magnini (2020) that do not require separate
model training to produce augmentation data.
</p>
    </sec>
    <sec id="sec-7">
      <title>7 Conclusion</title>
      <p>We evaluate the effectiveness of data
augmentation for slot filling and intent classification tasks
in five typologically diverse languages. Our
results show that by applying simple augmentation,
namely slot value substitutions and dependency
tree manipulations, we can obtain substantial
improvements in most cases when only a small amount
of training data is available. We also show that a
large pre-trained multilingual BERT benefits from
data augmentation.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We thank Valentina Bellomaria for providing the
Italian SNIPS dataset. We thank Clara Vania for
the feedback on the early draft of the paper.</p>
    </sec>
    <sec id="app-a">
      <title>Appendix A. Hyperparameters</title>
      <p>[Table: hyperparameter settings, listing the learning rate,
dropout, mini-batch size, optimizer, number of epochs,
early stopping, N, max rotation, and max crop.]</p>
    </sec>
    <sec id="sec-9">
      <title>Appendix B. Statistical Significance</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Valentina</given-names>
            <surname>Bellomaria</surname>
          </string-name>
          , Giuseppe Castellucci, Andrea Favalli, and
          <string-name>
            <given-names>Raniero</given-names>
            <surname>Romagnoli</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>AlmawaveSLU: A new dataset for SLU in Italian</article-title>
          . In Raffaella Bernardi, Roberto Navigli, and Giovanni Semeraro, editors,
          <source>Proceedings of the Sixth Italian Conference on Computational Linguistics</source>
          , Bari, Italy,
          <source>November 13-15</source>
          ,
          <year>2019</year>
          , volume
          <volume>2481</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Qian</given-names>
            <surname>Chen</surname>
          </string-name>
          , Zhu Zhuo, and
          <string-name>
            <given-names>Wen</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT for joint intent classification and slot filling</article-title>
          . arXiv preprint arXiv:1902.10909.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Alice</given-names>
            <surname>Coucke</surname>
          </string-name>
          , Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, Maël Primet, and
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Dureau</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Snips voice platform: an embedded spoken language understanding system for privateby-design voice interfaces</article-title>
          .
          <source>ArXiv</source>
          , abs/
          <year>1805</year>
          .10190.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Wei</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Marzieh</given-names>
            <surname>Fadaee</surname>
          </string-name>
          , Arianna Bisazza, and
          <string-name>
            <given-names>Christof</given-names>
            <surname>Monz</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Data augmentation for low-resource neural machine translation</article-title>
          .
          <source>In Regina Barzilay and MinYen Kan</source>
          , editors,
          <source>Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <string-name>
            <surname>ACL</surname>
          </string-name>
          <year>2017</year>
          , Vancouver, Canada,
          <source>July 30 - August 4</source>
          , Volume
          <volume>2</volume>
          :
          <string-name>
            <given-names>Short</given-names>
            <surname>Papers</surname>
          </string-name>
          , pages
          <fpage>567</fpage>
          -
          <lpage>573</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Silin</given-names>
            <surname>Gao</surname>
          </string-name>
          , Yichi Zhang, Zhijian Ou, and
          <string-name>
            <given-names>Zhou</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Paraphrase augmented task-oriented dialog generation</article-title>
          . In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel R. Tetreault, editors,
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July</source>
          <volume>5</volume>
          -
          <issue>10</issue>
          ,
          <year>2020</year>
          , pages
          <fpage>639</fpage>
          -
          <lpage>649</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Chih-Wen</surname>
            <given-names>Goo</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Guang</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <surname>Yun-Kai</surname>
            <given-names>Hsu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chih-Li</surname>
            <given-names>Huo</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsung-Chieh</surname>
            <given-names>Chen</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keng-Wei</surname>
            <given-names>Hsu</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>YunNung</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Slot-gated modeling for joint slot filling and intent prediction</article-title>
          .
          <source>In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>2</volume>
          (
          <issue>Short Papers)</issue>
          , pages
          <fpage>753</fpage>
          -
          <lpage>757</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Charles</surname>
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Hemphill</surname>
          </string-name>
          , John J. Godfrey, and
          <string-name>
            <surname>George</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Doddington</surname>
          </string-name>
          .
          <year>1990</year>
          .
          <article-title>The ATIS spoken language systems pilot corpus</article-title>
          .
          <source>In Speech and Natural Language: Proceedings of a Workshop Held</source>
          at Hidden Valley, Pennsylvania, USA, June 24-27,
          <year>1990</year>
          . Morgan Kaufmann.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Yutai</given-names>
            <surname>Hou</surname>
          </string-name>
          , Yijia Liu, Wanxiang Che, and Ting Liu.
          <year>2018</year>
          .
          <article-title>Sequence-to-sequence data augmentation for dialogue language understanding</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>1234</fpage>
          -
          <lpage>1245</lpage>
          ,
          <string-name>
            <given-names>Santa</given-names>
            <surname>Fe</surname>
          </string-name>
          , New Mexico, USA,
          <year>August</year>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Diederik P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Max</given-names>
            <surname>Welling</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Autoencoding variational bayes</article-title>
          .
          <source>In Yoshua Bengio and Yann LeCun</source>
          , editors,
          <source>2nd International Conference on Learning Representations, ICLR</source>
          <year>2014</year>
          ,
          <article-title>Banff</article-title>
          ,
          <string-name>
            <surname>AB</surname>
          </string-name>
          , Canada,
          <source>April 14-16</source>
          ,
          <year>2014</year>
          , Conference Track Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Varun</given-names>
            <surname>Kumar</surname>
          </string-name>
          , Ashutosh Choudhary, and
          <string-name>
            <given-names>Eunah</given-names>
            <surname>Cho</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Data augmentation using pre-trained transformer models</article-title>
          . arXiv preprint arXiv:2003.02245.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Louvan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bernardo</given-names>
            <surname>Magnini</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Simple is better! lightweight data augmentation for low resource slot filling and intent classification</article-title>
          . arXiv preprint arXiv:2009.03695.
          <source>PACLIC 2020 - The 34th Pacific Asia Conference on Language, Information and Computation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          , Željko Agić,
          <string-name>
            <surname>Lars</surname>
            <given-names>Ahrenberg</given-names>
          </string-name>
          , Lene Antonsen, Maria Jesus Aranzabe, Masayuki Asahara, Luma Ateyah, Mohammed Attia, Aitziber Atutxa,
          <string-name>
            <given-names>Liesbeth</given-names>
            <surname>Augustinus</surname>
          </string-name>
          , et al.
          <year>2017</year>
          .
          <article-title>Universal dependencies 2.1</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Baolin</given-names>
            <surname>Peng</surname>
          </string-name>
          , Chenguang Zhu,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Zeng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jianfeng</given-names>
            <surname>Gao</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Data augmentation for spoken language understanding via pretrained models</article-title>
          .
          <source>CoRR</source>
          , abs/
          <year>2004</year>
          .13952.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Peng</given-names>
            <surname>Qi</surname>
          </string-name>
          , Yuhao Zhang, Yuhui Zhang, Jason Bolton, and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Stanza: A python natural language processing toolkit for many human languages</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          , pages
          <fpage>101</fpage>
          -
          <lpage>108</lpage>
          , Online, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Jeffrey Wu, Rewon Child, David Luan,
          <string-name>
            <given-names>Dario</given-names>
            <surname>Amodei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Language models are unsupervised multitask learners</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Gözde Gül</given-names>
            <surname>Şahin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Steedman</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Data augmentation via dependency tree morphing for low-resource languages</article-title>
          .
          <source>In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>5004</fpage>
          -
          <lpage>5009</lpage>
          , Brussels, Belgium, October-November.
          Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Schuster</surname>
          </string-name>
          , Sonal Gupta, Rushin Shah, and
          <string-name>
            <given-names>Mike</given-names>
            <surname>Lewis</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Cross-lingual transfer learning for multilingual task oriented dialog</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>3795</fpage>
          -
          <lpage>3805</lpage>
          , Minneapolis, Minnesota, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , Oriol Vinyals, and
          <string-name>
            <surname>Quoc</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Le</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>In Zoubin Ghahramani</source>
          , Max Welling, Corinna Cortes,
          <string-name>
            <given-names>Neil D.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          , and Kilian Q. Weinberger, editors,
          <source>Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems</source>
          <year>2014</year>
          , December 8-
          <issue>13</issue>
          <year>2014</year>
          , Montreal, Quebec, Canada, pages
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Gokhan</given-names>
            <surname>Tur</surname>
          </string-name>
          and
          <string-name>
            <given-names>Renato</given-names>
            <surname>De Mori</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Spoken language understanding: Systems for extracting semantic information from speech</article-title>
          . John Wiley &amp; Sons.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Shyam</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          , Manaal Faruqui, Gokhan Tür, Dilek Hakkani-Tür, and
          <string-name>
            <given-names>Larry</given-names>
            <surname>Heck</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>(almost) zero-shot cross-lingual spoken language understanding</article-title>
          .
          <source>In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          , pages
          <fpage>6034</fpage>
          -
          <lpage>6038</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Clara</given-names>
            <surname>Vania</surname>
          </string-name>
          , Yova Kementchedjhieva, Anders Søgaard, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Lopez</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages</article-title>
          .
          <source>In Kentaro Inui</source>
          , Jing Jiang, Vincent Ng, and Xiaojun Wan, editors,
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP</source>
          <year>2019</year>
          ,
          <string-name>
            <given-names>Hong</given-names>
            <surname>Kong</surname>
          </string-name>
          , China, November 3-
          <issue>7</issue>
          ,
          <year>2019</year>
          , pages
          <fpage>1105</fpage>
          -
          <lpage>1116</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
          <string-name>
            <given-names>Aidan N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Lukasz Kaiser, and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention is all you need</article-title>
          .
          <source>In Isabelle Guyon, Ulrike von Luxburg</source>
          , Samy Bengio,
          <string-name>
            <surname>Hanna M. Wallach</surname>
            , Rob Fergus,
            <given-names>S. V. N.</given-names>
          </string-name>
          <string-name>
            <surname>Vishwanathan</surname>
          </string-name>
          , and Roman Garnett, editors,
          <source>Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems</source>
          <year>2017</year>
          ,
          <fpage>4</fpage>
          -9
          <source>December</source>
          <year>2017</year>
          , Long Beach, CA, USA, pages
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Jason W.</given-names>
            <surname>Wei</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kai</given-names>
            <surname>Zou</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>EDA: easy data augmentation techniques for boosting performance on text classification tasks</article-title>
          . In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors,
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP</source>
          <year>2019</year>
          ,
          <string-name>
            <given-names>Hong</given-names>
            <surname>Kong</surname>
          </string-name>
          , China, November 3-
          <issue>7</issue>
          ,
          <year>2019</year>
          , pages
          <fpage>6381</fpage>
          -
          <lpage>6387</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Kang</given-names>
            <surname>Min</surname>
          </string-name>
          <string-name>
            <surname>Yoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Youhyun</given-names>
            <surname>Shin</surname>
          </string-name>
          , and
          <string-name>
            <surname>Sang-goo Lee</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Data augmentation for spoken language understanding via joint variational generation</article-title>
          .
          <source>In The Thirty-Third AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI</source>
          <year>2019</year>
          ,
          <source>The Thirty-First Innovative Applications of Artificial Intelligence Conference</source>
          ,
          <string-name>
            <surname>IAAI</surname>
          </string-name>
          <year>2019</year>
          ,
          <source>The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI</source>
          <year>2019</year>
          , Honolulu, Hawaii, USA, January 27 - February 1,
          <year>2019</year>
          , pages
          <fpage>7402</fpage>
          -
          <lpage>7409</lpage>
          . AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Zijian</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Su</given-names>
            <surname>Zhu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kai</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Data augmentation with atomic templates for spoken language understanding</article-title>
          .
          <source>In Kentaro Inui</source>
          , Jing Jiang, Vincent Ng, and Xiaojun Wan, editors,
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP</source>
          <year>2019</year>
          ,
          <string-name>
            <given-names>Hong</given-names>
            <surname>Kong</surname>
          </string-name>
          , China, November 3-
          <issue>7</issue>
          ,
          <year>2019</year>
          , pages
          <fpage>3635</fpage>
          -
          <lpage>3641</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>