Towards Data Augmentation for DRS-to-Text Generation
Muhammad Saad Amin 1, Alessandro Mazzei 1 and Luca Anselma 1
1 University of Turin, Corso Svizzera 185, Turin, 10149, Italy


Abstract
    Data augmentation is becoming increasingly popular in Natural Language Generation (NLG).
    Various approaches have been used in NLP and NLG to augment data and increase the number of
    training examples for neural models. Yet no studies have performed augmentation on logical
    input, i.e., Discourse Representation Structures (DRS). We present data augmentation for DRS
    taken from the PMB corpus for the DRS-to-Text generation task. We conducted our experiments on
    a standard bi-LSTM-based sequence-to-sequence model, thus creating an end-to-end neural
    approach for generating English sentences from DRS. We evaluated the output generated by
    word-level and character-level decoders with reference-based evaluation metrics: BLEU, ROUGE,
    METEOR, NIST, and CIDEr. Training on augmented DRS achieved better results than training on
    DRS without augmentation. To assess the significance of this improvement, we conducted
    statistical significance tests, i.e., the Shapiro-Wilk test (to check data normality) and the
    Wilcoxon test (to test model significance). The Wilcoxon results indicate that our model is
    significantly better, with p-value = 2.37e-05 for the char-level model and p-value = 7.78e-07
    for the word-level model.

Keywords 1
    Bi-LSTM, Data Augmentation, DRS-to-Text Generation, Neural Network, Parallel Meaning Bank
    (PMB), Statistical Significance Test, Shapiro-Wilk Test, Wilcoxon Test

1. Introduction

    Data augmentation is an approach used to increase the number of examples for training a neural
model without explicitly collecting new data [1]. It is becoming popular in many NLP and NLG
applications because of the complexity of the tasks being addressed. Researchers in the Computer
Vision (CV) domain have long used augmentation techniques such as cropping, flipping, color
jittering, and rotating [2]. These techniques apply naturally to images, because a rotated, flipped,
or cropped version of an image is still a valid image. Augmentation for NLP and NLG is harder to
implement because of the discrete nature of sentences [3]: a poor sentence augmentation produces
ungrammatical sentences, which in turn degrade the performance of the model.
    A Discourse Representation Structure (DRS) is derived from Discourse Representation Theory (DRT)
and represents meaning formally as first-order logic. Early work on formal meaning representation
focused on generating DRS from text, an approach referred to as parsing [4]. This work was directed
toward mapping words to their logical representation. Far less work has addressed the reverse
direction, i.e., generating sentences from DRS. Recently, a bi-LSTM-based neural
sequence-to-sequence model has been used to generate sentences from DRS [5]. To our knowledge,
however, no work has augmented DRS, i.e., the formal logical representation together with its
translation. To address this research gap, we worked on DRS augmentation to check whether it
improves model performance, measured as higher evaluation metric scores.

1 NL4AI 2022: Sixth Workshop on Natural Language for Artificial Intelligence, November 30-11, 2022, Udine, Italy [33]
EMAIL: muhammadsaad.amin@unito.it (A. 1); alessandro.mazzei@unito.it (A. 2); luca.anselma@unito.it (A. 3)
ORCID: 0000-0002-7002-9373 (A. 1); 0000-0003-3072-0108 (A. 2); 0000-0003-2292-6480 (A. 3)
© 2022 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

The research questions that we address in these experiments are as follows:
   1. Is it possible to augment a formal meaning representation based on logical input, i.e., DRS?
   2. How can augmentation be performed on both the DRS and its translation, given that the two are
   of different natures (logical input versus linear text)?
   3. Does augmentation of DRS result in increased model performance?
   4. How can the results be statistically justified with the help of significance tests?
   In a nutshell, our main contribution is twofold. First, we have developed a way of augmenting
logical inputs (DRS) and their respective translations. A DRS is originally given in Box format,
which cannot be fed into a neural network directly. To make a DRS usable as input, we must flatten
the Box format into Clausal format, which is then preprocessed into the Absolute DRS format to be
fed into a Neural Network (NN). Using corpus data from the PMB, we applied augmentation to the
Clausal format of DRS so that it can be preprocessed and passed to the neural model. The Box and
Clausal formats of a DRS, along with its translation, are shown in Figure 1.


Figure 1: The Box format of DRS (left side) is flattened and converted into the Clausal format of
DRS (right side) [5].

    Both formats of DRS have the same meaning, but to augment a DRS and feed it into the NN, we must
transform it from Box format into Clausal format. Our hypothesis is that an NN trained with
augmented data produces better results. Second, we have applied statistical significance tests to
the DRS-to-Text generation task to verify that the better results are not achieved by chance.
Choosing the right test among the many parametric and non-parametric tests is itself a non-trivial
problem. Both contributions are described in detail in later sections.
    The remainder of the paper is structured as follows: literature insights are described in
Section 2. Section 3 describes the data and the approach used to augment the logical input and the
respective translation of DRS. The methodology used to conduct the experiment is discussed in
Section 4. Results are discussed in Section 5, and the conclusion and future work are described in
Section 6.

2. Literature Insights
    The literature on data augmentation in Natural Language Processing (NLP) and Generation (NLG)
indicates that this domain is still underexplored [6]. Researchers in NLP have used different
approaches to augment data examples. Depending on the text processing challenge, both rule-based
and model-based approaches have been proposed [7], and each has pros and cons. Rule-based
techniques are easy to implement but sometimes create data that is more diverse than data
augmentation requires [8]. Good augmented data is neither too similar to nor too different from the
original examples, because data at either extreme tends to lead to overfitting of the model.
Model-based approaches are considered effective for augmentation, but they are difficult to develop
and reuse every time more data is needed [9].
    Among rule-based techniques, researchers have proposed different approaches depending on the
nature of the task. Feature Space Data Augmentation [10], Easy Data Augmentation based on random
insertion, deletion, and swap [11], Paraphrase Identification [12], and Dependency Tree Morphing
[13] are some of the rule-based approaches in the literature. Similarly, MixUp (also referred to as
Mixed Sample Data Augmentation, MSDA) [14], CutMix [15], CutOut [16], Copy-Paste [17], and
Seq2MixUp [18] are interpolation-based techniques. Model-based techniques include BackTranslation
[19], SCPN [20], Semantic Text Exchange (STE) [21], ContextualAux [22], Lambada [23], XLDA [24],
SeqMix [25], Slot-Sub-LM [26], UBT & TBT [27], Soft Contextual DA [28], Data Diversification [29],
DiPS [30], and Augmented SBERT [31].
    In our implementation, we used a rule-based approach to augment the data. We defined a
verb-change rule with the help of the spaCy NLP pipeline to transform the data into present, past,
and future tenses. In the DRS-to-Text generation system, the neural network receives two kinds of
data, i.e., the DRS and its respective translation, as shown in Figure 1. We therefore have to
augment both the DRS and its translation. The two are of entirely different natures: one is a
logical input (DRS) and the other one is linear text (the translation of the DRS). Using a
rule-based approach, we successfully augmented the DRS and its translation to increase the number
of relevant examples, thus achieving higher evaluation scores.

3. Data and Augmentation Approach

   Originally, a DRS is presented in Box format, as this makes the structure easy to understand and
analyze. Boxes have unique labels, i.e., b1, b2, b3, and so on. Each box has two layers: a top
layer and a bottom layer. The top layer contains discourse referents, e.g., x1 and t1, and the
bottom layer contains conditions over these discourse referents. Each referent or condition belongs
to a unique box label. For example, b2 person.n.01 x1 contains three pieces of information: b2 is
the box label, x1 is the discourse referent, and person.n.01 is a predicate disambiguated with a
sense (senses are given as WordNet synsets, e.g., person.n.01, time.n.08).
   The Box format of DRS is not convenient for modeling purposes; therefore, we convert the Box
format into the Clausal format. The clausal (or absolute) format is easily readable by the neural
network. In clausal format, the variables and the conditions of the box format are converted into
clauses. For example, the variables in the top layer of a box are converted into clauses by a
special condition called "REF", e.g., b2 REF x1, which states that discourse variable x1 is bound
in box b2.
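
As a minimal sketch of how such clauses could be read programmatically, the following Python
snippet splits a clause into its box label, its operator or concept, and its arguments, and
collects the referents bound by REF. The names Clause, parse_clause, and bound_referents are
hypothetical and are not part of the PMB tooling; the sketch assumes whitespace-separated clauses
written as in the examples above.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Clause(NamedTuple):
    box: str                    # box label, e.g. "b2"
    operator: str               # operator, role, or concept, e.g. "REF" or "person.n.01"
    arguments: Tuple[str, ...]  # referents or constants, e.g. ("x1",)


def parse_clause(line: str) -> Clause:
    """Split one whitespace-separated clause into its three parts."""
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError(f"not a valid clause: {line!r}")
    return Clause(tokens[0], tokens[1], tuple(tokens[2:]))


def bound_referents(clauses: Iterable[Clause]) -> Dict[str, List[str]]:
    """Map each box label to the referents it introduces via REF."""
    bound: Dict[str, List[str]] = {}
    for clause in clauses:
        if clause.operator == "REF":
            bound.setdefault(clause.box, []).append(clause.arguments[0])
    return bound


clauses = [parse_clause("b2 REF x1"), parse_clause("b2 person.n.01 x1")]
print(bound_referents(clauses))
```

Keeping the box label separate from the operator makes it straightforward to look up which box
binds a given discourse referent.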


Figure 2: Graphical representation of data augmentation in DRS. On the left is the original DRS
example with its respective translation, which is transformed into present, past, and future tense
in both the DRS and the translation.
   A DRS can also be viewed as a logical representation built from components such as semantic
relations (Agent, Patient, Theme), operators (REF, NOT), concepts (touch.v.01), variable indices
(b1, x1), and deictic constants (now, speaker, hearer). By altering the values of these components,
one can augment the DRS. There are multiple ways of augmenting a DRS: tense change, polarity
change, name change, quantity change, and number change. Among these possible forms of DRS
augmentation, we worked on the tense-change approach. In tense change, the tense of the original
DRS is converted into the present, past, and future tense, as shown in Figure 2.
   Tense-change augmentation is also referred to as a verb-based augmentation approach (a verb being
the word that describes the action in the sentence), because we transform verbs, i.e., present →
past and future, past → present and future, and future → present and past. By default, the
tense-change variants are the present, past, and future indefinite tenses.

    3.1. Left side: DRS Augmentation

    A DRS is a logical combination of events, entities, and the relationships between these
entities. DRS also covers certain semantic phenomena, including pronouns, presuppositions,
quantification, negation, and discourse relations. Among the variants of DRS available in the
Parallel Meaning Bank (PMB) corpus, we used the fully interpretable version. The reason behind this
choice is the way information is represented: in this version of DRS, verbs, adverbs, nouns, and
adjectives are WordNet synsets, and semantic relations are based on VerbNet.
    For augmenting DRS, we worked on a verb-based augmentation approach. To change the relation
between entities of a DRS, we adopted a simple string-replacement approach that replaces one string
with another, as shown in Figure 2. While iterating through each DRS, we first identified the time
in which the verb is presented, e.g., EQU t1 "now", TPR t1 "now", and TPR "now" t1. These three
patterns represent verbs in any form of the present, past, or future tense. After identifying the
pattern of a DRS, we performed string replacement to convert a verb occurring in one tense into the
other tenses, e.g., have not → does not, did not, will not. This is how we augment the DRS, which
is the logical part of our input data. During the augmentation of a DRS, we also kept track of its
respective translation. Unlike the DRS, however, the translation cannot be augmented by string
replacement alone. To augment the linear text, we used the rule-based sentence conversion approach
described in Section 3.2.
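
The string-replacement step for the temporal clause can be sketched as follows. This is an
illustration under stated assumptions, not the authors' code: it assumes that EQU t "now" marks the
present, TPR t "now" the past, and TPR "now" t the future (following the order in which the three
patterns are listed above), the names detect_tense and augment_tense are hypothetical, and the
sample DRS is invented for the example.

```python
import re

# Assumed mapping between tenses and temporal clauses (hypothetical names).
TENSE_PATTERNS = {
    "present": re.compile(r'EQU (t\d+) "now"'),
    "past": re.compile(r'TPR (t\d+) "now"'),
    "future": re.compile(r'TPR "now" (t\d+)'),
}
TENSE_TEMPLATES = {
    "present": 'EQU {t} "now"',
    "past": 'TPR {t} "now"',
    "future": 'TPR "now" {t}',
}


def detect_tense(clauses):
    """Return the tense of the first temporal clause found, or None."""
    for clause in clauses:
        for tense, pattern in TENSE_PATTERNS.items():
            if pattern.search(clause):
                return tense
    return None


def augment_tense(clauses):
    """Return {tense: clause list} with the temporal clause rewritten per tense."""
    source = detect_tense(clauses)
    if source is None:
        return {}
    pattern = TENSE_PATTERNS[source]
    variants = {}
    for target, template in TENSE_TEMPLATES.items():
        variants[target] = [
            pattern.sub(lambda m: template.format(t=m.group(1)), clause)
            for clause in clauses
        ]
    return variants


# Invented present-tense DRS in clausal form, for illustration only.
drs = [
    "b1 REF x1",
    "b1 person.n.01 x1",
    "b1 REF e1",
    "b1 smile.v.01 e1",
    "b1 Agent e1 x1",
    "b1 REF t1",
    "b1 time.n.08 t1",
    "b1 Time e1 t1",
    'b1 EQU t1 "now"',
]
variants = augment_tense(drs)
print(variants["past"][-1])
print(variants["future"][-1])
```

Only the temporal clause changes between variants; the variant whose tense matches the input
reproduces the original clauses unchanged.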

    3.2. Right side: Text Augmentation
    Text augmentation by tense change is a very challenging task in NLP. For our implementation, we
used the spaCy pipeline to transform English sentences from one tense into another, following the
transformation performed on the DRS. We used an SQLite database to keep track of the sentences,
with a maximum length of 1000 characters. We applied the pipeline to process the initial sentence
and worked on sentence patterns to learn the structure of the sentence (conjugates, singular,
plural, past, present, and future).
    In tense transformation, other factors must also be kept in mind while reconstructing the
sentence. Major points of consideration include active and passive, imperative, negation, singular
and plural, subject and object, nouns, progressive and perfect, infinitive, first person, ambiguous
POS, and perfect participle sentences. We did not work only on simple, positive sentences: because
the sentences are translations of DRS, we have to deal with all the sentence types mentioned above.
Table 1 gives examples for each type to illustrate the complexity of the task.
    If a sentence is in the present perfect, present perfect continuous, or present continuous
tense, then it is converted into the present indefinite, as the default mode of tense change is the
indefinite mode. The same strategy is applied to the continuous, perfect, and perfect continuous
forms of past and future sentences.
Table 1
All cases of tense change encountered in our implementation
    Conversion Type             Original Sentence                    Converted Sentence
    Present to Past & Future    I catch you                          I caught you
                                                                     I will catch you
    Past to Present & Future    He cheated on me                     He cheats on me
                                                                     He will cheat on me
    Future to Present & Past    I will love you                      I love you
                                                                     I loved you
    First person                I said no                            I say no
                                He said no                           He says no
    Infinitive                  I love to love                       I will love to love
    Ambiguous-POS               It was a thought                     It will be a thought
    Plural                      The rabbits ran                      The rabbits run
                                The rabbit ran                       The rabbit runs
    Third person singular       It will work                         It works
    Taking will as noun         The will says otherwise              The will said otherwise
                                                                     The will will say otherwise
    Perfect tense               He had walked to the store           He walks to the store
                                                                     He will walk to the store
    Continuous tense            I was going to the store             I am going to the store
                                                                     I will be going to the store
    Double tense change         I win because I have five cookies    I won because I had five cookies
    Negation                    I did not go                         I do not go
                                                                     I will not go
    Future perfect              I will have been alive               I am alive
                                                                     I was alive
                                                                     I will be alive
    Passive tenses              I am filled                          I will be filled
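
To make the simplest rows of Table 1 concrete, here is a toy sketch of rule-based tense change. It
deliberately does not use spaCy: it relies on a tiny hand-written verb lexicon and assumes
sentences of the form "subject verb rest" with a one-word subject. All names (VERB_FORMS,
conjugate, change_tense) are hypothetical, and the harder cases in Table 1, such as "will" used as
a noun, negation, or perfect tenses, are outside its scope.

```python
# Tiny hand-written lexicon: base form -> (third-person singular, simple past).
VERB_FORMS = {
    "catch": ("catches", "caught"),
    "cheat": ("cheats", "cheated"),
    "love": ("loves", "loved"),
}
# Reverse lookup from any listed surface form to its base form.
SURFACE_TO_BASE = {
    form: base
    for base, forms in VERB_FORMS.items()
    for form in (base, *forms)
}
THIRD_PERSON_SINGULAR = {"he", "she", "it"}


def conjugate(subject, base, tense):
    """Return the indefinite-tense form of `base` that agrees with `subject`."""
    third_singular, past = VERB_FORMS[base]
    if tense == "present":
        return third_singular if subject.lower() in THIRD_PERSON_SINGULAR else base
    if tense == "past":
        return past
    if tense == "future":
        return f"will {base}"
    raise ValueError(f"unknown tense: {tense}")


def change_tense(sentence, tense):
    """Rewrite a 'subject verb rest' sentence in the requested tense."""
    subject, *rest = sentence.split()
    if rest[0] == "will":  # future: the base verb follows the auxiliary
        base, tail = rest[1], rest[2:]
    else:                  # present or past: look the surface form up
        base, tail = SURFACE_TO_BASE[rest[0]], rest[1:]
    return " ".join([subject, conjugate(subject, base, tense), *tail])


print(change_tense("I catch you", "past"))
print(change_tense("He cheated on me", "future"))
```

A lexicon lookup cannot resolve ambiguous part-of-speech cases, which is one reason a full NLP
pipeline is needed for the remaining rows of the table.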


4. Experimental Implementation

   The experiment was carried out in a series of steps. To implement augmentation in DRS-to-Text
generation, we performed rule-based and string-replacement-based operations on the DRS data. After
data augmentation, we fed the augmented data into a bi-LSTM-based neural network to analyze the
performance of our approach. For Neural Machine Translation (NMT) tasks, LSTM has been considered
the best model due to its ability to remember connections across long input sequences [4].
Following these suggestions from the literature, we also used a bi-LSTM-based sequence-to-sequence
model to translate DRS into English sentences.
   DRS-to-Text is a specific logic-to-language generation task in which the input is first-order
logic and the output is the corresponding linear text. It is not a general text generation task
from graphs, tables, or images. Therefore, we must use a sequence-to-sequence model capable of
remembering long sequences, and bi-LSTM has proven successful in remembering long logical input
sequences [5]. Pre-trained language models such as BERT, ELMo, and RoBERTa have been used
previously for parsing, e.g., Text-to-AMR and Text-to-DRS. Still, for translation and generation,
most researchers have focused only on bi-LSTM-based architectures [4]. Since we are dealing with a
very specific task, we have not tried Transformer-based architectures, e.g., BERT, GPT, and BART,
for logic-to-language generation. Exploring architectures that can beat bi-LSTM on this task is an
interesting future direction.
   Neural Architecture. For the experiment, we used the encoder-decoder architecture of the NMT
module. A bi-directional LSTM processes input sequences in both directions. The encoder part of the
model encodes the DRS representation, and the decoder decodes it into the respective English
sentence. We used GPUs with the CUDA parallel computing platform to speed up the experiments. The
hyperparameter settings for our experiment are shown in Table 2.

Table 2
Hyperparameters of neural architecture for this experiment
                          Parameters                          Values
                Dimensions Embedding & RNN                      300
                          Enc/Dec Cell                         LSTM
                        Enc/Dec Depth                             2
                           Mini-batch                            48
                      Normalization Rate                        0.9
                            lr-decay                            0.5
                       lr-decay-strategy                       Epoch
                           Optimizer                           Adam
                       Validation Metric                   Cross-Entropy
                           Cost-Type                         ce-mean
                           Beam Size                             10
                         Learning Rate                         0.002

   Dataset. We used the English version of the Parallel Meaning Bank (PMB) 3.0.0 dataset, whose
gold-standard (fully annotated) part contains 6,620 training, 885 validation, and 898 testing
examples. We used the Gold-PMB dataset both with and without augmentation to check the increase in
evaluation scores. We then expanded the training set by adding the 97,598 Silver-PMB (partially
manually annotated) training examples to the Gold-PMB training examples; only training examples
were taken from Silver-PMB. Collectively, to train our model without data augmentation, we have
104,218 training, 885 validation, and 898 testing examples. In the second experiment, i.e.,
DRS-to-Text generation with augmentation, we augmented only the training examples; the validation
and testing examples were left unchanged. After augmentation, we had 26,480 training examples for
Gold-PMB and 416,872 training examples for Gold-Silver-PMB. All dataset sizes with and without
augmentation are given in Table 3.

Table 3
Dataset training, validation, and testing examples with and without data augmentation
             Without Augmentation                                 With Augmentation
         Training (Gold-PMB)                6620         Training (Gold-PMB)          26480
      Training (Gold+Silver-PMB)           104218     Training (Gold+Silver-PMB)      416872
               Validation                    885              Validation               885
                Testing                      898                Testing                898
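
The counts in Table 3 can be checked with a few lines of arithmetic. The sketch below uses only
numbers stated in the text; the variable names are illustrative. It shows that the Gold+Silver
training set is the sum of its two parts and that each augmented training set is exactly four times
the size of its unaugmented counterpart.

```python
# Training-set sizes as reported in the text and in Table 3.
gold_train = 6620
silver_train = 97598
gold_silver_train = gold_train + silver_train

augmented_gold_train = 26480
augmented_gold_silver_train = 416872

# Ratio of augmented to unaugmented training examples.
gold_factor = augmented_gold_train / gold_train
gold_silver_factor = augmented_gold_silver_train / gold_silver_train

print(gold_silver_train)                # 104218
print(gold_factor, gold_silver_factor)  # 4.0 4.0
```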

   Implementation Pipeline. The implementation pipeline includes all the steps involved in
generating English text from DRS. The main focus of this experiment is to perform data augmentation
on DRS and analyze the resulting improvement. We therefore take the clausal format of the augmented
DRS and preprocess it so that each meaningful entity is treated as an atomic entity. This
representation helps the neural network capture the input pattern and perform well. The complete
implementation pipeline is shown in Figure 3.

Figure 3: Complete pipeline of DRS-to-Text generation. The encoder encodes the DRS into its
vectorized form, and the decoder then converts the vectorized form into an English sentence.

    The encoder part of the bi-LSTM encodes the DRS and converts it into vector form. This vector
form is then passed to the decoder to be converted into the respective English sentence. The
model-generated English sentences are compared with the reference English sentences to calculate
the evaluation scores. For the evaluation of generated sentences, we use five automatic evaluation
metrics, BLEU, ROUGE, NIST, METEOR, and CIDEr, to check the syntax, semantics, relevance, and
grammatical structure of the generated text. We compared our results with the state-of-the-art
DRS-to-Text results of the authors in [5] and showed that augmentation yields better results than
training without augmentation.

5. Results
    Here we discuss our findings and answer the research questions posed earlier. For DRS-to-Text
generation, we conducted two experiments based on the types of PMB datasets. The first, our
baseline experiment, was conducted on the Gold-PMB dataset in two settings: without augmentation
and with augmentation. We analyzed the character-level and word-level results of the model and
found that augmentation increased the scores on all evaluation metrics. Baseline results are
reported in Table 4 together with the dataset type and the evaluation metrics.

Table 4
Comparison of evaluation scores with and without augmentation
    Dataset Type                              Result Type    BLEU     NIST     METEOR    ROUGE_L    CIDEr
    Gold-PMB (Without augmentation)           Char Level     47.72    7.68     39.42     72.59      4.84
                                              Word Level     32.91    5.80     29.99     61.39      3.49
    Gold-PMB (With augmentation)              Char Level     52.30    7.94     41.53     74.63      5.09
                                              Word Level     41.89    6.84     35.79     68.37      4.25
    Gold-Silver-PMB (Wang et al.)             Char Level     69.30    ---      51.80     84.90      ---
                                              Word Level     64.70    ---      47.80     81.10      ---
    Gold-Silver-PMB (Without augmentation)    Char Level     70.18    9.44     52.20     85.74      6.85
                                              Word Level     64.11    8.93     47.59     81.31      6.11
    Gold-Silver-PMB (With augmentation)       Char Level     72.38    10.49    53.18     86.40      7.01
                                              Word Level     65.58    9.37     47.83     82.26      6.25

    Our second experiment addresses three questions. First, if we add the training examples of
Silver-PMB data (not fully manually annotated) to the Gold-PMB data (fully annotated), do the
evaluation scores also increase? Second, can we achieve higher evaluation scores than with Gold-PMB
augmentation? Finally, how do our augmentation-based results compare with models from the
literature? To test our hypothesis, we augmented the Gold and Silver PMB training examples and
repeated the experiment. Augmentation again improved all metrics, but the gain was smaller than the
one achieved in the Gold-PMB experiment. This is possibly due to the added DRS examples that were
not fully manually annotated by experts: noise in the Silver-PMB data propagates through all
variants of the dataset, with and without augmentation, and limits the increase in evaluation
scores. As for Gold-PMB, we analyzed the character-level and word-level results of the neural
model. We also compared the results from the literature with our implementation of the model with
and without augmentation. All results are reported in Table 4.
    Table 4 supports our hypothesis. To the best of our knowledge, the literature contains no
implementation of augmentation for DRS, although there are other DRS-based implementations for
language generation. To strengthen our hypothesis, we first conducted a baseline experiment on the
fully manually annotated gold corpus, and then added Silver data to Gold and applied augmentation.
The first two dataset rows of Table 4 report the baseline experiments without and with
augmentation; the augmented version of the DRS-to-Text implementation scores higher on every
metric. The third dataset row reports the results from the literature (Wang et al.). The fourth and
fifth dataset rows are our implementations on the gold and silver datasets without and with
augmentation, and the fifth row, our augmentation-based result, has the highest scores.
    Statistical Significance Tests. To support our model's improvement statistically, we also
conducted statistical significance tests [32]. Significance tests are becoming increasingly common
in the NLG domain. They are applied when two different models are run on the same data, or when the
same model is applied to two different datasets. In our case, we applied the same bi-LSTM-based
sequence-to-sequence model to two different data samples, i.e., the dataset without augmentation
and the dataset with augmentation. The purpose of these tests is to verify that the better results
of one model are not achieved by chance. Among the parametric and non-parametric tests available,
we chose the test for our experiment in two steps. First, we determined whether our data is
normally distributed.
    To check the normality of the data, we conducted the Shapiro-Wilk test, which we chose because
it is highly effective compared to other normality tests. Our data were not normally distributed,
so we had to use a non-parametric test; had the data been normally distributed, a t-test would have
been enough to check model significance [32]. Among the non-parametric tests, we chose the Wilcoxon
test for two reasons. First, it is well suited to scores coming from automatic evaluation metrics,
e.g., BLEU, ROUGE, and METEOR. Second, it is reported to have the highest statistical power among
the non-parametric tests applied to such scores.
    To implement the significance tests, we calculated sentence-level BLEU scores for the
model-generated test outputs against the gold references, comprising approximately 1K examples. We
conducted character-level and word-level significance tests and found that our augmentation models are
significantly better, with p-value = 2.37e-05 for the Char-level model and p-value = 7.78e-07 for the
Word-level model.
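    The per-sentence scoring step can be sketched as follows. This is a simplified sentence-level
BLEU, given only to make the procedure concrete; it is not the scorer used in our experiments. The
whitespace tokenization, the add-one smoothing of the n-gram precisions, the maximum n-gram order of
4, and the example sentences are assumptions introduced for this sketch.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU with add-one smoothing (illustrative)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        # Add-one smoothing (an assumption) avoids log(0) on short sentences.
        log_precision_sum += math.log((overlap + 1) / (total + 1))

    # Brevity penalty for hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    # Invented sentences, for illustration only (not taken from the PMB).
    references = ["tom is eating an apple", "the dog is not barking"]
    baseline_out = ["tom eats apple", "the dog barking"]
    augmented_out = ["tom is eating an apple", "the dog is not barking loudly"]

    # One score per sentence and per system; the two lists are paired by
    # test sentence and can be passed to a paired significance test.
    baseline_scores = [sentence_bleu(r, h) for r, h in zip(references, baseline_out)]
    augmented_scores = [sentence_bleu(r, h) for r, h in zip(references, augmented_out)]
    for b, a in zip(baseline_scores, augmented_scores):
        print(f"baseline = {b:.3f}  augmented = {a:.3f}")
```

    Because both systems are scored on the same test sentences, the resulting score lists are paired,
which is what makes a paired test such as the Wilcoxon test applicable.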

6. Conclusion and Future Work

    Data augmentation is a very challenging task in NLP and NLG. The main goal of augmentation is
to increase training examples for the neural model without explicitly adding new data for training. In
this context, we have implemented a data augmentation approach in DRS for text generation tasks. We
conducted two experiments on PMB gold and gold-silver datasets. We achieved high evaluation scores
of BLEU, ROUGE, METEOR, NIST, and CIDEr in the case of a model trained on augmented data.
Furthermore, we conducted statistical significance tests to verify model performance on both
character-level and word-level translations. We found that our augmentation models are significantly
better, with p-value = 2.37e-05 for the Char-level model and p-value = 7.78e-07 for the Word-level model.
    In the future, we will extend this experiment by applying other data augmentation approaches to
logical forms (DRS) with respect to polarity change, number change, quantity change, and name change
in the same DRS. We also plan to apply augmentation to low-resource languages such as Italian,
French, and Dutch.

7. References

[1] Yutai Hou, Yijia Liu, Wanxiang Che, and Ting Liu. 2018. Sequence-to-sequence data
     augmentation for dialogue language understanding. In Proceedings of the 27th International
     Conference on Computational Linguistics, pages 1234–1245, Santa Fe, New Mexico, USA.
     Association for Computational Linguistics.
[2] Connor Shorten and Taghi M Khoshgoftaar. 2019. A survey on Image Data Augmentation for
     Deep Learning. Journal of Big Data, 6(1):60.
[3] Ruibo Liu, Guangxuan Xu, Chenyan Jia, Weicheng Ma, Lili Wang, and Soroush Vosoughi. 2020.
     Data boost: Text data augmentation through reinforcement learning guided conditional generation.
     In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
     (EMNLP), pages 9031–9041, Online. Association for Computational Linguistics.
[4] Rik van Noord, Lasha Abzianidze, Antonio Toral, and Johan Bos. 2018. Exploring neural
     methods for parsing discourse representation structures. Transactions of the Association for
     Computational Linguistics, 6:619–633.
[5] Wang, C., van Noord, R., Bisazza, A., & Bos, J. (2021). Evaluating Text Generation from
     Discourse Representation Structures. In A. Bosselut, E. Durmus, V. Prashant Gangal, S.
     Gehrmann, Y. Jernite, L. Perez-Beltrachini, S. Shaikh, & W. Xu (Eds.), Proceedings of the 1st
     Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021) (pp. 73-83).
     Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.gem-1.8.
[6] Jason Wei and Kai Zou. 2019. EDA: Easy data augmentation techniques for boosting performance
     on text classification tasks. In Proceedings of the 2019 Conference on Empirical Methods in
     Natural Language Processing and the 9th International Joint Conference on Natural Language
     Processing (EMNLP-IJCNLP), pages 6382–6388, Hong Kong, China. Association for
     Computational Linguistics.
[7] Feng, S. Y., Gangal, V., Wei, J., Chandar, S., Vosoughi, S., Mitamura, T., & Hovy, E. (2021). A
     survey of data augmentation approaches for NLP. arXiv preprint arXiv:2105.03075.
[8] Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-Level Convolutional Networks for
     Text Classification. In Proceedings of the 28th International Conference on Neural Information
     Processing Systems - Volume 1, NIPS’15, pages 649–657, Cambridge, MA, USA. MIT Press.
[9] Ashutosh Kumar, Satwik Bhattamishra, Manik Bhandari, and Partha Talukdar. 2019. Submodular
     optimization-based diverse paraphrasing and its effectiveness in data augmentation. In Proceedings
     of the 2019 Conference of the North American Chapter of the Association for Computational
     Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3609–
     3619, Minneapolis, Minnesota. Association for Computational Linguistics.
[10] Eli Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Abhishek Kumar,
     Rogerio Feris, Raja Giryes, and Alex M Bronstein. 2018. Δ-encoder: an effective sample synthesis
     method for few-shot object recognition. In Proceedings of the 32nd International Conference on
     Neural Information Processing Systems, pages 2850–2860.
[11] Jason Wei and Kai Zou. 2019. EDA: Easy data augmentation techniques for boosting performance
     on text classification tasks. In Proceedings of the 2019 Conference on Empirical Methods in
     Natural Language Processing and the 9th International Joint Conference on Natural Language
     Processing (EMNLP-IJCNLP), pages 6382–6388, Hong Kong, China. Association for
     Computational Linguistics.
[12] Hannah Chen, Yangfeng Ji, and David Evans. 2020. Finding friends and flipping frenemies:
     Automatic paraphrase dataset augmentation using graph theory. In Findings of the Association for
     Computational Linguistics: EMNLP 2020, pages 4741–4751, Online. Association for
     Computational Linguistics.
[13] Gözde Gül Şahin and Mark Steedman. 2018. Data augmentation via dependency tree morphing
     for low-resource languages. In Proceedings of the 2018 Conference on Empirical Methods in
     Natural Language Processing, pages 5004–5009, Brussels, Belgium. Association for
     Computational Linguistics.
[14] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. 2017. mixup: Beyond
     empirical risk minimization. Proceedings of ICLR.
[15] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon
     Yoo. 2019. Cutmix: Regularization strategy to train strong classifiers with localizable features. In
     Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6023–6032.
[16] Terrance DeVries and Graham W Taylor. 2017. Improved regularization of convolutional neural
     networks with cutout. arXiv preprint.
[17] Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le,
     and Barret Zoph. 2020. Simple copy-paste is a strong data augmentation method for instance
     segmentation. arXiv preprint.
[18] Demi Guo, Yoon Kim, and Alexander Rush. 2020. Sequence-level mixed sample data
     augmentation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language
     Processing (EMNLP), pages 5547–5552, Online. Association for Computational Linguistics.
[19] Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving Neural Machine Translation
     Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for
     Computational Linguistics (Volume 1: Long Papers), pages 86–96, Berlin, Germany. Association
     for Computational Linguistics.
[20] John Wieting and Kevin Gimpel. 2017. Revisiting Recurrent Networks for Paraphrastic Sentence
     Embeddings. In Proceedings of the 55th Annual Meeting of the Association for Computational
     Linguistics (Volume 1: Long Papers), pages 2078–2088, Vancouver, Canada. Association for
     Computational Linguistics.
[21] Steven Y. Feng, Aaron W. Li, and Jesse Hoey. 2019. Keep calm and switch on! Preserving
     sentiment and fluency in semantic text exchange. In Proceedings of the 2019 Conference on
     Empirical Methods in Natural Language Processing and the 9th International Joint Conference on
     Natural Language Processing (EMNLP-IJCNLP), pages 2701–2711, Hong Kong, China.
     Association for Computational Linguistics.
[22] Sosuke Kobayashi. 2018. Contextual augmentation: Data augmentation by words with
     paradigmatic relations. In Proceedings of the 2018 Conference of the North American Chapter of
     the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short
     Papers), pages 452–457, New Orleans, Louisiana. Association for Computational Linguistics.
[23] Ateret Anaby-Tavor, Boaz Carmeli, Esther Goldbraich, Amir Kantor, George Kour, Segev
     Shlomov, Naama Tepper, and Naama Zwerdling. 2020. Do not have enough data? Deep learning
     to the rescue! In Proceedings of AAAI, pages 7383–7390.
[24] Jasdeep Singh, Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2019.
     Xlda: Cross-lingual data augmentation for natural language inference and question answering.
     arXiv preprint arXiv:1905.11471.
[25] Rongzhi Zhang, Yue Yu, and Chao Zhang. 2020. SeqMix: Augmenting Active Sequence Labeling
     via Sequence Mixup. In Proceedings of the 2020 Conference on Empirical Methods in Natural
     Language Processing (EMNLP), pages 8566–8579, Online. Association for Computational
     Linguistics.
[26] Samuel Louvan and Bernardo Magnini. 2020. Simple is better! lightweight data augmentation for
     low resource slot filling and intent classification. In Proceedings of the 34th Pacific Asia
     Conference on Language, Information and Computation, pages 167–177, Hanoi, Vietnam.
     Association for Computational Linguistics.
[27] Vaibhav Vaibhav, Sumeet Singh, Craig Stewart, and Graham Neubig. 2019. Improving Robustness
     of Machine Translation with Synthetic Noise. In Proceedings of the 2019 Conference of the North
     American Chapter of the Association for Computational Linguistics: Human Language
     Technologies, Volume 1 (Long and Short Papers), pages 1916–1920, Minneapolis, Minnesota.
     Association for Computational Linguistics.
[28] Fei Gao, Jinhua Zhu, Lijun Wu, Yingce Xia, Tao Qin, Xueqi Cheng, Wengang Zhou, and Tie-Yan
     Liu. 2019. Soft contextual data augmentation for neural machine translation. In Proceedings of the
     57th Annual Meeting of the Association for Computational Linguistics, pages 5539–5544,
     Florence, Italy. Association for Computational Linguistics.
[29] Xuan-Phi Nguyen, Shafiq Joty, Kui Wu, and Ai Ti Aw. 2020. Data diversification: A simple
     strategy for neural machine translation. In Advances in Neural Information Processing Systems,
     volume 33, pages 10018–10029. Curran Associates, Inc.
[30] Ashutosh Kumar, Satwik Bhattamishra, Manik Bhandari, and Partha Talukdar. 2019. Submodular
     optimization-based diverse paraphrasing and its effectiveness in data augmentation. In Proceedings
     of the 2019 Conference of the North American Chapter of the Association for Computational
     Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3609–
     3619, Minneapolis, Minnesota. Association for Computational Linguistics.
[31] Nandan Thakur, Nils Reimers, Johannes Daxenberger, and Iryna Gurevych. 2021. Augmented
     sbert: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks.
     Proceedings of NAACL.
[32] Dror, R., Baumer, G., Shlomov, S., & Reichart, R. (2018, July). The hitchhiker’s guide to testing
     statistical significance in natural language processing. In Proceedings of the 56th Annual Meeting
     of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1383-1392).
[33] D. Nozza, L. Passaro, M. Polignano, Preface to the Sixth Workshop on Natural Language for
     Artificial Intelligence (NL4AI), in: D. Nozza, L. C. Passaro, M. Polignano (Eds.), Proceedings of
     the Sixth Workshop on Natural Language for Artificial Intelligence (NL4AI 2022) co-located with
     21st International Conference of the Italian Association for Artificial Intelligence (AI*IA 2022),
     November 30, 2022, CEUR-WS.org, 2022.