<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Augmentation for Low-Resource Italian NLP: Enhancing Semantic Processing with DRS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muhammad Saad Amin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Anselma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Mazzei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Discourse Representation Structure (DRS), a formal meaning representation, has shown promising results in semantic parsing and natural language generation tasks for high-resource languages like English. This paper investigates enhancing the application of DRS to low-resource Italian Natural Language Processing (NLP), in both semantic parsing (Text-to-DRS) and natural language generation (DRS-to-Text). To address the scarcity of annotated corpora for Italian DRS, we propose a novel data augmentation technique that involves the use of external linguistic resources, including: (i) WordNet for common nouns, adjectives, adverbs, and verbs; (ii) LLM-generated named entities for proper nouns; and (iii) rule-based algorithms for tense augmentation. This approach not only increases the quantity of training data but also introduces linguistic diversity, which is crucial for improving model performance and robustness. Using this augmented dataset, we developed neural semantic parser and generator models that demonstrated enhanced generalization ability compared to models trained on non-augmented data. We evaluated the effect of semantic data augmentation using two state-of-the-art transformer-based neural sequence-to-sequence models, i.e., byT5 and IT5. Our implementation shows promising results for Italian semantic processing. Data augmentation significantly increased the performance of semantic parsing from 76.10 to 90.56 (+14.46) F1-SMATCH, and of generation from 37.79 to 57.48 (+19.69) BLEU, 30.83 to 40.95 (+10.12) METEOR, 81.66 to 90.97 (+9.31) COMET, 54.84 to 70.88 (+16.04) chrF, and 88.86 to 92.97 (+4.11) BERT score. These results demonstrate the effectiveness of our novel augmentation approach in enhancing semantic processing capabilities for low-resource languages like Italian.</p>
      </abstract>
      <kwd-group>
        <kwd>Data augmentation</kwd>
        <kwd>Italian semantic processing</kwd>
        <kwd>low-resource NLP</kwd>
        <kwd>semantic parsing and generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The field of Natural Language Processing (NLP) has seen significant advancements in recent years, particularly in semantic processing tasks. These tasks, which include semantic parsing and natural language generation, often rely heavily on parallel corpora — datasets that align text in one language with its semantic representation or with text in another language [<xref ref-type="bibr" rid="ref19 ref35">1, 2</xref>]. For languages with rich linguistic resources, such as English, the availability of large-scale parallel corpora has facilitated rapid progress in semantic processing [3, 4]. However, for many languages, including Italian, the scarcity of such resources poses a significant challenge to advancing semantic NLP capabilities [<xref ref-type="bibr" rid="ref37">5, 6</xref>]. Italian presents unique challenges and opportunities: while it shares some structural similarities with English, it possesses distinct linguistic features that complicate NLP tasks, including a more flexible word order, a rich system of verb conjugations, and the presence of grammatical gender for nouns, adjectives, and articles.</p>
      <p>In the context of NLP and Natural Language Generation (NLG), Italian has seen moderate progress. However, compared to high-resource languages like English, Italian still lacks extensive task-specific datasets, particularly in areas requiring deep semantic understanding. This deficiency is especially pronounced in tasks involving formal semantic representations such as Discourse Representation Structures (DRS) [<xref ref-type="bibr" rid="ref15 ref31">7</xref>]. While Italian is not typically classified as a low-resource language in general NLP terms, it can be considered as such in the specific domain of semantic processing, especially when dealing with formal semantic representations. This status is characterized by: (i) Named Entities: Italian naming conventions differ from those in English, requiring adaptation in entity recognition tasks; (ii) Syntactic Structure: although Italian follows the SVO structure like English, it allows for greater flexibility, posing challenges especially in parsing tasks; (iii) Grammatical Gender: the presence of grammatical gender in Italian adds complexity to tasks such as coreference resolution and agreement in the generated text. These linguistic features, combined with the limited availability of semantically annotated corpora, position Italian as a challenging language for advanced semantic NLP tasks.</p>
      <p>Data augmentation (DA), a technique widely used in machine learning to increase the size and diversity of training datasets, has shown promise in addressing resource scarcity in NLP [<xref ref-type="bibr" rid="ref61">8</xref>]. For semantic tasks involving DRS, DA presents unique challenges due to the need to preserve semantic equivalence while introducing linguistic variety.</p>
      <p>In the context of Italian semantic processing, traditional augmentation techniques such as random word insertion, deletion, substitution, or back-translation have limited applicability due to the scarcity of Italian-specific semantic resources [9]. This necessitates innovative approaches that can leverage resources from high-resource languages while maintaining the integrity of Italian linguistic structures.</p>
      <p>Given the challenges outlined, this study aims to develop a novel cross-lingual DA technique for Italian, specifically tailored for DRS-based semantic parsing and generation tasks. While word-substitution techniques are established in the DA literature, our approach introduces a cross-lingual framework that leverages the language-neutral nature of DRS. The method bridges the resource gap between high-resource and low-resource languages by temporarily transforming Italian examples into English, enabling access to rich lexical resources like WordNet, before converting back to Italian. This cross-lingual approach leverages the universal semantic representations of DRS to enable more advanced data transformation than Italian resources alone would allow, which is particularly advantageous given the limited availability of Italian-specific semantic datasets (see Table 1 for Italian examples).</p>
      <p>This paper makes the following key contributions: (1) a novel cross-lingual augmentation methodology that leverages English WordNet to enhance Italian semantic datasets; and (2) empirical evidence demonstrating the effectiveness of this augmentation technique in improving semantic parsing and generation for Italian.</p>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy. * Corresponding author. Contact: muhammadsaad.amin@unito.it (M. S. Amin); luca.anselma@unito.it (L. Anselma); alessandro.mazzei@unito.it (A. Mazzei). ORCID: 0000-0002-7002-9373 (M. S. Amin); 0000-0003-2292-6480 (L. Anselma); 0000-0003-3072-0108 (A. Mazzei). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Figure 1 (excerpt): (a) DRS (box notation): x1 s1 t1; male.n.02(x1); Name(x1, tom); rude.a.01(s1); Time(s1, t1); AttributeOf(s1, x1); time.n.08(t1); t1 ≺ now. (c) DRS/SBN (sequence notation): male.n.02 Name "Tom"; time.n.08 TPR now; rude.a.01 AttributeOf -2 Time -1.</p>
      <sec id="sec-1-1">
        <title>Paper Organization</title>
        <p>The remaining paper is organized as follows: Section 2 provides an overview of DRS. Section 3 details semantic DA for Italian, with a focus on named-entity, lexical, and grammatical data transformation techniques. Section 4 presents our experimental implementation, the implications of our results and findings, and their broader impact on the field. Finally, Section 5 concludes the paper, addresses certain limitations, and outlines directions for future research.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>In this Section, we provide an overview of the formal definition of DRS. DRS is a formal semantic representation that captures the essential meaning of text, equivalent to first-order logic. DRS is capable of representing a broad spectrum of linguistic phenomena, including anaphora, presuppositions, and temporal expressions [<xref ref-type="bibr" rid="ref15 ref31">7</xref>]. What sets DRS apart from other meaning representations, such as Abstract Meaning Representation (AMR) [2], is its proficiency in handling negation and quantification, as well as its language-independent nature. Furthermore, DRS can effectively represent meaning across multiple sentences in a discourse.</p>
      <p>Initially, DRS utilized box notation to provide scope to meaning representation (see Figure 1(a)). This notation incorporates discourse referents (e.g. x1) and conditions (e.g. person, Time), with concepts anchored using WordNet synsets and thematic roles derived from VerbNet. Operators (e.g. =) are employed to establish comparative relationships between entities. Conditions can also embody complex structures to express logical (e.g. NEGATION, ¬) or rhetorical relationships among various condition sets. To address the challenges posed by the complexity of box notation in neural parser development, Clause Notation was introduced; this method streamlines DRS by reorganizing the structure and placing variables before discourse referents and conditions (see Figure 1(b)).</p>
      <p>Further simplification led to the development of Sequence Box Notation (SBN), a variable-free format designed to be more compatible with neural sequence-to-sequence transformer architectures [<xref ref-type="bibr" rid="ref15 ref31">7</xref>]. SBN utilizes indices to form connections between concepts, with thematic roles indicating the nature of these connections (see Figure 1(c)).</p>
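      <p>The index-based linking in SBN can be made concrete with a small reader. Below is a hedged, minimal sketch in Python; the function and the simplified line format are illustrative assumptions, not the PMB toolchain:</p>

```python
# Resolve relative indices in simplified SBN lines into explicit triples.
# Each line starts with a WordNet concept; the rest alternates role/argument,
# where an argument is a literal ("Tom", now) or a relative index such as
# -2 (two concepts back in the sequence).

def sbn_to_triples(sbn_lines):
    concepts = [line.split()[0] for line in sbn_lines]
    triples = []
    for i, line in enumerate(sbn_lines):
        rest = line.split()[1:]
        for role, arg in zip(rest[0::2], rest[1::2]):
            try:
                target = concepts[i + int(arg)]  # relative index resolves to a concept
            except ValueError:
                target = arg                     # literal argument, kept verbatim
            triples.append((concepts[i], role, target))
    return triples

sbn = [
    'male.n.02 Name "Tom"',
    'time.n.08 TPR now',
    'rude.a.01 AttributeOf -2 Time -1',
]
triples = sbn_to_triples(sbn)
```

      <p>For the sequence in Figure 1(c), the index -2 on rude.a.01 resolves two concepts back to male.n.02, and -1 resolves to time.n.08.</p>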
      <sec id="sec-2-1">
        <title>3. Semantic Data Augmentation for Italian</title>
        <p>The data-intensive nature of neural networks presents a significant challenge for low-resource languages like Italian, where available data is limited. This challenge is further compounded when dealing with logical semantic representations such as DRS-Text pairs, which follow specific patterns. In DRS, concepts are represented as a combination of lemma, part of speech, and WordNet sense number. The part-of-speech component covers adjectives, adverbs, common nouns, and verbs as lexical entities, followed by other logical representations (e.g., "idea.n.01").</p>
        <p>Our augmentation methodology addresses the scarcity of Italian lexical resources by utilizing a cross-lingual approach that takes advantage of the language-neutral structure of DRS. The process (i) begins with translating the Italian text into English while keeping the original DRS unchanged; (ii) then applies a variety of augmentation techniques—named-entity, lexical, and grammatical augmentations, made possible through access to English WordNet—to the English-aligned examples; and (iii) after augmentation, translates the English examples back into Italian, ensuring that the semantic relationships from the DRS are preserved. This strategy not only generates semantically rich and contextually relevant data but also overcomes the limitations of Italian-specific resources by augmenting English-aligned examples and transforming them into Italian-aligned examples (see Figure 2 and Table 4 in Appendix), maintaining semantic accuracy through DRS's formal representations.</p>
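        <p>The three-step loop above can be sketched compactly. The translator below is stubbed with a tiny dictionary purely for illustration (an assumption; the actual system would use a machine-translation component), but the structure mirrors the described pipeline: pivot to English, augment text and DRS together, pivot back:</p>

```python
# Toy translation memory standing in for a real Italian/English MT system.
IT_TO_EN = {"Tom è piuttosto scarso a tennis.": "Tom is rather poor at tennis."}
EN_TO_IT = {"Bob is rather poor at tennis.": "Bob è piuttosto scarso a tennis."}

def augment_example(italian_text, drs, augment_fn):
    english = IT_TO_EN[italian_text]                  # step (i): translate to English
    new_english, new_drs = augment_fn(english, drs)   # step (ii): augment both views
    new_italian = EN_TO_IT[new_english]               # step (iii): translate back
    return new_italian, new_drs

def swap_tom_for_bob(text, drs):
    # toy named-entity augmentation used only to exercise the loop
    return text.replace("Tom", "Bob"), drs.replace('"Tom"', '"Bob"')

aug_text, aug_drs = augment_example(
    "Tom è piuttosto scarso a tennis.",
    'male.n.02 Name "Tom" time.n.08 EQU now',
    swap_tom_for_bob,
)
```

        <p>Because the DRS and the text are transformed by the same operation, the two views stay aligned through the round trip.</p>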
        <sec id="sec-2-1-1">
          <title>3.1. Named Entities Augmentation</title>
          <p>Our initial augmentation approach focused on proper
noun (PN) augmentation, also referred to as Named
Entities (NE) Augmentation. This method targets the
transformation of specific named entities, particularly
person names (PER, both male and female) and
geographical entities (GPE) such as city, state, country, and
island names. These entities are explicitly represented in
the DRS through predicates (e.g., “male.n.02” for person
names). We employed a rule-based approach to extract
NEs from both the DRS and the text. Our NE
augmentation strategy involves replacing existing entities with
those outside the context of the dataset. This approach
aims to evaluate the role of external lexical information
in semantic processing.</p>
          <p>To maintain semantic integrity, we ensure that NEs are replaced only with entities of the same type (for example, male person names with male person names, and city names with city names).</p>
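          <p>Type-constrained substitution of this kind can be sketched as follows; the replacement pools are invented examples (assumptions for illustration), and the same substitution is applied to both the text and the SBN to keep them aligned:</p>

```python
import random

# Hypothetical replacement pools, keyed by entity type (PER male/female, GPE, ...).
POOLS = {
    "PER_MALE": ["Bob", "Luca", "Marco"],
    "GPE_CITY": ["Turin", "Pisa", "Rome"],
}

def replace_entity(text, sbn, entity, entity_type, rng):
    candidates = [e for e in POOLS[entity_type] if e != entity]
    new_entity = rng.choice(candidates)
    # identical substitution on both views preserves text/DRS alignment
    return text.replace(entity, new_entity), sbn.replace(f'"{entity}"', f'"{new_entity}"')

rng = random.Random(0)
text, sbn = replace_entity(
    "Tom is rather poor at tennis.",
    'male.n.02 Name "Tom" time.n.08 EQU now',
    "Tom", "PER_MALE", rng,
)
```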
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Lexical Entities Augmentation</title>
        <p>Our lexical augmentation strategy focuses on four specific categories: common nouns, adjectives, adverbs, and verbs. We utilize WordNet synsets to group these entities, ensuring that transformations maintain the contextual sense and meaning of the sentences.</p>
        <p>Common Noun Augmentation: CN can
significantly alter sentence meaning, making their
augmentation challenging. We employ a rule-based approach
to extract common nouns from the Sequence Box
Notation (SBN) and use NLTK’s “WordNetLemmatizer” for the
corresponding text. The augmentation process involves
replacing nouns with their hyponyms from WordNet,
which allows for more specific substitutions while
preserving contextual meaning.</p>
        <p>Verb Augmentation: Verbs play a crucial role in
sentence context, making their augmentation complex.
We use WordNet-based troponyms to replace verbs with
more specific, contextually similar alternatives. This
approach helps maintain semantic coherence while
introducing lexical variety.</p>
        <p>Adjective Augmentation: Adjectives, as descriptive
attributes of nouns, are augmented using WordNet-based
antonyms. This method generates new, contextually
similar examples. We manually inspect the augmented data
to ensure the semantic relevance and correctness of
adjective substitutions.</p>
        <p>Adverb Augmentation: For adverbs, we employ a
WordNet-based synonym replacement approach. This
method aims to generate similar data examples while
preserving contextual relevance. As with other categories,
we manually verify the semantic correctness of the newly
generated examples. Throughout the augmentation
process for all lexical categories, we maintain consistency
between the SBN logical representations and the
corresponding text. This ensures that the augmented data
remains coherent and semantically valid across both the
formal representation and natural language formats.</p>
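        <p>As a hedged illustration of the hyponym- and troponym-based substitutions described above, the snippet below uses a tiny hand-rolled synset table in place of NLTK's WordNet (an assumption made for self-containment; the actual system queries WordNet):</p>

```python
# Minimal stand-in for WordNet: synset mapped to its direct hyponyms/troponyms.
MINI_WORDNET = {
    "dog.n.01": ["poodle.n.01", "beagle.n.01"],   # noun hyponyms
    "play.v.03": ["strum.v.01"],                  # verb troponyms
}

def candidate_replacements(concept):
    """Candidate, more specific concepts for a DRS entry like 'dog.n.01'."""
    return MINI_WORDNET.get(concept, [])

def augment_concept(sbn, text, concept, lemma, replacement):
    """Swap a concept in the SBN and its surface lemma in the text together."""
    new_lemma = replacement.split(".")[0]
    return sbn.replace(concept, replacement), text.replace(lemma, new_lemma)

new_sbn, new_text = augment_concept(
    'dog.n.01 bark.v.01 Agent -1', "The dog barks.", "dog.n.01", "dog", "poodle.n.01"
)
```

        <p>Applying the same swap to the SBN concept and the surface lemma is what keeps the formal representation and the natural-language text consistent, as required above.</p>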
        <sec id="sec-2-2-1">
          <title>3.3. Grammatical Augmentation</title>
          <p>This approach primarily focuses on transforming morpho-syntactic relations within sentences, with a particular emphasis on tense modifications. This method involves non-lexical substitutions that alter the temporal context of events without introducing external vocabulary. Our strategy encompasses a wide range of grammatical transformations, including shifts between present, past, and future tenses, as well as changes in voice (active to passive and vice versa), mood (e.g., imperative), negation, number (singular to plural), subject-object relationships, aspect (progressive and perfect), and other grammatical features such as infinitive forms, first-person perspective, and perfect participles.</p>
          <p>To implement these transformations, we employ a dual approach: for the Sequence Box Notation (SBN), we use a rule-based system to replace logical entities (e.g., changing "EQU" to "TPR" or "TSU" for tense shifts), while for the corresponding natural language text, we utilize the tenseflow API¹. This comprehensive grammatical augmentation technique allows us to significantly expand our dataset with grammatically diverse versions of existing sentences, maintaining core semantic content while introducing new syntactic variety. Such diversity is essential for training robust NLP models, particularly for tasks involving temporal reasoning and varied syntactic structures.</p>
          <p>While our augmentation strategies effectively expand the dataset nine times, we acknowledge specific challenges in preserving semantic integrity during transformations. For named entities, semantic preservation is straightforward, as we maintain entity types. However, tense transformations present more complexity due to Italian's rich verbal morphology. For instance, the Italian imperfetto tense ("cantava", was singing) can map to multiple English past-tense forms, requiring careful handling to maintain the original temporal relations in the DRS. Additionally, Italian's pro-drop nature and flexible word order can complicate the preservation of argument structure when performing verbal augmentations.</p>
          <p>Each data example consists of a pair: a DRS meaning representation and its corresponding textual form [<xref ref-type="bibr" rid="ref29 ref4">10</xref>].</p>
          <p>Table 1. Dataset split and statistics for the multilingual baselines. Note: T_Gold = Train Gold; T_Silver = Train Silver.
Langs     T_Gold    Dev    Test   T_Silver
Italian      745    555     555      4,316
German     1,206    900     900      6,862
Dutch        586    435     435      1,646
English    9,057  1,132   1,132    143,731</p>
          <p>Categorization of Augmented Data: To facilitate a comprehensive analysis of our augmentation strategies, we classify the augmented dataset into various categories based on named-entity, lexical, and grammatical transformations. Our experimental approach is structured into three main categories: (i) baseline experiments without augmentation; (ii) individual augmentation — applying one augmentation technique at a time; and (iii) compound augmentation — concatenating all augmentation approaches applied to the Italian semantic corpus. Table 2 provides detailed information on the types of augmentation, dataset sizes, and the number of training examples for both individual and compound augmentation strategies employed in our experiments.</p>
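          <p>The SBN side of a tense shift reduces to rewriting the temporal operator attached to time.n.08 (EQU for present, TPR for past, TSU for future, as in the examples above). A minimal sketch, with the text side (handled by tenseflow in the paper) omitted:</p>

```python
TENSE_OPS = {"present": "EQU", "past": "TPR", "future": "TSU"}

def shift_tense(sbn, target_tense):
    """Rewrite the operator that follows the time.n.08 concept in an SBN string."""
    new_op = TENSE_OPS[target_tense]
    tokens = sbn.split()
    for i, tok in enumerate(tokens):
        if tok == "time.n.08":
            tokens[i + 1] = new_op   # the temporal operator immediately follows
    return " ".join(tokens)

past_sbn = shift_tense('male.n.02 Name "Tom" time.n.08 EQU now', "past")
```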
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experimental Implementation</title>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>Our experimental setup utilizes the Italian, German, Dutch, and English versions of logic-text pairs from the Parallel Meaning Bank (PMB) release 5.0.0² [<xref ref-type="bibr" rid="ref29 ref4">10</xref>] (statistics for the multilingual baselines are listed in Table 1). These datasets are categorized into three annotation levels: Gold (fully manually annotated), Silver (partially manually annotated), and Copper (machine-translated versions of English data examples without any annotation). For Italian meaning representation, we maintain this annotation distinction. We adhere to the same data split for training, development, and test sets [10].</p>
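        <p>For reference, the Italian portion of the split described above can be organized as a small table of counts (values from Table 1; the dictionary layout is an illustrative assumption, not the PMB release format):</p>

```python
ITALIAN_SPLITS = {
    "train_gold": 745,     # fully manually annotated
    "train_silver": 4316,  # partially manually annotated
    "dev": 555,
    "test": 555,
}

def pre_fine_tuning_size(splits):
    # gold and silver are combined for the first training stage (exp. 1-12)
    return splits["train_gold"] + splits["train_silver"]
```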
      </sec>
      <sec id="sec-3-2">
        <title>Footnotes</title>
        <p>1. https://github.com/bendichter/tenseflow</p>
        <p>2. The PMB is developed at the University of Groningen as part of the NWO-VICI project "Lost in Translation – Found in Meaning" (Project number 277-89-003), led by Johan Bos.</p>
        <p>Table 2. Impact on the size of the Italian dataset without augmentation and with individual and compound augmentation. Note: w/o = without; Aug = Augmentation; Ex. = Examples; G = Gold; S = Silver; G-S = Gold-Silver; CN = Common Noun; NE = Named Entities; Adj. = Adjectives; Adv = Adverbs; Comp = Compound.
Training Type   Size   # G Ex.
w/o Aug          x1    745
NE Aug           x2
CN Aug           x2
Adj Aug          x2
Adv Aug          x2
Verb Aug         x2
Tense Aug        x4
Comp Aug         x9
Dev               –
Test              –</p>
      </sec>
      <sec id="sec-3-3">
        <title>Neural Architecture</title>
        <p>Our approach to semantic parsing and generation primarily involves fine-tuning the byT5 model [11], a multilingual variant of the T5 transformer. We chose byT5 for several compelling reasons: (i) its multilingual nature enhances cross-language and cross-task generalization; (ii) its byte-level tokenization strategy aids in understanding complex language patterns and semantic information; (iii) it demonstrates superior performance in spelling- and pronunciation-sensitive tasks due to its resilience to noisy data; and (iv) as a token-free model, it operates directly on raw UTF-8 data. Importantly, byT5 has shown state-of-the-art results on multilingual NLP benchmarks [<xref ref-type="bibr" rid="ref45">11, 12, 13</xref>]. We also conducted experiments with T5 specialized on ITalian (IT5) [14], a model that has demonstrated promising results in Italian language understanding and generation across various benchmarks.</p>
        <p>Our fine-tuning strategy involves two stages: initial pre-fine-tuning with gold and silver data (for exp. 1–12), and gold, silver, and copper data (for exp. 13–20), for 5 epochs to provide foundational DRS knowledge, followed by fine-tuning on only gold data—without augmentation—with an early stopping mechanism [<xref ref-type="bibr" rid="ref33 ref56">15</xref>]. The hyperparameter settings used in our experimentation are listed in Table 5.</p>
        <p>Evaluation Methods: For evaluation, we employ distinct methods for semantic parsing and natural language generation tasks. In parsing evaluation, we first transform DRS into Penman notation [16], then use SMATCH [17] to calculate the overlap of triples between system output and the gold standard, assessing the output using F-Score to balance precision and recall [<xref ref-type="bibr" rid="ref17">18</xref>]. For generation evaluation, we use a combination of different automatic metrics, including (i) n-gram-based measures like BLEU [19], METEOR [20], and chrF [<xref ref-type="bibr" rid="ref38">21</xref>]; (ii) the neural model-based COMET score [22]; and (iii) the pre-trained model-based BERT-Score (the "bert-base-multilingual-cased" model) [23]. These comprehensive evaluations allow us to assess both the technical accuracy and the linguistic quality of our model output across parsing and generation tasks.</p>
        <p>Results and Analysis: The experimental results reported in Table 3 demonstrate the efficacy of diverse DA strategies in enhancing semantic parsing and text generation tasks for Italian DRS. We used different variants of T5 (byT5 and IT5) models and evaluated performance on the PMB-5.0.0 dataset, utilizing SMATCH F1 for parsing, and BLEU, METEOR, COMET, chrF, and BERT-Score metrics for generation tasks. In the multilingual baseline comparisons, Italian (76.10% SMATCH F1 for parsing) exhibits superior performance to Dutch (42.77%) and comparable results to German (73.00%), while expectedly trailing English (91.42%). For generation, Italian achieves baseline scores of 37.79 BLEU, 30.83 METEOR, 81.66 COMET, 54.84 chrF, and 88.86 BERT-Score, positioning it better than Dutch and German in all metrics.</p>
        <p>Individual augmentation strategies uniformly yield improvements over the baseline Italian model. For parsing tasks, tense augmentation demonstrates the highest efficacy among singular strategies, achieving 84.13% SMATCH F1 (exp. 10). In generation tasks, tense augmentation also emerges as the most effective individual strategy, attaining scores of 44.49 BLEU, 33.46 METEOR, 85.14 COMET, 60.05 chrF, and 90.26 BERT-Score (exp. 10). These enhancements indicate that each augmentation type contributes uniquely to the semantic understanding and generative capabilities of the neural model.</p>
        <p>The effectiveness of tense augmentation correlates with the significant presence of temporal relations and structural simplicity in the test set's DRSs. Our analysis reveals that approximately 94.05% of the test set contains active-voice examples, while passive-voice examples account for only 5.95%, making tense augmentation particularly valuable for improving model performance on these sentence structures. Additionally, 98.20% of the test set consists of simple sentences, which further emphasizes the importance of augmentations that can enhance lexical diversity without overcomplicating sentence complexity. We observed the following distribution of sentence types in our test set: declarative (87.57%), exclamatory (2.52%), and interrogative (9.78%), reinforcing the need for augmentations that effectively handle these dominant structures.</p>
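        <p>The triple-overlap scoring used for parsing evaluation can be sketched as follows. Real SMATCH additionally searches over variable mappings; with variable-free SBN-style triples, the overlap reduces to a set intersection (a simplification, for illustration only):</p>

```python
def triple_f1(system_triples, gold_triples):
    """F-score over matching (source, role, target) triples."""
    system, gold = set(system_triples), set(gold_triples)
    overlap = len(system.intersection(gold))
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {("rude.a.01", "AttributeOf", "male.n.02"), ("rude.a.01", "Time", "time.n.08")}
system = {("rude.a.01", "AttributeOf", "male.n.02"), ("rude.a.01", "Time", "now")}
score = triple_f1(system, gold)
```

        <p>Here one of two triples matches, so precision and recall are both 0.5, as is the resulting F-score.</p>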
        <p>The compound augmentation approach, which integrates all augmentation strategies, produces the optimal results for the Gold+Silver (G+S) dataset. This comprehensive strategy achieves 85.98% SMATCH F1 for parsing and notable improvements across all generation metrics (45.12 BLEU, 34.54 METEOR, 85.66 COMET, 61.66 chrF, and 90.56 BERT-Score), underscoring the synergistic benefits of combining diverse augmentation techniques (exp. 11). The performance of IT5, by contrast, proved inadequate when applied to formal meaning representations, i.e., DRS: the model exhibited suboptimal results in both semantic parsing and text generation after fine-tuning on the compound augmentation dataset. This can be attributed to IT5's pre-training focus on general Italian language tasks rather than formal meaning representations like DRS, and it highlights the challenges of adapting general-purpose language models to specialized semantic processing tasks.</p>
        <p>5. Conclusion: This study has developed and evaluated a novel cross-lingual DA technique for Italian, specifically tailored for DRS-based semantic parsing and generation tasks. Our research makes significant progress in addressing the challenges faced by low-resource languages in advanced NLP tasks. The proposed augmentation methodology, leveraging English WordNet to enhance Italian semantic datasets, has demonstrated remarkable effectiveness: empirical evidence shows substantial improvements in performance for both DRS parsing and generation in Italian. Notably, our approach achieved a 90.56% SMATCH F1 score for parsing and significant enhancements across all generation metrics (BLEU: 57.48, METEOR: 40.95, COMET: 90.97, chrF: 70.88, BERT-Score: 92.97) on the G+S+C dataset, surpassing both baseline models and previous state-of-the-art results.</p>
        <p>Furthermore, comparison with the extant literature ([24] in Table 3) reveals the superior performance of our proposed approach. The referenced study reports 87.20% SMATCH F1 for parsing, and 53.20 BLEU, 38.50 METEOR, and 87.50 COMET for generation on the Gold+Silver+Copper (G+S+C) dataset. In contrast, our Italian model (exp. 13—G+S+C baseline) achieves 89.22% SMATCH F1, 56.46 BLEU, 40.48 METEOR, 90.02 COMET, 70.38 chrF, and 92.72 BERT-Score on the same dataset, representing significant advancements across all metrics.</p>
        <p>Our detailed analysis reveals that data augmentation positively affects the handling of Italian-specific linguistic features in semantic processing. The improvements observed across the various augmentation strategies indicate enhanced capability in managing syntactic flexibility and grammatical nuances in Italian. This suggests a successful transfer of semantic knowledge through the lens of Italian DRS.</p>
        <p>The most notable results are observed in the G+S+C dataset experiments. Verb Augmentation (exp. 18) achieves the highest parsing score of 90.56% SMATCH F1, while Tense Augmentation (exp. 19) leads in generation with scores of 57.48 BLEU, 40.95 METEOR, 90.97 COMET, 70.88 chrF, and 92.97 BERT-Score. These results not only surpass previous benchmarks but also approach the performance metrics of English, a high-resource language, despite the comparatively limited lexical resources for Italian. The similar performance between the baseline Italian model (exp. 13) and compound augmentation (exp. 20) on G+S+C is primarily attributable to the substantial volume of Copper data (92,394 examples). These Copper examples, which are Italian translations of the English Bronze dataset, outnumber our G+S compound augmentation by approximately 2:1, somewhat diminishing the observable impact of augmentation strategies. Furthermore, in our experiments with G+S+C (exp. 13–20), we used the Copper version without any augmentation, simply to allow a fair comparison with the literature reference (see the experimental results of [24] in Table 3). These experimental outcomes provide strong evidence that DA can significantly enhance the performance of semantic parsing and text generation models for Italian.</p>
        <p>Limitations: Although our results approach the performance metrics of English, a resource-rich language, there remains a gap that future research could address. For example, the original sentence "Tom è piuttosto scarso a tennis." ("Tom is rather poor at tennis.") becomes "Bob era piuttosto ricco con i single." ("Bob was sort of rich at singles."). While this method introduces linguistic diversity, it can result in less coherent sentences in some cases, as seen in this example. Such limitations are common with cross-lingual augmentation strategies based on back-and-forth translation, which favor lexical variation over syntactic coherence. Future refinements, such as filtering improbable substitutions or adding human validation, could help ensure more consistent logicality in cross-lingual semantic tasks.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>We thank "High-Performance Computing for Artificial Intelligence (HPC4AI) at the University of Turin" for providing GPU support [25].</p>
    </sec>
    <sec id="sec-6">
      <title>A. Data Transformation through Augmentation</title>
      <p>The SBN is graphically shown in Figure 1 both with and without augmentation (a and b), highlighting the distinctions between proper-noun, common-noun, adjective, adverb, and verbal-tense augmentations. With this augmentation, the original sentence "Tom è piuttosto scarso a tennis." ("Tom is rather poor at tennis.") becomes "Bob era piuttosto ricco con i single." ("Bob was sort of rich at singles."). In Figure 1, the augmented logical notions are highlighted conceptually. We used the Parallel Meaning Bank (PMB) dataset for this investigation, using both its gold (completely manually annotated) and silver (partially manually annotated) standard versions, and split it according to conventional methods for training, development, and testing.</p>
      <p>(a) DRS (sequence box notation) without augmentation:
male.n.02 Name "Tom"
time.n.08 EQU now
rather.r.02
poor.a.04 AttributeOf -3 Time -2 Degree -1 Theme +1
tennis.n.01
% Tom [0-3]
% is [4-6]
% rather [7-13]
% poor at [14-21]
% tennis. [22-29]</p>
      <p>(b) DRS (sequence box notation) with augmentation:
male.n.02 Name "Bob"
time.n.08 TPR now
sort_of.r.01
rich.a.01 AttributeOf -3 Time -2 Degree -1 Theme +1
singles.n.01
% Bob [0-3]
% was [4-7]
% sort of [8-15]
% rich at [16-23]
% singles. [24-32]</p>
      <p>An example of grammatical augmentation is when "A girl is playing the flute" is changed to one of three tenses: "A girl was playing the flute", "A girl will be playing the flute", or "A girl has been playing the flute". These illustrations show how enhancing various phrase constituents can produce diverse and richer datasets, supporting the creation of strong neural models.</p>
      <p>B. Statistical distribution of examples</p>
      <p>Table 1 reports the number of training, development, and
testing examples in each language as well as the statistical
distribution of the dataset used for multilingual baselines.</p>
      <p>Train Gold (T_Gold), Train Silver (T_Silver),
Development (Dev), and Test sets comprise the dataset. There
are 4,316 T_Silver, 555 Dev, 555 Test, and 745 T_Gold
examples for Italian. There are 6,862 T_Silver, 900 Dev,
900 Test, and 1,206 T_Gold examples in German. There
are 1,646 T_Silver, 435 Dev, 435 Test, and 586 T_Gold
examples in Dutch. There are 143,731 T_Silver, 1,132 Dev,
1,132 Test, and 9,057 T_Gold examples for English, the
language with the largest representation. As can be seen
from this distribution, the English corpus is substantially
larger than the other languages, ofering a solid dataset
for training and evaluation. This diversity in dataset
size across languages highlights the varying amounts of
linguistic data available for training multilingual models.</p>
    </sec>
    <sec id="sec-7">
      <title>C. Impact of Augmentation on</title>
    </sec>
    <sec id="sec-8">
      <title>Dataset Size</title>
      <p>Table 2 compares the number of instances with and
without augmentation to those with individual and
compound augmentations to show how diferent
augmen</p>
      <p>In order to provide transformed instances for neural tation methods afect the size of the dataset. Without any
semantic processing and text generation, named entities, augmentation, the original dataset had 5061 gold-silver
lexical, and grammatical DA approaches were applied samples altogether, 4316 silver examples, and 745 gold
to the original sentences as shown in Table 4. It demon- examples. Applying individual augmentations, including
strates how varying a sentence’s constituent parts can Named Entities, Common Noun, Adjective, Adverb, and
improve dataset variety. When it comes to named enti- Verb augmentations, twice the size of the dataset; for
ties, the sentence “Tom asked Mary if she had been to every augmentation type, there are 1490 gold, 8632 silver,
Boston” becomes “Bob asked Sarah if she had been to and 10122 gold-silver examples. Even more so, tense
augCambridge”, demonstrating how proper nouns are substi- mentation quadruples the amount of the dataset to 2980
tuted. “Tom played with his dog” becomes “Tom played gold, 17264 silver, and 20244 gold-silver examples.
Comwith his puppy” when it comes to common nouns, il- pound augmentation yields the largest gain, ninefolding
lustrating synonym replacement with hyponyms. Verb the dataset size to 6705 gold, 38844 silver, and 45549
goldaugmentation is demonstrated by changing the verb from silver examples. Compound augmentation incorporates
“Tom thinks I stole the money” to “Tom philosophizes I numerous augmentation strategies. The number of
exstole the money”, changing the meaning of the phrase. To amples in both the development and test sets stays at
demonstrate adjective and adverb augmentations, lexical 555. This notable augmentation of the dataset size
highentities are changed from “ill” to “well” and “deeply” to lights the potential for more comprehensive and diverse
“profoundly”, respectively. The last example of
grammatTom ha chiesto a Mary se fosse stata a Boston.
“Tom asked Mary if she had been to Boston.”</p>
      <p>Bob ha chiesto a Sarah se fosse stata a Cambridge.</p>
      <p>“Bob asked Sarah if she had been to Cambridge.”
Tom ha giocato con il suo cane.
“Tom played with his dog.”
Tom pensa che io abbia rubato i soldi.
“Tom thinks I stole the money.”
Lui è malato.
“He is ill.”
Una ragazza suona il flauto.
“A girl is playing the flute.”
La ragazza è profondamente legata a sua zia.
“The girl is deeply attached to her aunt.”</p>
      <p>La ragazza è sinceramente legata a sua zia.
“The girl is sincerely attached to her aunt.”
Tom ha giocato con il suo cucciolo.
“Tom played with his puppy.”
Tom filosofeggia che ho rubato i soldi.
“Tom philosophizes I stole the money.”
Lui è bene.
“He is well.”
Una ragazza suonava il flauto.
“A girl was playing the flute.”
Una ragazza suonerà il flauto.
“A girl will be playing the flute.”
Una ragazza ha suonato il flauto.
“A girl has been playing the flute.”</p>
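      <p>As a quick consistency check, the counts reported in Table 2 follow directly from the base dataset sizes and the per-strategy scaling factors (a minimal sketch; the helper name is ours, the numbers are those of the appendix):</p>

```python
# Reproduce the dataset-size arithmetic from Table 2.
# Base counts for Italian: 745 gold and 4,316 silver examples.
GOLD, SILVER = 745, 4316

def augmented_size(factor):
    """Scale gold, silver, and combined counts by an augmentation factor."""
    return GOLD * factor, SILVER * factor, (GOLD + SILVER) * factor

# Factor 1: the unaugmented dataset (745 gold, 4316 silver, 5061 total).
# Factor 2: each individual lexical augmentation doubles the data.
# Factor 4: tense augmentation quadruples it.
# Factor 9: compound augmentation yields a ninefold increase.
```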
    </sec>
    <sec id="sec-9">
      <title>D. Hyperparameters For</title>
    </sec>
    <sec id="sec-10">
      <title>Experimental Implementation</title>
      <p>In Table 5, we report the main hyperparameters used in our experimental implementation. We used the same experimental settings for all of the experiments reported in Table 3: the AdamW optimizer with a batch size of 32, a learning rate of 1e-4, and a maximum sequence length of 512 tokens. Throughout our experiments, we used GeGLU activation functions. Two rounds of fine-tuning were carried out: the first stage lasted for five epochs, and the second stage used early-stopping criteria to dynamically decide the ideal number of epochs depending on the model’s performance metrics. These hyperparameters were chosen carefully to guarantee reliable operation and efficient customization of the byT5 model to our particular tasks and datasets.</p>
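      <p>The two-round schedule described above can be sketched as follows (a simplified illustration, not the authors’ code: the callback signatures and the patience value of 2 are assumptions, and in practice the dev metric would be SMATCH or BLEU):</p>

```python
# Sketch of the two-stage fine-tuning schedule: five fixed epochs,
# then early stopping on a development-set metric. Returns the total
# number of epochs trained.
def two_stage_schedule(train_one_epoch, dev_score, patience=2):
    epochs = 0
    for _ in range(5):           # stage 1: five fixed epochs
        train_one_epoch()
        epochs += 1
    best, waited = float("-inf"), 0
    while True:                  # stage 2: dynamic length
        train_one_epoch()
        epochs += 1
        score = dev_score()
        if score > best:         # improvement resets the counter
            best, waited = score, 0
        else:                    # stop after `patience` epochs
            waited += 1          # without improvement
            if waited == patience:
                break
    return epochs

# Simulated run: the dev metric improves twice, then plateaus,
# so stage 2 runs four epochs before stopping.
scores = iter([0.50, 0.60, 0.60, 0.60])
total = two_stage_schedule(lambda: None, lambda: next(scores))
```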
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>tics: ACL-IJCNLP</source>
          <year>2021</year>
          , Association for Compu-
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>tational Linguistics</surname>
          </string-name>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>968</fpage>
          -
          <lpage>988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          URL: https://aclanthology.org/
          <year>2021</year>
          .findings-acl.
          <volume>84</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>doi:10</source>
          .18653/v1/
          <year>2021</year>
          .findings-acl.
          <volume>84</volume>
          . [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          , Pre-trained
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>2023, Association for Computational Linguistics,</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Toronto</surname>
          </string-name>
          , Canada,
          <year>2023</year>
          , pp.
          <fpage>5586</fpage>
          -
          <lpage>5600</lpage>
          . URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          //aclanthology.org/
          <year>2023</year>
          .findings-acl.
          <volume>345</volume>
          . doi: 10.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <volume>18653</volume>
          /v1/
          <year>2023</year>
          .findings-acl.
          <volume>345</volume>
          . [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Evang</surname>
          </string-name>
          , N. Venhuizen, De-
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>and Evaluation (LREC'12)</source>
          , European Language
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Resources</given-names>
            <surname>Association</surname>
          </string-name>
          (ELRA), Istanbul, Turkey, [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          , S. Narang,
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <year>2012</year>
          , pp.
          <fpage>3196</fpage>
          -
          <lpage>3200</lpage>
          . URL: http://www.lrec-conf. M.
          <string-name>
            <surname>Kale</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Roberts</surname>
          </string-name>
          , C. Rafel, Byt5: Towards
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          org/proceedings/lrec2012/pdf/534_Paper.
          <article-title>pdf. a token-free future with pre-trained byte-to-</article-title>
          <string-name>
            <surname>byte</surname>
            [2]
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Banarescu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Bonial</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Cai</surname>
          </string-name>
          , M. Georgescu, models, Transactions of the Association for Com-
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Grifitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Hermjakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Knight</surname>
          </string-name>
          , P. Koehn, putational Linguistics 10 (
          <year>2022</year>
          )
          <fpage>291</fpage>
          -
          <lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , Abstract meaning rep- [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Stankevičius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lukoševičius</surname>
          </string-name>
          , J. Kapočiu¯tė-
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>7th linguistic annotation workshop and interoper- diacritics and typos with a byt5 transformer model,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>ability with discourse</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>178</fpage>
          -
          <lpage>186</lpage>
          .
          <source>Applied Sciences</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>2636</fpage>
          . [3]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Amin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Anselma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mazzei</surname>
          </string-name>
          , Exploring data [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Belouadi</surname>
          </string-name>
          , S. Eger, ByGPT5: End-to-end
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <article-title>18th Conference of the European Chapter of the Graber</article-title>
          , N. Okazaki (Eds.),
          <source>Proceedings of the</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <article-title>Association for Computational Linguistics (Volume 61st Annual Meeting of the Association for</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>1: Long Papers), Association for Computational Computational Linguistics (Volume 1: Long Pa-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Linguistics</surname>
          </string-name>
          , St.
          <source>Julian's, Malta</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>2164</fpage>
          -
          <lpage>2178</lpage>
          . pers), Association for Computational Linguis-
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          URL: https://aclanthology.org/
          <year>2024</year>
          .
          <article-title>eacl-long</article-title>
          .
          <volume>132</volume>
          . tics, Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>7364</fpage>
          -
          <lpage>7381</lpage>
          . URL: [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Abzianidze</surname>
          </string-name>
          , R. van Noord,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          , The https://aclanthology.org/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>406</volume>
          . doi:10.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>parallel meaning bank: A framework for</article-title>
          semanti-
          <volume>18653</volume>
          /v1/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>406</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <article-title>cally annotating multiple languages</article-title>
          , Applied math- [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sarti</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Nissim, IT5: Text-to-text pretraining for</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>ematics and informatics 25</source>
          (
          <year>2020</year>
          )
          <fpage>45</fpage>
          -
          <lpage>60</lpage>
          .
          <article-title>Italian language understanding and generation</article-title>
          , in: [5]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Amin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mazzei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Anselma</surname>
          </string-name>
          , et al.,
          <string-name>
            <surname>Towards</surname>
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Calzolari</surname>
            , M.-
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Hoste</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lenci</surname>
          </string-name>
          , S. Sakti,
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <article-title>data augmentation for drs-to-text generation</article-title>
          , in: N. Xue (Eds.),
          <source>Proceedings of the 2024</source>
          Joint In-
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <source>CEUR WORKSHOP PROCEEDINGS</source>
          , volume
          <volume>3287</volume>
          , ternational Conference on Computational Linguis-
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2022</year>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>152</lpage>
          . tics,
          <source>Language Resources and Evaluation (LREC</source>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <source>Annotating the COLING</source>
          <year>2024</year>
          ),
          <article-title>ELRA</article-title>
          and
          <string-name>
            <given-names>ICCL</given-names>
            ,
            <surname>Torino</surname>
          </string-name>
          , Italia,
          <year>2024</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <article-title>little prince with chinese amrs</article-title>
          , in: Proceedings of pp.
          <fpage>9422</fpage>
          -
          <lpage>9433</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <source>the 10th Linguistic Annotation Workshop held in lrec-main.823.</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <article-title>Conjunction with ACL 2016 (LAW-X</article-title>
          <year>2016</year>
          ),
          <year>2016</year>
          , [15]
          <string-name>
            <surname>R. van Noord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Toral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          , Character-
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          pp.
          <fpage>7</fpage>
          -
          <lpage>15</lpage>
          .
          <article-title>level representations improve DRS-based seman[7</article-title>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          ,
          <article-title>The sequence notation: Catching complex tic parsing even in the age of BERT</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <article-title>meanings in simple graphs</article-title>
          ,
          <source>in: Proceedings of the Proceedings of the 2020 Conference on Empir-</source>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <source>15th International Conference on Computational ical Methods in Natural Language Processing</source>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Semantics (IWCS</surname>
          </string-name>
          <year>2023</year>
          ), Nancy, France,
          <year>2023</year>
          , pp.
          <source>(EMNLP)</source>
          , Association for Computational Linguis-
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          1-
          <fpage>14</fpage>
          . tics, Online,
          <year>2020</year>
          , pp.
          <fpage>4587</fpage>
          -
          <lpage>4603</lpage>
          . URL: https: [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Shorten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Khoshgoftaar</surname>
          </string-name>
          , A survey on image //aclanthology.org/
          <year>2020</year>
          .emnlp-main.
          <volume>371</volume>
          . doi:10.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <article-title>data augmentation for deep learning</article-title>
          ,
          <source>Journal of big 18653/v1/2020.emnlp-main.371.</source>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <source>data 6</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>48</lpage>
          . [16]
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Kasper</surname>
          </string-name>
          ,
          <article-title>A flexible interface for linking applica</article-title>
          [9]
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gangal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chandar</surname>
          </string-name>
          , S. Vosoughi,
          <article-title>tions to Penman's sentence generator</article-title>
          , in: Speech
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          21-
          <fpage>23</fpage>
          ,
          <year>1989</year>
          ,
          <year>1989</year>
          . URL: https://aclanthology.org/ challenging benchmarks, in: C.
          <string-name>
            <surname>Bonial</surname>
          </string-name>
          , J. Bonn,
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <fpage>H89</fpage>
          -
          <lpage>1022</lpage>
          . J. D. Hwang (Eds.),
          <source>Proceedings of the Fifth Inter</source>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Knight</surname>
          </string-name>
          ,
          <source>Smatch: an evaluation metric national Workshop on Designing Meaning Rep-</source>
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <article-title>for semantic feature structures</article-title>
          , in: H. Schuetze, resentations @ LREC-COLING
          <year>2024</year>
          ,
          <article-title>ELRA and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          , M. Poesio (Eds.),
          <source>Proceedings of the 51st ICCL, Torino, Italia</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>162</fpage>
          -
          <lpage>175</lpage>
          . URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <article-title>Annual Meeting of the Association for Computa-</article-title>
          //aclanthology.org/
          <year>2024</year>
          .dmr-
          <volume>1</volume>
          .
          <fpage>17</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <string-name>
            <surname>tional Linguistics</surname>
          </string-name>
          (Volume
          <volume>2</volume>
          :
          <string-name>
            <surname>Short</surname>
            <given-names>Papers)</given-names>
          </string-name>
          , Associ- [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aldinucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rabellino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pironti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Spiga</surname>
          </string-name>
          , P. Vi-
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <year>2013</year>
          , pp.
          <fpage>748</fpage>
          -
          <lpage>752</lpage>
          . URL: https://aclanthology.org/ P. Margara,
          <string-name>
            <surname>I. Drago</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marturano</surname>
          </string-name>
          , G. Marchetto,
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <fpage>P13</fpage>
          -
          <lpage>2131</lpage>
          . E. Piccolo,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bagnasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lusso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vallero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Attardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barchiesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Colla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Galeazzi</surname>
          </string-name>
          ,
          <article-title>HPC4AI: an AI-on-demand federated platform endeavour</article-title>
          ,
          <source>in: Proceedings of the 15th ACM International Conference on Computing Frontiers, CF '18</source>
          , New York, NY, USA,
          <year>2018</year>
          , pp.
          <fpage>279</fpage>
          -
          <lpage>286</lpage>
          . URL: https://doi.org/10.1145/3203217.3205340. doi:10.1145/3203217.3205340.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W.</given-names>
            <surname>Poelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>van Noord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          ,
          <article-title>Transparent semantic parsing with Universal Dependencies using graph transformations</article-title>
          ,
          <source>in: Proceedings of the 29th International Conference on Computational Linguistics</source>
          , Gyeongju, Republic of Korea,
          <year>2022</year>
          , pp.
          <fpage>4186</fpage>
          -
          <lpage>4192</lpage>
          . URL: https://aclanthology.org/2022.coling-1.367.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>Papineni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roukos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>BLEU: a method for automatic evaluation of machine translation</article-title>
          ,
          <source>in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavie</surname>
          </string-name>
          ,
          <article-title>METEOR: An automatic metric for MT evaluation with improved correlation with human judgments</article-title>
          ,
          <source>in: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>65</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Popović</surname>
          </string-name>
          ,
          <article-title>chrF: character n-gram F-score for automatic MT evaluation</article-title>
          ,
          <source>in: Proceedings of the Tenth Workshop on Statistical Machine Translation</source>
          , Lisbon, Portugal,
          <year>2015</year>
          , pp.
          <fpage>392</fpage>
          -
          <lpage>395</lpage>
          . URL: https://aclanthology.org/W15-3049. doi:10.18653/v1/W15-3049.
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Farinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavie</surname>
          </string-name>
          ,
          <article-title>COMET: A neural framework for MT evaluation</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>2685</fpage>
          -
          <lpage>2702</lpage>
          . URL: https://aclanthology.org/2020.emnlp-main.213. doi:10.18653/v1/2020.emnlp-main.213.
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kishore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Artzi</surname>
          </string-name>
          ,
          <article-title>BERTScore: Evaluating text generation with BERT</article-title>
          ,
          <source>in: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020</source>
          , OpenReview.net,
          <year>2020</year>
          . URL: https://openreview.net/forum?id=SkeHuCVFDr.
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>van Noord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          , Gain-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>