<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Augmentation for Low-Resource Italian NLP: Enhancing Semantic Processing with DRS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muhammad Saad Amin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Anselma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Mazzei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Discourse Representation Structure (DRS), a formal meaning representation, has shown promising results in semantic parsing and natural language generation tasks for high-resource languages like English. This paper investigates enhancing the application of DRS to low-resource Italian Natural Language Processing (NLP), in both semantic parsing (Text-to-DRS) and natural language generation (DRS-to-Text). To address the scarcity of annotated corpora for Italian DRS, we propose a novel data augmentation technique that involves the use of external linguistic resources, including: (i) WordNet for common nouns, adjectives, adverbs, and verbs; (ii) LLM-generated named entities for proper nouns; and (iii) rule-based algorithms for tense augmentation. This approach not only increases the quantity of training data but also introduces linguistic diversity, which is crucial for improving model performance and robustness. Using this augmented dataset, we developed neural semantic parser and generator models that demonstrated enhanced generalization ability compared to models trained on non-augmented data. We evaluated the effect of semantic data augmentation using two state-of-the-art transformer-based neural sequence-to-sequence models, i.e., byT5 and IT5. Our implementation shows promising results for Italian semantic processing. Data augmentation significantly increased the performance of semantic parsing from 76.10 to 90.56 (+14.46) F1-SMATCH, and of generation from 37.79 to 57.48 (+19.69) BLEU, 30.83 to 40.95 (+10.12) METEOR, 81.66 to 90.97 (+9.31) COMET, 54.84 to 70.88 (+16.04) chrF, and 88.86 to 92.97 (+4.11) BERT score. These results demonstrate the effectiveness of our novel augmentation approach in enhancing semantic processing capabilities for low-resource languages like Italian.</p>
      </abstract>
      <kwd-group>
        <kwd>Data augmentation</kwd>
        <kwd>Italian semantic processing</kwd>
        <kwd>low-resource NLP</kwd>
        <kwd>semantic parsing and generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The field of Natural Language Processing (NLP) has seen significant advancements in recent years, particularly in semantic processing tasks. These tasks, which include semantic parsing and natural language generation, often rely heavily on parallel corpora — datasets that align text in one language with its semantic representation or with text in another language [<xref ref-type="bibr" rid="ref19 ref35">1, 2</xref>]. For languages with rich linguistic resources, such as English, the availability of large-scale parallel corpora has facilitated rapid progress in semantic processing [3, 4]. However, for many languages, including Italian, the scarcity of such resources poses a significant challenge to advancing semantic NLP capabilities [<xref ref-type="bibr" rid="ref37">5, 6</xref>]. Italian presents unique challenges and opportunities: while it shares some structural similarities with English, it possesses distinct linguistic features that complicate NLP tasks, including a more flexible word order, a rich system of verb conjugations, and the presence of grammatical gender for nouns, adjectives, and articles.</p>
      <p>In the context of NLP and Natural Language Generation (NLG), Italian has seen moderate progress. However, compared to high-resource languages like English, Italian still lacks extensive task-specific datasets, particularly in areas requiring deep semantic understanding. This deficiency is especially pronounced in tasks involving formal semantic representations such as Discourse Representation Structures (DRS) [<xref ref-type="bibr" rid="ref15 ref31">7</xref>]. While Italian is not typically classified as a low-resource language in general NLP terms, it can be considered as such in the specific domain of semantic processing, especially when dealing with formal semantic representations. This status is characterized by: (i) Named Entities: Italian naming conventions differ from those in English, requiring adaptation in entity recognition tasks; (ii) Syntactic Structure: although Italian follows the SVO structure like English, it allows for greater flexibility, posing challenges especially in parsing tasks; (iii) Grammatical Gender: the presence of grammatical gender in Italian adds complexity to tasks such as coreference resolution and agreement in the generated text. These linguistic features, combined with the limited availability of semantically annotated corpora, position Italian as a challenging language for advanced semantic NLP tasks.</p>
      <p>Data augmentation (DA), a technique widely used in machine learning to increase the size and diversity of training datasets, has shown promise in addressing resource scarcity in NLP [<xref ref-type="bibr" rid="ref61">8</xref>]. For semantic tasks involving DRS, DA presents unique challenges due to the need to preserve semantic equivalence while introducing linguistic variety.</p>
      <p>In the context of Italian semantic processing, traditional augmentation techniques such as random word insertion, deletion, substitution, or back-translation have limited applicability due to the scarcity of Italian-specific semantic resources [9]. This necessitates innovative approaches that can leverage resources from high-resource languages while maintaining the integrity of Italian linguistic structures.</p>
      <p>Given the challenges outlined, this study aims to develop a novel cross-lingual DA technique for Italian, specifically tailored for DRS-based semantic parsing and generation tasks. While word-substitution techniques are established in the DA literature, our approach introduces a cross-lingual framework that leverages the language-neutral nature of DRS. The method bridges the resource gap between high-resource and low-resource languages by temporarily transforming Italian examples into English, enabling access to rich lexical resources like WordNet, before converting back to Italian. This cross-lingual approach leverages the universal semantic representations of DRS to enable more advanced data transformation than Italian resources alone would allow, which is particularly advantageous given the limited availability of Italian-specific semantic datasets (see Table 1 for Italian examples).</p>
      <p>This paper makes the following key contributions: (1) a novel cross-lingual augmentation methodology that leverages English WordNet to enhance Italian semantic datasets; and (2) empirical evidence demonstrating the effectiveness of this augmentation technique in improving semantic parsing and generation for Italian.</p>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy. * Corresponding author. Contact: muhammadsaad.amin@unito.it (M. S. Amin); luca.anselma@unito.it (L. Anselma); alessandro.mazzei@unito.it (A. Mazzei). ORCID: 0000-0002-7002-9373 (M. S. Amin); 0000-0003-2292-6480 (L. Anselma); 0000-0003-3072-0108 (A. Mazzei). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Figure 1 (excerpt): (a) DRS (box notation): x1 s1 t1; male.n.02(x1); Name(x1, tom); rude.a.01(s1); Time(s1, t1); AttributeOf(s1, x1); time.n.08(t1); t1 ≺ now. (c) DRS/SBN (sequence notation): male.n.02 Name "Tom"; time.n.08 TPR now; rude.a.01 AttributeOf -2 Time -1.</p>
      <sec id="sec-1-1">
        <title>Paper Organization</title>
        <p>The remaining paper is organized as follows: Section 2 provides an overview of DRS. Section 3 details semantic DA for Italian, with a focus on named-entity, lexical, and grammatical data transformation techniques. Section 4 presents our experimental implementation, the implications of our results and findings, and their broader impact on the field. Finally, Section 5 concludes the paper, addresses certain limitations, and outlines directions for future research.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>In this Section, we provide an overview of the formal definition of DRS. DRS is a formal semantic representation that captures the essential meaning of text, equivalent to first-order logic. DRS is capable of representing a broad spectrum of linguistic phenomena, including anaphora, presuppositions, and temporal expressions [<xref ref-type="bibr" rid="ref15 ref31">7</xref>]. What sets DRS apart from other meaning representations, such as Abstract Meaning Representation (AMR) [2], is its proficiency in handling negation and quantification, as well as its language-independent nature. Furthermore, DRS can effectively represent meaning across multiple sentences in a discourse.</p>
      <p>Initially, DRS utilized box notation to provide scope to meaning representation (see Figure 1(a)). This notation incorporates discourse referents (e.g. x1) and conditions (e.g. person, Time), with concepts anchored using WordNet synsets and thematic roles derived from VerbNet. Operators (e.g. =) are employed to establish comparative relationships between entities. Conditions can also embody complex structures to express logical (e.g. NEGATION, ¬) or rhetorical relationships among various condition sets. To address the challenges posed by the complexity of box notation in neural parser development, Clause Notation was introduced; this method streamlines DRS by reorganizing the structure and placing variables before discourse referents and conditions (see Figure 1(b)).</p>
      <p>Further simplification led to the development of Sequence Box Notation (SBN), a variable-free format designed to be more compatible with neural sequence-to-sequence transformer architectures [<xref ref-type="bibr" rid="ref15 ref31">7</xref>]. SBN utilizes indices to form connections between concepts, with thematic roles indicating the nature of these connections (see Figure 1(c)).</p>
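      <p>The index-based linking in SBN can be made concrete with a small reader. Below is a hedged, minimal sketch in Python; the function and the simplified line format are illustrative assumptions, not the PMB toolchain:</p>

```python
# Resolve relative indices in simplified SBN lines into explicit triples.
# Each line starts with a WordNet concept; the rest alternates role/argument,
# where an argument is a literal ("Tom", now) or a relative index such as
# -2 (two concepts back in the sequence).

def sbn_to_triples(sbn_lines):
    concepts = [line.split()[0] for line in sbn_lines]
    triples = []
    for i, line in enumerate(sbn_lines):
        rest = line.split()[1:]
        for role, arg in zip(rest[0::2], rest[1::2]):
            try:
                target = concepts[i + int(arg)]  # relative index resolves to a concept
            except ValueError:
                target = arg                     # literal argument, kept verbatim
            triples.append((concepts[i], role, target))
    return triples

sbn = [
    'male.n.02 Name "Tom"',
    'time.n.08 TPR now',
    'rude.a.01 AttributeOf -2 Time -1',
]
triples = sbn_to_triples(sbn)
```

      <p>For the sequence in Figure 1(c), the index -2 on rude.a.01 resolves two concepts back to male.n.02, and -1 resolves to time.n.08.</p>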
      <sec id="sec-2-1">
        <title>3. Semantic Data Augmentation for Italian</title>
        <p>The data-intensive nature of neural networks presents a significant challenge for low-resource languages like Italian, where available data is limited. This challenge is further compounded when dealing with logical semantic representations such as DRS-Text pairs, which follow specific patterns. In DRS, concepts are represented as a combination of lemma, part of speech, and WordNet sense number. The part-of-speech component covers adjectives, adverbs, common nouns, and verbs as lexical entities, followed by other logical representations (e.g., "idea.n.01").</p>
        <p>Our augmentation methodology addresses the scarcity of Italian lexical resources by utilizing a cross-lingual approach that takes advantage of the language-neutral structure of DRS. The process (i) begins with translating the Italian text into English while keeping the original DRS unchanged; (ii) then applies a variety of augmentation techniques—named-entity, lexical, and grammatical augmentations, made possible through access to English WordNet—to the English-aligned examples; and (iii) after augmentation, translates the English examples back into Italian, ensuring that the semantic relationships from the DRS are preserved. This strategy not only generates semantically rich and contextually relevant data but also overcomes the limitations of Italian-specific resources by augmenting English-aligned examples and transforming them into Italian-aligned examples (see Figure 2 and Table 4 in Appendix), maintaining semantic accuracy through DRS's formal representations.</p>
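        <p>The three-step loop above can be sketched compactly. The translator below is stubbed with a tiny dictionary purely for illustration (an assumption; the actual system would use a machine-translation component), but the structure mirrors the described pipeline: pivot to English, augment text and DRS together, pivot back:</p>

```python
# Toy translation memory standing in for a real Italian/English MT system.
IT_TO_EN = {"Tom è piuttosto scarso a tennis.": "Tom is rather poor at tennis."}
EN_TO_IT = {"Bob is rather poor at tennis.": "Bob è piuttosto scarso a tennis."}

def augment_example(italian_text, drs, augment_fn):
    english = IT_TO_EN[italian_text]                  # step (i): translate to English
    new_english, new_drs = augment_fn(english, drs)   # step (ii): augment both views
    new_italian = EN_TO_IT[new_english]               # step (iii): translate back
    return new_italian, new_drs

def swap_tom_for_bob(text, drs):
    # toy named-entity augmentation used only to exercise the loop
    return text.replace("Tom", "Bob"), drs.replace('"Tom"', '"Bob"')

aug_text, aug_drs = augment_example(
    "Tom è piuttosto scarso a tennis.",
    'male.n.02 Name "Tom" time.n.08 EQU now',
    swap_tom_for_bob,
)
```

        <p>Because the DRS and the text are transformed by the same operation, the two views stay aligned through the round trip.</p>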
        <sec id="sec-2-1-1">
          <title>3.1. Named Entities Augmentation</title>
          <p>Our initial augmentation approach focused on proper
noun (PN) augmentation, also referred to as Named
Entities (NE) Augmentation. This method targets the
transformation of specific named entities, particularly
person names (PER, both male and female) and
geographical entities (GPE) such as city, state, country, and
island names. These entities are explicitly represented in
the DRS through predicates (e.g., “male.n.02” for person
names). We employed a rule-based approach to extract
NEs from both the DRS and the text. Our NE
augmentation strategy involves replacing existing entities with
those outside the context of the dataset. This approach
aims to evaluate the role of external lexical information
in semantic processing.</p>
          <p>To maintain semantic integrity, we ensure that NEs are replaced only with entities of the same type (for example, male person names with male person names, and city names with city names).</p>
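          <p>Type-constrained substitution of this kind can be sketched as follows; the replacement pools are invented examples (assumptions for illustration), and the same substitution is applied to both the text and the SBN to keep them aligned:</p>

```python
import random

# Hypothetical replacement pools, keyed by entity type (PER male/female, GPE, ...).
POOLS = {
    "PER_MALE": ["Bob", "Luca", "Marco"],
    "GPE_CITY": ["Turin", "Pisa", "Rome"],
}

def replace_entity(text, sbn, entity, entity_type, rng):
    candidates = [e for e in POOLS[entity_type] if e != entity]
    new_entity = rng.choice(candidates)
    # identical substitution on both views preserves text/DRS alignment
    return text.replace(entity, new_entity), sbn.replace(f'"{entity}"', f'"{new_entity}"')

rng = random.Random(0)
text, sbn = replace_entity(
    "Tom is rather poor at tennis.",
    'male.n.02 Name "Tom" time.n.08 EQU now',
    "Tom", "PER_MALE", rng,
)
```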
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Lexical Entities Augmentation</title>
        <p>Our lexical augmentation strategy focuses on four specific categories: common nouns, adjectives, adverbs, and verbs. We utilize WordNet synsets to group these entities, ensuring that transformations maintain the contextual sense and meaning of the sentences.</p>
        <p>Common Noun Augmentation: CN can
significantly alter sentence meaning, making their
augmentation challenging. We employ a rule-based approach
to extract common nouns from the Sequence Box
Notation (SBN) and use NLTK’s “WordNetLemmatizer” for the
corresponding text. The augmentation process involves
replacing nouns with their hyponyms from WordNet,
which allows for more specific substitutions while
preserving contextual meaning.</p>
        <p>Verb Augmentation: Verbs play a crucial role in
sentence context, making their augmentation complex.
We use WordNet-based troponyms to replace verbs with
more specific, contextually similar alternatives. This
approach helps maintain semantic coherence while
introducing lexical variety.</p>
        <p>Adjective Augmentation: Adjectives, as descriptive
attributes of nouns, are augmented using WordNet-based
antonyms. This method generates new, contextually
similar examples. We manually inspect the augmented data
to ensure the semantic relevance and correctness of
adjective substitutions.</p>
        <p>Adverb Augmentation: For adverbs, we employ a
WordNet-based synonym replacement approach. This
method aims to generate similar data examples while
preserving contextual relevance. As with other categories,
we manually verify the semantic correctness of the newly
generated examples. Throughout the augmentation
process for all lexical categories, we maintain consistency
between the SBN logical representations and the
corresponding text. This ensures that the augmented data
remains coherent and semantically valid across both the
formal representation and natural language formats.</p>
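        <p>As a hedged illustration of the hyponym- and troponym-based substitutions described above, the snippet below uses a tiny hand-rolled synset table in place of NLTK's WordNet (an assumption made for self-containment; the actual system queries WordNet):</p>

```python
# Minimal stand-in for WordNet: synset mapped to its direct hyponyms/troponyms.
MINI_WORDNET = {
    "dog.n.01": ["poodle.n.01", "beagle.n.01"],   # noun hyponyms
    "play.v.03": ["strum.v.01"],                  # verb troponyms
}

def candidate_replacements(concept):
    """Candidate, more specific concepts for a DRS entry like 'dog.n.01'."""
    return MINI_WORDNET.get(concept, [])

def augment_concept(sbn, text, concept, lemma, replacement):
    """Swap a concept in the SBN and its surface lemma in the text together."""
    new_lemma = replacement.split(".")[0]
    return sbn.replace(concept, replacement), text.replace(lemma, new_lemma)

new_sbn, new_text = augment_concept(
    'dog.n.01 bark.v.01 Agent -1', "The dog barks.", "dog.n.01", "dog", "poodle.n.01"
)
```

        <p>Applying the same swap to the SBN concept and the surface lemma is what keeps the formal representation and the natural-language text consistent, as required above.</p>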
        <sec id="sec-2-2-1">
          <title>3.3. Grammatical Augmentation</title>
          <p>This approach primarily focuses on transforming morpho-syntactic relations within sentences, with a particular emphasis on tense modifications. This method involves non-lexical substitutions that alter the temporal context of events without introducing external vocabulary. Our strategy encompasses a wide range of grammatical transformations, including shifts between present, past, and future tenses, as well as changes in voice (active to passive and vice versa), mood (e.g., imperative), negation, number (singular to plural), subject-object relationships, aspect (progressive and perfect), and other grammatical features such as infinitive forms, first-person perspective, and perfect participles.</p>
          <p>To implement these transformations, we employ a dual approach: for the Sequence Box Notation (SBN), we use a rule-based system to replace logical entities (e.g., changing "EQU" to "TPR" or "TSU" for tense shifts), while for the corresponding natural language text, we utilize the tenseflow API¹. This comprehensive grammatical augmentation technique allows us to significantly expand our dataset with grammatically diverse versions of existing sentences, maintaining core semantic content while introducing new syntactic variety. Such diversity is essential for training robust NLP models, particularly for tasks involving temporal reasoning and varied syntactic structures.</p>
          <p>While our augmentation strategies effectively expand the dataset nine times, we acknowledge specific challenges in preserving semantic integrity during transformations. For named entities, semantic preservation is straightforward, as we maintain entity types. However, tense transformations present more complexity due to Italian's rich verbal morphology. For instance, the Italian imperfetto tense ("cantava", was singing) can map to multiple English past-tense forms, requiring careful handling to maintain the original temporal relations in the DRS. Additionally, Italian's pro-drop nature and flexible word order can complicate the preservation of argument structure when performing verbal augmentations.</p>
          <p>Each data example consists of a pair: a DRS meaning representation and its corresponding textual form [<xref ref-type="bibr" rid="ref29 ref4">10</xref>].</p>
          <p>Table 1. Dataset split and statistics for the multilingual baselines. Note: T_Gold = Train Gold; T_Silver = Train Silver.
Langs     T_Gold    Dev    Test   T_Silver
Italian      745    555     555      4,316
German     1,206    900     900      6,862
Dutch        586    435     435      1,646
English    9,057  1,132   1,132    143,731</p>
          <p>Categorization of Augmented Data: To facilitate a comprehensive analysis of our augmentation strategies, we classify the augmented dataset into various categories based on named-entity, lexical, and grammatical transformations. Our experimental approach is structured into three main categories: (i) baseline experiments without augmentation; (ii) individual augmentation — applying one augmentation technique at a time; and (iii) compound augmentation — concatenating all augmentation approaches applied to the Italian semantic corpus. Table 2 provides detailed information on the types of augmentation, dataset sizes, and the number of training examples for both individual and compound augmentation strategies employed in our experiments.</p>
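          <p>The SBN side of a tense shift reduces to rewriting the temporal operator attached to time.n.08 (EQU for present, TPR for past, TSU for future, as in the examples above). A minimal sketch, with the text side (handled by tenseflow in the paper) omitted:</p>

```python
TENSE_OPS = {"present": "EQU", "past": "TPR", "future": "TSU"}

def shift_tense(sbn, target_tense):
    """Rewrite the operator that follows the time.n.08 concept in an SBN string."""
    new_op = TENSE_OPS[target_tense]
    tokens = sbn.split()
    for i, tok in enumerate(tokens):
        if tok == "time.n.08":
            tokens[i + 1] = new_op   # the temporal operator immediately follows
    return " ".join(tokens)

past_sbn = shift_tense('male.n.02 Name "Tom" time.n.08 EQU now', "past")
```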
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experimental Implementation</title>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>Our experimental setup utilizes the Italian, German, Dutch, and English versions of logic-text pairs from the Parallel Meaning Bank (PMB) release 5.0.0² [<xref ref-type="bibr" rid="ref29 ref4">10</xref>] (statistics for the multilingual baselines are listed in Table 1). These datasets are categorized into three annotation levels: Gold (fully manually annotated), Silver (partially manually annotated), and Copper (machine-translated versions of English data examples without any annotation). For Italian meaning representation, we maintain this annotation distinction. We adhere to the same data split for training, development, and test sets [10].</p>
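        <p>For reference, the Italian portion of the split described above can be organized as a small table of counts (values from Table 1; the dictionary layout is an illustrative assumption, not the PMB release format):</p>

```python
ITALIAN_SPLITS = {
    "train_gold": 745,     # fully manually annotated
    "train_silver": 4316,  # partially manually annotated
    "dev": 555,
    "test": 555,
}

def pre_fine_tuning_size(splits):
    # gold and silver are combined for the first training stage (exp. 1-12)
    return splits["train_gold"] + splits["train_silver"]
```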
      </sec>
      <sec id="sec-3-2">
        <title>Footnotes</title>
        <p>1. https://github.com/bendichter/tenseflow</p>
        <p>2. The PMB is developed at the University of Groningen as part of the NWO-VICI project "Lost in Translation – Found in Meaning" (Project number 277-89-003), led by Johan Bos.</p>
        <p>Table 2. Impact on the size of the Italian dataset without augmentation and with individual and compound augmentation. Note: w/o = without; Aug = Augmentation; Ex. = Examples; G = Gold; S = Silver; G-S = Gold-Silver; CN = Common Noun; NE = Named Entities; Adj. = Adjectives; Adv = Adverbs; Comp = Compound.
Training Type   Size   # G Ex.
w/o Aug          x1    745
NE Aug           x2
CN Aug           x2
Adj Aug          x2
Adv Aug          x2
Verb Aug         x2
Tense Aug        x4
Comp Aug         x9
Dev               –
Test              –</p>
      </sec>
      <sec id="sec-3-3">
        <title>Neural Architecture</title>
        <p>Our approach to semantic parsing and generation primarily involves fine-tuning the byT5 model [11], a multilingual variant of the T5 transformer. We chose byT5 for several compelling reasons: (i) its multilingual nature enhances cross-language and cross-task generalization; (ii) its byte-level tokenization strategy aids in understanding complex language patterns and semantic information; (iii) it demonstrates superior performance in spelling- and pronunciation-sensitive tasks due to its resilience to noisy data; and (iv) as a token-free model, it operates directly on raw UTF-8 data. Importantly, byT5 has shown state-of-the-art results on multilingual NLP benchmarks [<xref ref-type="bibr" rid="ref45">11, 12, 13</xref>]. We also conducted experiments with T5 specialized on ITalian (IT5) [14], a model that has demonstrated promising results in Italian language understanding and generation across various benchmarks.</p>
        <p>Our fine-tuning strategy involves two stages: initial pre-fine-tuning with gold and silver data (for exp. 1–12), and gold, silver, and copper data (for exp. 13–20), for 5 epochs to provide foundational DRS knowledge, followed by fine-tuning on only gold data—without augmentation—with an early stopping mechanism [<xref ref-type="bibr" rid="ref33 ref56">15</xref>]. The hyperparameter settings used in our experimentation are listed in Table 5.</p>
        <p>Evaluation Methods: For evaluation, we employ distinct methods for semantic parsing and natural language generation tasks. In parsing evaluation, we first transform DRS into Penman notation [16], then use SMATCH [17] to calculate the overlap of triples between system output and the gold standard, assessing the output using F-Score to balance precision and recall [<xref ref-type="bibr" rid="ref17">18</xref>]. For generation evaluation, we use a combination of different automatic metrics, including (i) n-gram-based measures like BLEU [19], METEOR [20], and chrF [<xref ref-type="bibr" rid="ref38">21</xref>]; (ii) the neural model-based COMET score [22]; and (iii) the pre-trained model-based BERT-Score (the "bert-base-multilingual-cased" model) [23]. These comprehensive evaluations allow us to assess both the technical accuracy and the linguistic quality of our model output across parsing and generation tasks.</p>
        <p>Results and Analysis: The experimental results reported in Table 3 demonstrate the efficacy of diverse DA strategies in enhancing semantic parsing and text generation tasks for Italian DRS. We used different variants of T5 (byT5 and IT5) models and evaluated performance on the PMB-5.0.0 dataset, utilizing SMATCH F1 for parsing, and BLEU, METEOR, COMET, chrF, and BERT-Score metrics for generation tasks. In the multilingual baseline comparisons, Italian (76.10% SMATCH F1 for parsing) exhibits superior performance to Dutch (42.77%) and comparable results to German (73.00%), while expectedly trailing English (91.42%). For generation, Italian achieves baseline scores of 37.79 BLEU, 30.83 METEOR, 81.66 COMET, 54.84 chrF, and 88.86 BERT-Score, positioning it better than Dutch and German in all metrics.</p>
        <p>Individual augmentation strategies uniformly yield improvements over the baseline Italian model. For parsing tasks, tense augmentation demonstrates the highest efficacy among singular strategies, achieving 84.13% SMATCH F1 (exp. 10). In generation tasks, tense augmentation also emerges as the most effective individual strategy, attaining scores of 44.49 BLEU, 33.46 METEOR, 85.14 COMET, 60.05 chrF, and 90.26 BERT-Score (exp. 10). These enhancements indicate that each augmentation type contributes uniquely to the semantic understanding and generative capabilities of the neural model.</p>
        <p>The effectiveness of tense augmentation correlates with the significant presence of temporal relations and structural simplicity in the test set's DRSs. Our analysis reveals that approximately 94.05% of the test set contains active-voice examples, while passive-voice examples account for only 5.95%, making tense augmentation particularly valuable for improving model performance on these sentence structures. Additionally, 98.20% of the test set consists of simple sentences, which further emphasizes the importance of augmentations that can enhance lexical diversity without overcomplicating sentence complexity. We observed the following distribution of sentence types in our test set: declarative (87.57%), exclamatory (2.52%), and interrogative (9.78%), reinforcing the need for augmentations that effectively handle these dominant structures.</p>
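        <p>The triple-overlap scoring used for parsing evaluation can be sketched as follows. Real SMATCH additionally searches over variable mappings; with variable-free SBN-style triples, the overlap reduces to a set intersection (a simplification, for illustration only):</p>

```python
def triple_f1(system_triples, gold_triples):
    """F-score over matching (source, role, target) triples."""
    system, gold = set(system_triples), set(gold_triples)
    overlap = len(system.intersection(gold))
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {("rude.a.01", "AttributeOf", "male.n.02"), ("rude.a.01", "Time", "time.n.08")}
system = {("rude.a.01", "AttributeOf", "male.n.02"), ("rude.a.01", "Time", "now")}
score = triple_f1(system, gold)
```

        <p>Here one of two triples matches, so precision and recall are both 0.5, as is the resulting F-score.</p>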
        <p>The compound augmentation approach, which integrates all augmentation strategies, produces the optimal results for the Gold+Silver (G+S) dataset. This comprehensive strategy achieves 85.98% SMATCH F1 for parsing and notable improvements across all generation metrics (45.12 BLEU, 34.54 METEOR, 85.66 COMET, 61.66 chrF, and 90.56 BERT-Score), underscoring the synergistic benefits of combining diverse augmentation techniques (exp. 11). The performance of IT5, by contrast, proved inadequate when applied to formal meaning representations, i.e., DRS: the model exhibited suboptimal results in both semantic parsing and text generation after fine-tuning on the compound augmentation dataset. This can be attributed to IT5's pre-training focus on general Italian language tasks rather than formal meaning representations like DRS, and it highlights the challenges of adapting general-purpose language models to specialized semantic processing tasks.</p>
        <p>5. Conclusion: This study has developed and evaluated a novel cross-lingual DA technique for Italian, specifically tailored for DRS-based semantic parsing and generation tasks. Our research makes significant progress in addressing the challenges faced by low-resource languages in advanced NLP tasks. The proposed augmentation methodology, leveraging English WordNet to enhance Italian semantic datasets, has demonstrated remarkable effectiveness: empirical evidence shows substantial improvements in performance for both DRS parsing and generation in Italian. Notably, our approach achieved a 90.56% SMATCH F1 score for parsing and significant enhancements across all generation metrics (BLEU: 57.48, METEOR: 40.95, COMET: 90.97, chrF: 70.88, BERT-Score: 92.97) on the G+S+C dataset, surpassing both baseline models and previous state-of-the-art results.</p>
        <p>Furthermore, comparison with the extant literature ([24] in Table 3) reveals the superior performance of our proposed approach. The referenced study reports 87.20% SMATCH F1 for parsing, and 53.20 BLEU, 38.50 METEOR, and 87.50 COMET for generation on the Gold+Silver+Copper (G+S+C) dataset. In contrast, our Italian model (exp. 13—G+S+C baseline) achieves 89.22% SMATCH F1, 56.46 BLEU, 40.48 METEOR, 90.02 COMET, 70.38 chrF, and 92.72 BERT-Score on the same dataset, representing significant advancements across all metrics.</p>
        <p>Our detailed analysis reveals that data augmentation positively affects the handling of Italian-specific linguistic features in semantic processing. The improvements observed across the various augmentation strategies indicate enhanced capability in managing syntactic flexibility and grammatical nuances in Italian. This suggests a successful transfer of semantic knowledge through the lens of Italian DRS.</p>
        <p>The most notable results are observed in the G+S+C dataset experiments. Verb Augmentation (exp. 18) achieves the highest parsing score of 90.56% SMATCH F1, while Tense Augmentation (exp. 19) leads in generation with scores of 57.48 BLEU, 40.95 METEOR, 90.97 COMET, 70.88 chrF, and 92.97 BERT-Score. These results not only surpass previous benchmarks but also approach the performance metrics of English, a high-resource language, despite the comparatively limited lexical resources for Italian. The similar performance between the baseline Italian model (exp. 13) and compound augmentation (exp. 20) on G+S+C is primarily attributable to the substantial volume of Copper data (92,394 examples). These Copper examples, which are Italian translations of the English Bronze dataset, outnumber our G+S compound augmentation by approximately 2:1, somewhat diminishing the observable impact of augmentation strategies. Furthermore, in our experiments with G+S+C (exp. 13–20), we used the Copper version without any augmentation, simply to allow a fair comparison with the literature reference (see the experimental results of [24] in Table 3). These experimental outcomes provide strong evidence that DA can significantly enhance the performance of semantic parsing and text generation models for Italian.</p>
        <p>Limitations: Although our results approach the performance metrics of English, a resource-rich language, there remains a gap that future research could address. For example, the original sentence "Tom è piuttosto scarso a tennis." ("Tom is rather poor at tennis.") becomes "Bob era piuttosto ricco con i single." ("Bob was sort of rich at singles."). While this method introduces linguistic diversity, it can result in less coherent sentences in some cases, as seen in this example. Such limitations are common with cross-lingual augmentation strategies based on back-and-forth translation, which favor lexical variation over syntactic coherence. Future refinements, such as filtering improbable substitutions or adding human validation, could help ensure more consistent logicality in cross-lingual semantic tasks.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>We thank "High-Performance Computing for Artificial Intelligence (HPC4AI) at the University of Turin" for providing GPU support [25].</p>
    </sec>
    <sec id="sec-6">
      <title>A. Data Transformation through Augmentation</title>
      <p>The SBN is graphically shown in Figure 1 both with and without augmentation (a and b), highlighting the distinctions between proper-noun, common-noun, adjective, adverb, and verbal-tense augmentations. With this augmentation, the original sentence "Tom è piuttosto scarso a tennis." ("Tom is rather poor at tennis.") becomes "Bob era piuttosto ricco con i single." ("Bob was sort of rich at singles."). In Figure 1, the augmented logical notions are highlighted conceptually. We used the Parallel Meaning Bank (PMB) dataset for this investigation, using both its gold (completely manually annotated) and silver (partially manually annotated) standard versions, and split it according to conventional methods for training, development, and testing.</p>
      <p>(a) DRS (sequence box notation) without augmentation:
male.n.02 Name "Tom"
time.n.08 EQU now
rather.r.02
poor.a.04 AttributeOf -3 Time -2 Degree -1 Theme +1
tennis.n.01
% Tom [0-3]
% is [4-6]
% rather [7-13]
% poor at [14-21]
% tennis. [22-29]</p>
      <p>(b) DRS (sequence box notation) with augmentation:
male.n.02 Name "Bob"
time.n.08 TPR now
sort_of.r.01
rich.a.01 AttributeOf -3 Time -2 Degree -1 Theme +1
singles.n.01
% Bob [0-3]
% was [4-7]
% sort of [8-15]
% rich at [16-23]
% singles. [24-32]</p>
      <p>An example of grammatical augmentation is when "A girl is playing the flute" is changed to one of three tenses: "A girl was playing the flute", "A girl will be playing the flute", or "A girl has been playing the flute". These illustrations show how enhancing various phrase constituents can produce diverse and richer datasets, supporting the creation of strong neural models.</p>
      <p>B. Statistical distribution of examples</p>
      <p>Table 1 reports the number of training, development, and
testing examples in each language as well as the statistical
distribution of the dataset used for multilingual baselines.</p>
      <p>Train Gold (T_Gold), Train Silver (T_Silver),
Development (Dev), and Test sets comprise the dataset. There
are 4,316 T_Silver, 555 Dev, 555 Test, and 745 T_Gold
examples for Italian. There are 6,862 T_Silver, 900 Dev,
900 Test, and 1,206 T_Gold examples in German. There
are 1,646 T_Silver, 435 Dev, 435 Test, and 586 T_Gold
examples in Dutch. There are 143,731 T_Silver, 1,132 Dev,
1,132 Test, and 9,057 T_Gold examples for English, the
language with the largest representation. As can be seen
from this distribution, the English corpus is substantially
larger than the other languages, ofering a solid dataset
for training and evaluation. This diversity in dataset
size across languages highlights the varying amounts of
linguistic data available for training multilingual models.</p>
    </sec>
    <sec id="sec-7">
      <title>C. Impact of Augmentation on</title>
    </sec>
    <sec id="sec-8">
      <title>Dataset Size</title>
      <p>Table 2 compares the number of instances with and
without augmentation to those with individual and
compound augmentations to show how diferent
augmen</p>
      <p>In order to provide transformed instances for neural tation methods afect the size of the dataset. Without any
semantic processing and text generation, named entities, augmentation, the original dataset had 5061 gold-silver
lexical, and grammatical DA approaches were applied samples altogether, 4316 silver examples, and 745 gold
to the original sentences as shown in Table 4. It demon- examples. Applying individual augmentations, including
strates how varying a sentence’s constituent parts can Named Entities, Common Noun, Adjective, Adverb, and
improve dataset variety. When it comes to named enti- Verb augmentations, twice the size of the dataset; for
ties, the sentence “Tom asked Mary if she had been to every augmentation type, there are 1490 gold, 8632 silver,
Boston” becomes “Bob asked Sarah if she had been to and 10122 gold-silver examples. Even more so, tense
augCambridge”, demonstrating how proper nouns are substi- mentation quadruples the amount of the dataset to 2980
tuted. “Tom played with his dog” becomes “Tom played gold, 17264 silver, and 20244 gold-silver examples.
Comwith his puppy” when it comes to common nouns, il- pound augmentation yields the largest gain, ninefolding
lustrating synonym replacement with hyponyms. Verb the dataset size to 6705 gold, 38844 silver, and 45549
goldaugmentation is demonstrated by changing the verb from silver examples. Compound augmentation incorporates
“Tom thinks I stole the money” to “Tom philosophizes I numerous augmentation strategies. The number of
exstole the money”, changing the meaning of the phrase. To amples in both the development and test sets stays at
demonstrate adjective and adverb augmentations, lexical 555. This notable augmentation of the dataset size
highentities are changed from “ill” to “well” and “deeply” to lights the potential for more comprehensive and diverse
“profoundly”, respectively. The last example of
grammatTom ha chiesto a Mary se fosse stata a Boston.
“Tom asked Mary if she had been to Boston.”</p>
      <p>Bob ha chiesto a Sarah se fosse stata a Cambridge.</p>
      <p>“Bob asked Sarah if she had been to Cambridge.”
Tom ha giocato con il suo cane.
“Tom played with his dog.”
Tom pensa che io abbia rubato i soldi.
“Tom thinks I stole the money.”
Lui è malato.
“He is ill.”
Una ragazza suona il flauto.
“A girl is playing the flute.”
La ragazza è profondamente legata a sua zia.
“The girl is deeply attached to her aunt.”</p>
      <p>La ragazza è sinceramente legata a sua zia.
“The girl is sincerely attached to her aunt.”
Tom ha giocato con il suo cucciolo.
“Tom played with his puppy.”
Tom filosofeggia che ho rubato i soldi.
“Tom philosophizes I stole the money.”
Lui è bene.
“He is well.”
Una ragazza suonava il flauto.
“A girl was playing the flute.”
Una ragazza suonerà il flauto.
“A girl will be playing the flute.”
Una ragazza ha suonato il flauto.
“A girl has been playing the flute.”</p>
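      <p>As a quick consistency check, the counts reported in Table 2 follow directly from the base dataset sizes and the per-strategy scaling factors (a minimal sketch; the helper name is ours, the numbers are those of the appendix):</p>

```python
# Reproduce the dataset-size arithmetic from Table 2.
# Base counts for Italian: 745 gold and 4,316 silver examples.
GOLD, SILVER = 745, 4316

def augmented_size(factor):
    """Scale gold, silver, and combined counts by an augmentation factor."""
    return GOLD * factor, SILVER * factor, (GOLD + SILVER) * factor

# Factor 1: the unaugmented dataset (745 gold, 4316 silver, 5061 total).
# Factor 2: each individual lexical augmentation doubles the data.
# Factor 4: tense augmentation quadruples it.
# Factor 9: compound augmentation yields a ninefold increase.
```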
    </sec>
    <sec id="sec-9">
      <title>D. Hyperparameters For</title>
    </sec>
    <sec id="sec-10">
      <title>Experimental Implementation</title>
      <p>In Table 5, we report the main hyperparameters used in our experimental implementation. We used the same experimental settings for all of the experiments reported in Table 3: the AdamW optimizer with a batch size of 32, a learning rate of 1e-4, and a maximum sequence length of 512 tokens. Throughout our experiments, we used GeGLU activation functions. Two rounds of fine-tuning were carried out: the first stage lasted for five epochs, and the second stage used early-stopping criteria to dynamically decide the ideal number of epochs depending on the model’s performance metrics. These hyperparameters were chosen carefully to guarantee reliable operation and efficient customization of the byT5 model to our particular tasks and datasets.</p>
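      <p>The two-round schedule described above can be sketched as follows (a simplified illustration, not the authors’ code: the callback signatures and the patience value of 2 are assumptions, and in practice the dev metric would be SMATCH or BLEU):</p>

```python
# Sketch of the two-stage fine-tuning schedule: five fixed epochs,
# then early stopping on a development-set metric. Returns the total
# number of epochs trained.
def two_stage_schedule(train_one_epoch, dev_score, patience=2):
    epochs = 0
    for _ in range(5):           # stage 1: five fixed epochs
        train_one_epoch()
        epochs += 1
    best, waited = float("-inf"), 0
    while True:                  # stage 2: dynamic length
        train_one_epoch()
        epochs += 1
        score = dev_score()
        if score > best:         # improvement resets the counter
            best, waited = score, 0
        else:                    # stop after `patience` epochs
            waited += 1          # without improvement
            if waited == patience:
                break
    return epochs

# Simulated run: the dev metric improves twice, then plateaus,
# so stage 2 runs four epochs before stopping.
scores = iter([0.50, 0.60, 0.60, 0.60])
total = two_stage_schedule(lambda: None, lambda: next(scores))
```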
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>tics: ACL-IJCNLP</source>
          <year>2021</year>
          , Association for Compu-
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>tational Linguistics</surname>
          </string-name>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>968</fpage>
          -
          <lpage>988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          URL: https://aclanthology.org/
          <year>2021</year>
          .findings-acl.
          <volume>84</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>doi:10</source>
          .18653/v1/
          <year>2021</year>
          .findings-acl.
          <volume>84</volume>
          . [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          , Pre-trained
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>2023, Association for Computational Linguistics,</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Toronto</surname>
          </string-name>
          , Canada,
          <year>2023</year>
          , pp.
          <fpage>5586</fpage>
          -
          <lpage>5600</lpage>
          . URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          //aclanthology.org/
          <year>2023</year>
          .findings-acl.
          <volume>345</volume>
          . doi: 10.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <volume>18653</volume>
          /v1/
          <year>2023</year>
          .findings-acl.
          <volume>345</volume>
          . [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Evang</surname>
          </string-name>
          , N. Venhuizen, De-
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>and Evaluation (LREC'12)</source>
          , European Language
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Resources</given-names>
            <surname>Association</surname>
          </string-name>
          (ELRA), Istanbul, Turkey, [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          , S. Narang,
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <year>2012</year>
          , pp.
          <fpage>3196</fpage>
          -
          <lpage>3200</lpage>
          . URL: http://www.lrec-conf. M.
          <string-name>
            <surname>Kale</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Roberts</surname>
          </string-name>
          , C. Rafel, Byt5: Towards
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          org/proceedings/lrec2012/pdf/534_Paper.
          <article-title>pdf. a token-free future with pre-trained byte-to-</article-title>
          <string-name>
            <surname>byte</surname>
            [2]
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Banarescu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Bonial</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Cai</surname>
          </string-name>
          , M. Georgescu, models, Transactions of the Association for Com-
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Grifitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Hermjakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Knight</surname>
          </string-name>
          , P. Koehn, putational Linguistics 10 (
          <year>2022</year>
          )
          <fpage>291</fpage>
          -
          <lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , Abstract meaning rep- [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Stankevičius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lukoševičius</surname>
          </string-name>
          , J. Kapočiu¯tė-
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>7th linguistic annotation workshop and interoper- diacritics and typos with a byt5 transformer model,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>ability with discourse</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>178</fpage>
          -
          <lpage>186</lpage>
          .
          <source>Applied Sciences</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>2636</fpage>
          . [3]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Amin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Anselma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mazzei</surname>
          </string-name>
          , Exploring data [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Belouadi</surname>
          </string-name>
          , S. Eger, ByGPT5: End-to-end
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <article-title>18th Conference of the European Chapter of the Graber</article-title>
          , N. Okazaki (Eds.),
          <source>Proceedings of the</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <article-title>Association for Computational Linguistics (Volume 61st Annual Meeting of the Association for</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>1: Long Papers), Association for Computational Computational Linguistics (Volume 1: Long Pa-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Linguistics</surname>
          </string-name>
          , St.
          <source>Julian's, Malta</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>2164</fpage>
          -
          <lpage>2178</lpage>
          . pers), Association for Computational Linguis-
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          URL: https://aclanthology.org/
          <year>2024</year>
          .
          <article-title>eacl-long</article-title>
          .
          <volume>132</volume>
          . tics, Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>7364</fpage>
          -
          <lpage>7381</lpage>
          . URL: [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Abzianidze</surname>
          </string-name>
          , R. van Noord,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          , The https://aclanthology.org/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>406</volume>
          . doi:10.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>parallel meaning bank: A framework for</article-title>
          semanti-
          <volume>18653</volume>
          /v1/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>406</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <article-title>cally annotating multiple languages</article-title>
          , Applied math- [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sarti</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Nissim, IT5: Text-to-text pretraining for</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>ematics and informatics 25</source>
          (
          <year>2020</year>
          )
          <fpage>45</fpage>
          -
          <lpage>60</lpage>
          .
          <article-title>Italian language understanding and generation</article-title>
          , in: [5]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Amin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mazzei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Anselma</surname>
          </string-name>
          , et al.,
          <string-name>
            <surname>Towards</surname>
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Calzolari</surname>
            , M.-
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Hoste</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lenci</surname>
          </string-name>
          , S. Sakti,
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <article-title>data augmentation for drs-to-text generation</article-title>
          , in: N. Xue (Eds.),
          <source>Proceedings of the 2024</source>
          Joint In-
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <source>CEUR WORKSHOP PROCEEDINGS</source>
          , volume
          <volume>3287</volume>
          , ternational Conference on Computational Linguis-
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2022</year>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>152</lpage>
          . tics,
          <source>Language Resources and Evaluation (LREC</source>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <source>Annotating the COLING</source>
          <year>2024</year>
          ),
          <article-title>ELRA</article-title>
          and
          <string-name>
            <given-names>ICCL</given-names>
            ,
            <surname>Torino</surname>
          </string-name>
          , Italia,
          <year>2024</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <article-title>little prince with chinese amrs</article-title>
          , in: Proceedings of pp.
          <fpage>9422</fpage>
          -
          <lpage>9433</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <source>the 10th Linguistic Annotation Workshop held in lrec-main.823.</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <article-title>Conjunction with ACL 2016 (LAW-X</article-title>
          <year>2016</year>
          ),
          <year>2016</year>
          , [15]
          <string-name>
            <surname>R. van Noord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Toral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          , Character-
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          pp.
          <fpage>7</fpage>
          -
          <lpage>15</lpage>
          .
          <article-title>level representations improve DRS-based seman[7</article-title>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          ,
          <article-title>The sequence notation: Catching complex tic parsing even in the age of BERT</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <article-title>meanings in simple graphs</article-title>
          ,
          <source>in: Proceedings of the Proceedings of the 2020 Conference on Empir-</source>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <source>15th International Conference on Computational ical Methods in Natural Language Processing</source>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Semantics (IWCS</surname>
          </string-name>
          <year>2023</year>
          ), Nancy, France,
          <year>2023</year>
          , pp.
          <source>(EMNLP)</source>
          , Association for Computational Linguis-
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          1-
          <fpage>14</fpage>
          . tics, Online,
          <year>2020</year>
          , pp.
          <fpage>4587</fpage>
          -
          <lpage>4603</lpage>
          . URL: https: [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Shorten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Khoshgoftaar</surname>
          </string-name>
          , A survey on image //aclanthology.org/
          <year>2020</year>
          .emnlp-main.
          <volume>371</volume>
          . doi:10.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <article-title>data augmentation for deep learning</article-title>
          ,
          <source>Journal of big 18653/v1/2020.emnlp-main.371.</source>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <source>data 6</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>48</lpage>
          . [16]
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Kasper</surname>
          </string-name>
          ,
          <article-title>A flexible interface for linking applica</article-title>
          [9]
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gangal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chandar</surname>
          </string-name>
          , S. Vosoughi,
          <article-title>tions to Penman's sentence generator</article-title>
          , in: Speech
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          21-
          <fpage>23</fpage>
          ,
          <year>1989</year>
          ,
          <year>1989</year>
          . URL: https://aclanthology.org/ challenging benchmarks, in: C.
          <string-name>
            <surname>Bonial</surname>
          </string-name>
          , J. Bonn,
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <fpage>H89</fpage>
          -
          <lpage>1022</lpage>
          . J. D. Hwang (Eds.),
          <source>Proceedings of the Fifth Inter</source>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Knight</surname>
          </string-name>
          ,
          <source>Smatch: an evaluation metric national Workshop on Designing Meaning Rep-</source>
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <article-title>for semantic feature structures</article-title>
          , in: H. Schuetze, resentations @ LREC-COLING
          <year>2024</year>
          ,
          <article-title>ELRA and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          , M. Poesio (Eds.),
          <source>Proceedings of the 51st ICCL, Torino, Italia</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>162</fpage>
          -
          <lpage>175</lpage>
          . URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <article-title>Annual Meeting of the Association for Computa-</article-title>
          //aclanthology.org/
          <year>2024</year>
          .dmr-
          <volume>1</volume>
          .
          <fpage>17</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <string-name>
            <surname>tional Linguistics</surname>
          </string-name>
          (Volume
          <volume>2</volume>
          :
          <string-name>
            <surname>Short</surname>
            <given-names>Papers)</given-names>
          </string-name>
          , Associ- [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aldinucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rabellino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pironti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Spiga</surname>
          </string-name>
          , P. Vi-
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <year>2013</year>
          , pp.
          <fpage>748</fpage>
          -
          <lpage>752</lpage>
          . URL: https://aclanthology.org/ P. Margara,
          <string-name>
            <surname>I. Drago</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marturano</surname>
          </string-name>
          , G. Marchetto,
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <fpage>P13</fpage>
          -
          <lpage>2131</lpage>
          . E. Piccolo,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bagnasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lusso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vallero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Attardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barchiesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Colla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Galeazzi</surname>
          </string-name>
          ,
          <article-title>HPC4AI: an AI-on-demand federated platform endeavour</article-title>
          ,
          <source>in: Proceedings of the 15th ACM International Conference on Computing Frontiers, CF '18</source>
          , New York, NY, USA,
          <year>2018</year>
          , pp.
          <fpage>279</fpage>
          -
          <lpage>286</lpage>
          . URL: https://doi.org/10.1145/3203217.3205340. doi:10.1145/3203217.3205340.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W.</given-names>
            <surname>Poelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>van Noord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          ,
          <article-title>Transparent semantic parsing with Universal Dependencies using graph transformations</article-title>
          ,
          <source>in: Proceedings of the 29th International Conference on Computational Linguistics</source>
          , Gyeongju, Republic of Korea,
          <year>2022</year>
          , pp.
          <fpage>4186</fpage>
          -
          <lpage>4192</lpage>
          . URL: https://aclanthology.org/2022.coling-1.367.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>Papineni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roukos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>BLEU: a method for automatic evaluation of machine translation</article-title>
          ,
          <source>in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavie</surname>
          </string-name>
          ,
          <article-title>METEOR: An automatic metric for MT evaluation with improved correlation with human judgments</article-title>
          ,
          <source>in: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>65</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Popović</surname>
          </string-name>
          ,
          <article-title>chrF: character n-gram F-score for automatic MT evaluation</article-title>
          ,
          <source>in: Proceedings of the Tenth Workshop on Statistical Machine Translation</source>
          , Lisbon, Portugal,
          <year>2015</year>
          , pp.
          <fpage>392</fpage>
          -
          <lpage>395</lpage>
          . URL: https://aclanthology.org/W15-3049. doi:10.18653/v1/W15-3049.
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Farinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavie</surname>
          </string-name>
          ,
          <article-title>COMET: A neural framework for MT evaluation</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>2685</fpage>
          -
          <lpage>2702</lpage>
          . URL: https://aclanthology.org/2020.emnlp-main.213. doi:10.18653/v1/2020.emnlp-main.213.
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kishore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Artzi</surname>
          </string-name>
          ,
          <article-title>BERTScore: Evaluating text generation with BERT</article-title>
          ,
          <source>in: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020</source>
          , OpenReview.net,
          <year>2020</year>
          . URL: https://openreview.net/forum?id=SkeHuCVFDr.
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>van Noord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          , Gain-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>