<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>with LLMs and T-PAS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Simonetti</string-name>
          <email>simonetti.luca@spes.uniud.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elisabetta Ježek</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Vetere</string-name>
          <email>g.vetere@unimarconi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <kwd-group>
          <kwd>NLP</kwd>
          <kwd>T-PAS</kwd>
          <kwd>Verb Subcategorization</kwd>
          <kwd>Mistral</kwd>
          <kwd>CLiC-it</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli studi Guglielmo Marconi</institution>
          ,
          <addr-line>Via Plinio, 44, 00193 Roma</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli studi di Udine</institution>
          ,
          <addr-line>Via Palladio, 8, 33100 Udine</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università di Pavia</institution>
          ,
          <addr-line>Corso Strada Nuova, 65, 27100 Pavia</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study explores the application of Large Language Models (LLMs) to verb subcategorization in Italian, focusing on the identification and classification of syntactic patterns in sentences. While LLMs have made lexical analysis more implicit, explicit argument structure identification remains crucial in domain-specific contexts. The research leverages T-PAS, a rich lexical resource for Italian verbs, to fine-tune the open multilingual model Mistral 7B using the Iterative Reasoning Preference Optimization (IRPO) technique. This approach aims to enhance the recognition and extraction of verbal patterns from Italian sentences, addressing challenges in resource quality, coverage, and frame extraction methods. By combining curated lexical-semantic resources with neural language models, this work contributes to improving verb subcategorization tasks, particularly for the Italian language, and demonstrates the potential of LLMs in refining linguistic analysis tools.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Verb subcategorization is the task of identifying and classifying the syntactic patterns (or frames) taken by verbs in sentences. These patterns encode the possible combinations of arguments (such as subjects and objects), specifying the number and type of arguments as well as their syntactic and semantic roles. Verb subcategorization is often used in Natural Language Understanding (NLU) to provide the main interpretation backbone. Although recent developments brought about by Large Language Models (LLMs) make lexical analysis somewhat implicit, there are cases in which the identification of the argument structure of the verb is required, especially those where extensive domain-specific knowledge is required.</p>
      <p>Semantic lexical resources such as VerbNet [<xref ref-type="bibr" rid="ref4">1</xref>], FrameNet [2] and PropBank [3] have been largely employed for several NLP tasks in the past decades, including accomplishing verbal framing for the English language. VerbNet, for example, has been used to improve semantic role labeling, verb sense disambiguation and ontology mapping ([4], [5]); its new enhanced semantic representations have also recently been used for entity state tracking [6]. The main problems addressed in these experiences concern the quality and coverage of such resources and the methods used to extract frames from sentences. In this work, we leverage T-PAS [7] to fine-tune the open multilingual model Mistral 7B [8] for the recognition and extraction of verbal patterns from Italian sentences using a technique called IRPO [9].</p>
      <p>The paper is organized as follows: in Section 2 we introduce the T-PAS resource for Italian verbs, which we used in our experiments. Section 3 discusses in detail the methodology we applied and references closely related works, whereas Section 4 illustrates the experimental setup. We complete the paper by discussing our results in Section 5 and by drawing some conclusions as well as making suggestions for future research in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The T-PAS resource</title>
      <p>T-PAS [7] is an inventory of argument structures and senses for Italian verbs.<sup>1</sup> In T-PAS, for each verb meaning, a specific Typed Predicate-Argument Structure (T-PAS, informally called pattern) is provided, in which arguments are defined in terms of semantic classes notated between square brackets, called semantic types; an example is the pattern of the verb guidare 'drive'. Patterns are acquired from corpora following the Corpus Pattern Analysis (CPA) methodology [<xref ref-type="bibr" rid="ref18">10</xref>]. Currently, T-PAS contains 1160 analyzed verbs, 5529 patterns and ca. 200,000 annotated corpus instances. Semantic types (Human, Event, Location, Food, Vehicle, etc.) are obtained from manual clustering of the lexical items found in the argument positions in the corpus. These types look very much like ontological categories; however, instead of being stipulated, they are induced from corpus data and reflect how humans talk about events and states of entities through language. The system of semantic types in T-PAS currently contains 180 semantic types. The list is organized in a hierarchy to identify the appropriate level of specificity of the selectional properties of individual verbs.</p>
      <p><sup>1</sup>The T-PAS project was developed at the Department of Humanities of the University of Pavia, with the technical support of Lexical Computing Ltd. The resource can be freely accessed and downloaded.</p>
      <p>CEUR Workshop Proceedings (ceur-ws.org). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Background and Methodology</title>
      <sec id="sec-3-0">
        <title>4. Experimental Setup</title>
        <p>The experimental setup consists of two main stages: dataset creation and fine-tuning of the base model Mistral 7B [8], as per the paper Iterative Reasoning Preference Optimization (IRPO) [9]. Our implementation involves a single iteration, comprising both dataset generation and the actual fine-tuning. Additionally, we conduct a basic fine-tuning process where we train Mistral to directly complete prompts with the correct answer in a specified format:</p>
        <p>La risposta corretta è 2...</p>
      </sec>
      <sec id="sec-3-1">
        <p>We refer to this as the SFT (Supervised Fine-Tuning) model later in the discussion. This approach allows us to compare the effectiveness of IRPO against a more straightforward fine-tuning method. We now provide more details about the two stages of our experimental setup.</p>
        <p>The extraction of verbal frames consists of applying frame-like structures to sentences. Once a suitable frame is identified, each element of the structure is mapped to an element of the sentence. To start our experiment, we attempted to extract the frame directly from the neural model (LLM), relying on the fact that LLMs are pretrained on large amounts of text and that their language modeling capabilities have reached unprecedented levels of maturity in the last three years. Although promising, this approach proved insufficient, since the model struggled with the correct subcategorization of the verb before extracting the appropriate frame. In a way, it appeared that the selection was compromised by the nondeterministic nature of LLM inference. Consequently, we split the task into two separate phases: 1) frame identification, i.e. T-PAS subcategorization, and 2) frame extraction, i.e. frame-sentence mapping.</p>
        <p>We found that the baseline model performed poorly
on the subcategorization task, achieving only 59.8%
accuracy. For this reason, we decided to fine-tune the baseline
model on the task of identifying verbal frames, which
proved to be key for the subsequent task of extracting
these frames. This approach was inspired by [11], where
the authors set up a framework for verb sense
disambiguation by providing the model with the frame that
describes the sense the verb can take. This allows us to
treat this task as a linguistic and semantic task rather
than a simple categorization task. The idea is to provide
the model with a prompt that includes the frames, based
on the hypothesis that supplying the model with as much
information as possible might be beneficial. This paper
will only cover the subcategorization task. To do this, we
created a fine-tuning dataset based on the T-PAS resource,
containing both the necessary information and a large
number of examples to build upon.</p>
        <p>The first phase involves gathering sentences and
structuring prompts. The prompts consist of questions to the
model, where we ask which of the listed senses is the
correct one for the sentence we provide. We use a subset
of the T-PAS dataset, comprising approximately 5,324
examples out of the total 26,652 elements (around 19.9%
of the full dataset). The sentences are randomly picked
from this subset, using at most two examples for each
verb to avoid any bias towards one specific pattern or
predicate. This approach ensures a diverse
representation while maintaining a manageable dataset size for our
experiments. The possible senses the verb can acquire
are constructed from the T-PAS dataset. We maintain the
original order of the senses as listed in T-PAS to facilitate
both the dataset generation and the evaluation processes.</p>
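The per-verb sampling described above (at most two randomly picked examples per verb, to avoid bias towards one specific pattern or predicate) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the representation of an example as a dict with a "verb" key is our own assumption:

```python
import random
from collections import defaultdict

def sample_subset(examples, per_verb=2, seed=0):
    """Randomly keep at most `per_verb` examples for each verb, so that
    no single pattern or predicate dominates the resulting dataset."""
    rng = random.Random(seed)
    by_verb = defaultdict(list)
    for ex in examples:
        by_verb[ex["verb"]].append(ex)
    subset = []
    for verb, items in by_verb.items():
        rng.shuffle(items)          # random pick within each verb
        subset.extend(items[:per_verb])
    return subset
```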
        <p>Our preliminary tests indicated that this decision doesn't significantly affect performance. We provide an example of a prompt in the Appendix to illustrate the structure and content of our queries to the model. After building the prompts, we query the Mistral 7B model API 30 times with temperature set to the highest value, to let the model explore its internal latent space as much as possible when providing a response.</p>
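A prompt of the kind described here (and shown in full in the Appendix) can be assembled as in the following sketch. The helper name and the sense representation are our own illustration; the wording mirrors the Appendix example:

```python
def build_prompt(sentence, verb, senses):
    """Assemble a subcategorization prompt in the style of the Appendix:
    the sentence, the task instruction, and the numbered T-PAS senses
    (Forma/Significato pairs) in their original resource order."""
    lines = [f'Frase da analizzare: "{sentence}"', '']
    lines.append('Task: Ragionando sempre in italiano, determina il '
                 f'significato del verbo "{verb}" in base al contesto '
                 'della frase. Considera i seguenti sensi:')
    lines.append('')
    for i, (forma, significato) in enumerate(senses, start=1):
        lines.append(f'{i}:')
        lines.append(f'Forma: {forma}')
        lines.append(f'Significato: {significato}')
    return '\n'.join(lines)
```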
      </sec>
      <sec id="sec-3-2">
        <title>4.1. Dataset Creation</title>
        <p>The first stage, dataset creation, involves the following steps:</p>
        <p>1. We collect 30 responses from the base Mistral model with a high temperature for each sentence.
2. Using these responses, we build a dataset containing (x, c<sub>w</sub>, y<sub>w</sub>, c<sub>l</sub>, y<sub>l</sub>) tuples, where:
• x is the prompt used in step 1 to generate the responses;
• (c<sub>w</sub>, y<sub>w</sub>) is the winning response (i.e., the one that leads to a correct answer);
• (c<sub>l</sub>, y<sub>l</sub>) is the losing response (i.e., the wrong one).</p>
        <p>This second phase results in 30 responses per prompt. We then compile a dataset of both correct and incorrect responses. The prompt instructs the model to answer in a specific format. Even if a response is semantically correct but doesn't adhere to the required format, we classify it as incorrect and include it with the wrong responses. This approach aligns with the methodology of the base paper and serves multiple purposes in our tuning process. By enforcing a specific format, we're not only training the model to provide correct answers but also to follow instructions precisely, which provides us with a standardized format that ensures consistency across responses, crucial for large-scale evaluation and comparison. This phase results in an unbalanced dataset of wrong and right responses for each prompt, summing up to 30.</p>
        <sec id="sec-3-2-1">
          <title>4.2. Fine-Tune</title>
          <p>We use a single L4 GPU with 24GB of VRAM available. As such, we can only have a batch size of 1 and use gradient accumulation of 2 to simulate a batch size of 2. We set max steps to 4,500 but actually stop the fine-tuning after 3,000 steps, since there seems to be a plateau in the reward accuracy (see Figure 1).</p>
        </sec>
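The format-based labeling described in Section 4.1, where a response counts as winning only if it both follows the required answer format and selects the correct sense, can be sketched as follows. The helper is illustrative, not the authors' code; the "Risposta finale: N" format it checks is the one requested in the prompts shown in the Appendix:

```python
import re

def is_valid_response(text, correct_answer):
    """Return True only if the response uses the required
    'Risposta finale: N' format AND picks the correct sense number."""
    m = re.search(r"Risposta finale:\s*(\d+)", text)
    if m is None:
        # wrong format counts as incorrect, even if the reasoning is right
        return False
    return int(m.group(1)) == correct_answer
```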
        <p>The second step of this pipeline involves transforming the intermediate dataset into the final dataset. Following the approach of the IRPO authors, we combine chosen and rejected responses to create a balanced dataset, ensuring that each response is processed at least once during fine-tuning. For example, if the number of elements in one of the two stacks – chosen and rejected – is less than the other, we reuse elements from the stack with fewer items multiple times to achieve balance.</p>
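The balancing step just described can be sketched as follows; this is an illustrative reconstruction under the assumption that each row carries prompt, chosen and rejected fields, matching the dataset columns reported below:

```python
import itertools

def build_preference_pairs(prompt, chosen, rejected):
    """Pair every chosen response with a rejected one, cycling the
    shorter stack so each response is used at least once."""
    if not chosen or not rejected:
        return []  # a prompt needs at least one of each to form pairs
    n = max(len(chosen), len(rejected))
    c = list(itertools.islice(itertools.cycle(chosen), n))
    r = list(itertools.islice(itertools.cycle(rejected), n))
    return [{"prompt": prompt, "chosen": cw, "rejected": rl}
            for cw, rl in zip(c, r)]
```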
        <p>The result is a dataset consisting of 17,863 rows with columns prompt, chosen, rejected, which we make available on Huggingface.<sup>2</sup></p>
        <sec id="sec-3-2-2">
          <title>4.3. Differences from the IRPO paper</title>
          <p>The main differences from the original IRPO setup are as follows:</p>
          <p>1. As starting model M<sub>0</sub> we use Mistral 7B: in other words, a different model with 10x fewer parameters, whereas the authors of the original paper use Llama-2 70B, a model with a different architecture and possibly a different dataset used in the pretraining.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <p>2. We apply IRPO to a linguistic task instead of a logic or math reasoning task.</p>
      </sec>
      <sec id="sec-3-4">
        <p>3. We use a subset of verbs for training and observe generalization on different verbs during testing. This approach differs from the IRPO authors, who utilize standardized datasets such as GSM8K, MATH, and ARC-Challenge. While these datasets allow for direct comparison across different models and techniques, they don't provide the opportunity to assess generalization to unseen problem types in the same way our verb-based approach does.</p>
      </sec>
      <sec id="sec-3-5">
        <p>The loss function we employ in the second stage, the fine-tuning, consists of two components: one handling the Direct Preference Optimization (DPO) rewards [12], and another that positively affects the Negative Log Likelihood (NLL) of the correct answer. This approach has similar effects to those described by the authors of IRPO [9].</p>
        <p>Using the dataset built as described in the previous section, we proceed with the fine-tuning. We also build the custom loss function as described (but not implemented) by the authors of IRPO. Our implementation of the replicated loss function will be made publicly available. The LoRA [13] configuration is as follows:</p>
        <p>rank=16,
lora_alpha=16,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']</p>
        <sec id="sec-3-5-1">
          <title>5. Results and Discussion</title>
          <p>Our final results are summarized in Table 1. We observe that the multilingual baseline model, although including Italian, is not sufficiently accurate in performing the selection task. Furthermore, when the model's temperature is increased, it does not remain consistent with a specific answer but rather explores multiple response options, selecting a different choice randomly each time. This could be explained in multiple ways: one is that the model knows it needs to select an answer but does not arrive at the correct one through a thorough analysis, rather following a pick-one strategy with the explanation coming as a consequence. The other possible explanation is that the model just tries to give an answer, not actually connecting pieces of the given possibilities to the sentence but rather picking random parts of the sentence where they are more likely to reside for that particular part of the frame – e.g. the subject usually heads the sentence. Unfortunately, these are only speculations.</p>
          <p>The unchanged performance on common-sense tasks indicates that our fine-tuning enhanced the model's capabilities on our specific task without compromising its general language understanding. This result aligns with the versatility of large language models, capable of maintaining proficiency across multiple NLP tasks simultaneously, and suggests potential for developing specialized AI systems without sacrificing broader capabilities.</p>
        </sec>
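The two-component loss described in this section, a DPO preference term plus a term that pushes up the likelihood of the correct answer, can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the authors' implementation: the argument names and the alpha weighting are our own, and details such as length normalization may differ:

```python
import torch
import torch.nn.functional as F

def irpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
              beta=0.1, alpha=1.0):
    """Sketch of an IRPO-style objective: a DPO preference term plus an
    NLL term on the chosen (correct) answer. Each argument is the summed
    log-probability of a sequence under the policy or reference model."""
    # DPO term: -log sigmoid(beta * implicit reward margin)
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    dpo = -F.logsigmoid(beta * margin)
    # NLL term: raise the probability of the correct answer directly
    nll = -logp_chosen
    return (dpo + alpha * nll).mean()
```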
      </sec>
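For reference, the LoRA configuration listed above maps onto the Hugging Face peft API as in the following sketch; this is an illustration of the quoted settings, not the authors' released training code:

```python
# Sketch: the LoRA settings quoted in the text, expressed with peft.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                      # "rank=16" in the text
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj',
                    'up_proj', 'q_proj', 'o_proj', 'down_proj'],
)
```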
      <sec id="sec-3-6">
        <p>Another significant result derives from the fact that the subset of verbs used for fine-tuning differs from the verb subset used for testing. This means that we not only avoid using the same sentences from the training phase but also employ verbs that were not present during training, and yet we obtain performance improvements. This demonstrates some degree of generalization. While a thorough study would be needed to draw more accurate conclusions, we can still offer some speculations:</p>
        <p>1. The model has generalized to understand which thought patterns are rewarded during the DPO (Direct Preference Optimization) phase. It now attempts to reproduce these patterns regardless of the specific sentence or verb used. This can be seen clearly from Figure 1, where the accuracy in rewarding the right choices improves over training steps, a key indicator that the model learns to discern between right and wrong answers.</p>
        <p>2. Another speculation is that the logits have changed, and the behaviour is too random, or at least too noisy, to be taken into consideration alone. This means that the model's better performance could just be associated with random guessing, as the baseline models do, but with new logits. Future work might clarify and explain better what happens.</p>
      </sec>
      <sec id="sec-3-7">
        <p>But we also find that after using the IRPO technique the model modifies its behaviour, improving its accuracy. In other words, the model seems to acquire some competence in this task by being fine-tuned with a double signal consisting of the DPO plus the NLL losses. The first signal teaches the model to distinguish between the right and the wrong answer. The second signal pushes the correct answer further up in probability space. It is remarkable that the collection of the dataset for the second iteration proved to be quite a hard task, since the model was performing well enough to give only a reduced amount of wrong answers, both in an absolute sense – i.e. for a given sentence the model returns 30 correct answers – and in a relative sense – i.e. the number of wrong answers is small: 2–3.</p>
        <p>We assessed the model's performance on basic Common Sense [14] tasks to probe the effects of our fine-tuning. Interestingly, we found no change in performance across these tasks. This outcome is particularly noteworthy when we analyze how the different outcomes might have been speculated to have happened. A deterioration in performance could have suggested catastrophic forgetting, a common issue in neural networks where new learning irremediably replaces the previous knowledge. However, our use of Low-Rank Adaptation (LoRA) likely mitigated this effect by updating only task-specific parameters.</p>
        <p>We now briefly illustrate the impact of fine-tuning on the Mistral 7B model's performance in subcategorization tasks, presenting a case study focusing on the Italian verb "scherzare" (to joke). This example, while not necessarily representative of all improvements, provides at least some insight into the changes in the model's reasoning process. The sentence under consideration is: "Non si può scherzare con la realtà virtuale." (One cannot joke with virtual reality.) The model was tasked with selecting the correct sense from a predefined set of options:</p>
        <p>1. [Human] non fa o dice qualcosa sul serio (doesn't do or say something seriously)
2. [Human] fa ironia su|sopra|con [Abstract Entity] (makes irony about [Abstract Entity])
3. [Human] ha un comportamento leggero e divertito (has a light and amused behavior)
4. [Human] fa sul serio, non sottovaluta, prende sul serio ([Anything]) (takes seriously, doesn't underestimate [Anything])
5. [Human] si espone imprudentemente a un rischio (imprudently exposes oneself to a risk)</p>
      </sec>
      <sec id="sec-3-8">
        <p>6. [Human] prende alla leggera qualcosa (takes something lightly)</p>
      </sec>
      <sec id="sec-3-9">
        <p>Before fine-tuning, the model incorrectly classified the usage as sense 1, providing the following reasoning: "Nella frase data, 'scherzare' non è seguito da un oggetto concreto o astratto, quindi non si applicano i sensi 1, 2, 4, 5 e 6. Il contesto della frase indica che 'scherzare' non è inteso come un comportamento leggero e divertito (senso 3), ma piuttosto come non fare o dire qualcosa in maniera irrispettosa o non seria (senso 1)." This reasoning demonstrates a lack of sensitivity to the syntactic structure and context of the sentence. Also, during the reasoning process, option 1 was inconsistently excluded but was later chosen as the final answer.</p>
        <p>After fine-tuning, the model correctly identified sense 4 as the appropriate choice: "Nella frase data, 'scherzare' non è usato attivo, ma negativo, quindi non si può applicare direttamente i sensi 1, 3, 5 e 6, che si riferiscono a azioni o comportamenti specifici. Invece, il verbo 'si può' implica un senso di non poter fare qualcosa, quindi il senso appropriato è quello di sottovalutare o prendere in giro qualcosa, che corrisponde al senso 4." This reasoning shows multiple enhancements:
1. Recognition of the negative construction "Non si può scherzare"
2. Consideration of the phrase "con la realtà virtuale" as crucial context
3. More nuanced interpretation, considering multiple senses before making a decision</p>
        <p>In conclusion, we can say that small multilingual baseline models such as Mistral 7B perform poorly on semantic analysis of Italian sentences. We observe that the poor behaviour is due to the model's inability to discern the correct answer, either because it lacks the linguistic knowledge, therefore mostly resorting to random guesses, or because it follows an incorrect explanation for the answer it is about to give. However, our research also demonstrates that the model can be significantly improved using IRPO techniques without affecting the baseline performance on common sense and reasoning tasks. Notably, we observe the ability to generalize across predicates, likely due to underlying linguistic skills, though further investigation is needed to fully understand this phenomenon.</p>
        <p>The production of small open language models is rapidly evolving, approaching the level of the huge closed models which were available on the cloud a couple of years ago. At present, Italian monolingual models have room for improvement in terms of performance levels,<sup>3</sup> while multilingual models, e.g. the recently released Gemma 2 [15], show increasing proficiency in our language, probably due to transfer learning effects. Our research shows the potential of leveraging such models in combination with high-quality lexical resources to develop a new class of task-specific models for the Italian language. These models, while small in scale, are expected to exhibit remarkable proficiency in executing complex analytical tasks, such as those related to verbs.</p>
      </sec>
      <sec id="sec-3-10">
        <p>While this single example cannot be generalized to the model's overall performance, it suggests that fine-tuning may have enhanced the model's ability to parse complex syntactic structures and integrate contextual information in subcategorization tasks. Further comprehensive analysis across a wide range of verbs and constructions would be necessary to draw broader conclusions about the model's improved capabilities, as well as to identify new means to further enhance accuracy and performance.</p>
        <p>With this in mind, our future work is aimed, on the one hand, at enriching lexicographic resources and refining the ways to obtain training material from them, and on the other hand at continuously evaluating the improvements brought about by the progress of general-purpose open models. One promising application is the use of a verbal subcategorization and frame extraction system to extract content from specialist documents, such as legal [16] or medical texts [17]. Furthermore, the ability to analyze the complex argument structure of verbs has potential for use in language learning systems [18], e.g. providing support for immigrants to learn Italian affordably. Finally, we made our fine-tuned model publicly available on Huggingface<sup>4</sup> along with a visual report on wandb.<sup>5</sup></p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Table 1. Comparison between various fine-tune methods</title>
      <p>Model: Test Accuracy (%)
Iterative RPO (Iteration 1): 75.6
SFT (PST CoT): 65.6
Mistral baseline (Zero-shot CoT): 59.8</p>
      <sec id="sec-4-1">
        <p><sup>3</sup>See for instance Hugging Face's INVALSI Leaderboard, https://huggingface.co/spaces/CrispUnimib/INVALSIbenchmark</p>
      </sec>
      <sec id="sec-4-2">
        <p><sup>4</sup>https://huggingface.co/theGhoul21/srl-base-irpo-080524-16bit-v0.3</p>
      </sec>
      <sec id="sec-4-3">
        <p><sup>5</sup>https://shorturl.at/4jmPq</p>
        <p>among the 5 (in this case) possible choices. This is the sentence translated in English: "Those islanders, jealous of the small properties they had acquired at a very hard cost, intended with extreme penalties to remove the danger of seeing them taken away from them."</p>
        <p>Frase da analizzare: "Quegli isolani, gelosi delle piccole proprietà, a prezzo sudatissimo procurate, intendevano con estreme pene allontanare il pericolo di vedersele sottratte."

Task: Ragionando sempre in italiano, determina il significato del verbo "allontanare" in base al contesto della frase. Considera i seguenti sensi:

1:
Forma: [Human1] allontanare [Human2] da [Location] | da [Human Group]
Significato: [Human1] espelle, caccia [Human2] da [Location] | da [Human Group]
2:
Forma: [Human] allontanarsi (da [Physical Entity] | da [Human Group])
Significato: [Human] fa in modo di essere lontano (da [Physical Entity] | da [Human Group])
3:
Forma: [Human] allontanarsi da [Location]
Significato: [Human] si assenta da [Location]
4:
Forma: [Human] allontanare [Abstract Entity]
Significato: [Human] rimuove, respinge, tiene lontano [Abstract Entity]
5:
Forma: [Eventuality] allontanarsi
Significato: [Eventuality] si distanzia, diventa difficilmente raggiungibile

Domanda: Quale dei sensi sopra elencati meglio descrive l'uso del verbo "allontanare" nella frase data? Spiega il tuo ragionamento passo per passo prima di rispondere e quando sei pronto a rispondere usa il formato "Risposta finale: 1,2,3,4,5 ecc".

Soluzione:</p>
        <p>A.2. The right answer</p>
        <p>Prima di procedere con l'analisi, letteralmente, la frase significa "those islanders, jealous of their little properties, obtained at a great price, intended with extreme pain to keep the danger of having them taken away at a distance." Given the context, we can deduce that the "danger" referred to in the phrase is a potential threat that may cause the "islanders" to lose their "little properties." Therefore, in order to protect these possessions, the islanders are taking deliberate actions to prevent the danger from approaching. Given this context, option 1 is the least likely to fit, as there is no explicit indication that anyone is being expelled or cast out. Option 5 also seems unlikely, as the islanders are acting intentionally to keep the danger at bay, not distancing themselves from it. Options 2, 3, and 4 all have potential, but let's consider each in turn. In option 2, the verb "allontanarsi" does not appear, and the rest of the sentence suggests a collective rather than individual action, which makes it less likely. Option 3 seems plausible at first, but it doesn't fully capture the meaning of the text. While the islanders may be physically separating themselves from the danger, the primary intent of their actions is to keep the danger away from their properties, not just to be far from it. On the other hand, option 4 seems to fit the context best. The islanders are taking deliberate steps to remove or keep the danger at a distance from their possessions, effectively "removing" or "keeping at a distance" the potential threat to their property.</p>
        <p>Risposta finale: 4.</p>
      </sec>
      <sec id="sec-4-4">
        <p>This is one of the example responses the baseline model gives; it is correct and will be used in the training as winning.</p>
        <p>A.3. The wrong answer</p>
        <p>On the other hand, another response given by the model to the same prompt says that the islanders are pushing someone away, which is not explicitly stated in the sentence. But the islanders are removing the danger, which is sense 4, that is, keeping an Abstract Entity at a distance. This answer is obviously wrong:</p>
        <p>Considerando il contesto della frase, "quegli isolani intendevano con estreme pene allontanare il pericolo di vederle sottratte," il senso secondo (3) "Forma: [Human] allontanarsi da [Location]" e (4) "Forma: [Human] allontanare [Abstract Entity]" non sembrano appropriati. Infatti, il primo senso implica un'azione volontaria e attiva di uno soggetto verso un'altra persona, mentre nel contesto di questa frase si intende un'azione collettiva compiuta dai "quegli isolani" per proteggere "le piccole proprietà" dalle minacce esterne.</p>
        <p>Il senso quinto invece non è applicabile in quanto non si tratta di una entità astratta a essere allontanata, bensì di un pericolo da mantenere lontano.</p>
        <p>Perciò, il senso appropriato del verbo "allontanare" in questa frase è il primo (1): "Forma: [Human1] allontanare [Human2] da [Location] | da [Human Group]".</p>
        <p>In questo caso, "quegli isolani" stanno cacciando o espellendo un pericolo (che è non specificato nella frase) lontano dai loro piccoli propriety.</p>
        <p>Risposta finale: 1.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] K. K. Schuler, VerbNet: A broad-coverage, comprehensive verb lexicon, Ph.D. thesis, University of Pennsylvania, 2005.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] C. F. Baker, C. J. Fillmore, J. B. Lowe, The Berkeley FrameNet project, in: Proceedings of the 17th International Conference on Computational Linguistics, Volume 1, 1998, pp. 86-90.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] M. Palmer, D. Gildea, P. Kingsbury, The Proposition Bank: An annotated corpus of semantic roles, Computational Linguistics 31 (2005).</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[12] R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, et al., Direct preference optimization: Your language model is secretly a reward model, in: Thirty-seventh Conference on Neural Information Processing Systems, 2023.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics, Republic of Korea, 2022, pp. 2188-2198. URL: https://aclanthology.org/2022.coling-1.191.
          <source>in: Processing Systems</source>
          ,
          <year>2023</year>
          . URL: https://openreview.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Computational</given-names>
            <surname>Linguistics</surname>
          </string-name>
          , volume
          <volume>31</volume>
          , MIT Press, net/forum?id=
          <fpage>HPuSIXJaa9</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <year>2005</year>
          , pp.
          <fpage>71</fpage>
          -
          <lpage>106</lpage>
          . [13]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          , [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          , Putting pieces together: com- Y. Li,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , Lora: Low-
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>semantic parsing</article-title>
          ,
          <source>in: International Conference on ArXiv abs/2106</source>
          .09685 (
          <year>2021</year>
          ). URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Intelligent</given-names>
            <surname>Text Processing</surname>
          </string-name>
          and Computational Lin- //api.semanticscholar.org/CorpusID:235458009.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          guistics, Springer, Mexico City,
          <year>2005</year>
          , pp.
          <fpage>100</fpage>
          -
          <lpage>111</lpage>
          . [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Abbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          , S. Black, [5]
          <string-name>
            <surname>A.-M. Giuglea</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Moschitti</surname>
          </string-name>
          ,
          <article-title>Semantic role labeling A</article-title>
          . DiPofi,
          <string-name>
            <given-names>C.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Golding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Le Noac</surname>
          </string-name>
          <article-title>'h,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>ings of the 21st International Conference on Com- J</source>
          . Phang,
          <string-name>
            <given-names>L.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schoelkopf</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Skowron,
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <article-title>putational Linguistics and the 44th annual meeting L</article-title>
          .
          <string-name>
            <surname>Sutawika</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Thite</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Association for Computational</surname>
            <given-names>Linguistics</given-names>
          </string-name>
          , Sydney, evaluation,
          <year>2023</year>
          . URL: https://zenodo.org/records/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>NSW</surname>
          </string-name>
          ,
          <year>2006</year>
          , pp.
          <fpage>929</fpage>
          -
          <lpage>936</lpage>
          . 10256836. doi:
          <volume>10</volume>
          .5281/zenodo.10256836. [6]
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Brown</surname>
          </string-name>
          , J. Bonn,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kazeminejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaenen</surname>
          </string-name>
          , [15]
          <string-name>
            <surname>G. T.</surname>
          </string-name>
          et al.,
          <source>Gemma: Open models based on gemini</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Pustejovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmer</surname>
          </string-name>
          , Semantic representations research and technology,
          <year>2024</year>
          . URL: https://arxiv.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <article-title>for nlp using verbnet and the generative lexicon</article-title>
          , org/abs/2403.08295. arXiv:
          <volume>2403</volume>
          .
          <fpage>08295</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>Frontiers in Artificial Intelligence</source>
          <volume>5</volume>
          (
          <year>2022</year>
          )
          <fpage>821697</fpage>
          . [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <article-title>Enhancing legal compliance and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>doi:10</source>
          .3389/frai.
          <year>2022</year>
          .
          <volume>821697</volume>
          .
          <article-title>regulation analysis with large language mod</article-title>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jezek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feltracco</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Bianchini, els,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2404.17522.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>Popescu</surname>
          </string-name>
          , T-pas:
          <article-title>A resource of typed predicate</article-title>
          arXiv:
          <volume>2404</volume>
          .
          <fpage>17522</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>argument structures for linguistic analysis</article-title>
          and se- [17]
          <string-name>
            <given-names>U.</given-names>
            <surname>Mumtaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mumtaz</surname>
          </string-name>
          , Llms-healthcare
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>and Evaluation (LREC'14)</source>
          , European Language Re- ties,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2311.12882.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>sources Association (ELRA)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>890</fpage>
          -
          <lpage>895</lpage>
          . arXiv:
          <volume>2311</volume>
          .
          <fpage>12882</fpage>
          . [8]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          , C. Bam- [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Haristiani</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence (ai) chatbot as</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Lengyel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Saulnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Lavaud</surname>
          </string-name>
          , M.- of
          <source>Physics: Conference Series</source>
          <volume>1387</volume>
          (
          <year>2019</year>
          )
          <fpage>012020</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          , URL: https://dx.doi.org/10.1088/
          <fpage>1742</fpage>
          -6596/1387/1/
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Sayed</surname>
          </string-name>
          , Mistral 7b,
          <year>2023</year>
          . URL: https: 012020. doi:
          <volume>10</volume>
          .1088/
          <fpage>1742</fpage>
          - 6596/1387/1/012020.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          //arxiv.org/abs/2310.06825. arXiv:
          <volume>2310</volume>
          .
          <fpage>06825</fpage>
          . [9]
          <string-name>
            <given-names>R. Y.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          , S. Sukhbaatar,
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          mization,
          <source>ArXiv abs/2404</source>
          .19733 (
          <year>2024</year>
          ).
          <article-title>URL: https: A. A complete example</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          //api.semanticscholar.org/CorpusID:269457506. [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hanks</surname>
          </string-name>
          ,
          <article-title>Lexical analysis: Norms and exploitations, A.1</article-title>
          . Prompt example
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          Mit Press,
          <year>2013</year>
          .
          <article-title>This is an example of a prompt. The predicate is</article-title>
          ”allon[11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Z.</surname>
          </string-name>
          <article-title>Wang, tanare” that in English can be translated based on the</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>