<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>with LLMs and T-PAS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Simonetti</string-name>
          <email>simonetti.luca@spes.uniud.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elisabetta Ježek</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Vetere</string-name>
          <email>g.vetere@unimarconi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <kwd-group>
          <kwd>NLP</kwd>
          <kwd>T-PAS</kwd>
          <kwd>Verb Subcategorization</kwd>
          <kwd>Mistral</kwd>
          <kwd>CLiC-it</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli studi Guglielmo Marconi</institution>
          ,
          <addr-line>Via Plinio, 44, 00193 Roma</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli studi di Udine</institution>
          ,
          <addr-line>Via Palladio, 8, 33100 Udine</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università di Pavia</institution>
          ,
          <addr-line>Corso Strada Nuova, 65, 27100 Pavia</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study explores the application of Large Language Models (LLMs) to verb subcategorization in Italian, focusing on the identification and classification of syntactic patterns in sentences. While LLMs have made lexical analysis more implicit, explicit argument structure identification remains crucial in domain-specific contexts. The research leverages T-PAS, a rich lexical resource for Italian verbs, to fine-tune the open multilingual model Mistral 7B using the Iterative Reasoning Preference Optimization (IRPO) technique. This approach aims to enhance the recognition and extraction of verbal patterns from Italian sentences, addressing challenges in resource quality, coverage, and frame extraction methods. By combining curated lexical-semantic resources with neural language models, this work contributes to improving verb subcategorization tasks, particularly for the Italian language, and demonstrates the potential of LLMs in refining linguistic analysis tools.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Verb subcategorization is the task of identifying and classifying the syntactic patterns (or frames) taken by verbs in sentences. These patterns encode the possible combinations of arguments (such as subjects and objects), specifying the number and type of arguments as well as their syntactic and semantic roles. Verb subcategorization is often used in Natural Language Understanding (NLU) to provide the main interpretation backbone. Although recent developments brought about by Large Language Models (LLMs) make lexical analysis somewhat implicit, there are cases in which the identification of the argument structure of the verb is required, especially those where extensive domain-specific knowledge is required.</p>
      <p>Semantic lexical resources such as VerbNet [<xref ref-type="bibr" rid="ref4">1</xref>], FrameNet [2] and PropBank [3] have been largely employed for several NLP tasks in the past decades, including accomplishing verbal framing for the English language. VerbNet, for example, has been used to improve semantic role labeling, verb sense disambiguation and ontology mapping ([4], [5]); its new enhanced semantic representations have also recently been used for entity state tracking [6]. The main problems addressed in these experiences concern the quality and coverage of such resources and the methods used to extract frames from sentences. In this work, we leverage T-PAS [7] to fine-tune the open multilingual model Mistral 7B [8] for the recognition and extraction of verbal patterns from Italian sentences using a technique called IRPO [9].</p>
      <p>The paper is organized as follows: in Section 2 we introduce the T-PAS resource for Italian verbs, which we used in our experiments. Section 3 discusses in detail the methodology we applied and references closely related works, whereas Section 4 illustrates the experimental setup. We complete the paper by discussing our results in Section 5 and by drawing some conclusions as well as making suggestions for future research in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The T-PAS resource</title>
      <p>T-PAS [7] is an inventory of argument structures and senses for Italian verbs.<sup>1</sup> In T-PAS, for each verb meaning, a specific Typed Predicate-Argument Structure (T-PAS, informally called pattern) is provided, in which arguments are defined in terms of semantic classes notated between square brackets, called semantic types; an example is the pattern of the verb guidare 'drive'. Patterns are acquired from corpora following the Corpus Pattern Analysis (CPA) methodology [<xref ref-type="bibr" rid="ref18">10</xref>]. Currently, T-PAS contains 1160 analyzed verbs, 5529 patterns and ca. 200,000 annotated corpus instances. Semantic types (Human, Event, Location, Food, Vehicle, etc.) are obtained from manual clustering of the lexical items found in the argument positions in the corpus. These types look very much like ontological categories; however, instead of being stipulated, they are induced from corpus data and reflect how humans talk about events and states of entities through language. The system of semantic types in T-PAS currently contains 180 semantic types. The list is organized in a hierarchy to identify the appropriate level of specificity of the selectional properties of individual verbs.</p>
      <p><sup>1</sup>The T-PAS project was developed at the Department of Humanities of the University of Pavia, with the technical support of Lexical Computing Ltd. The resource can be freely accessed and downloaded.</p>
      <p>CEUR Workshop Proceedings (ceur-ws.org). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Background and Methodology</title>
      <sec id="sec-3-0">
        <title>4. Experimental Setup</title>
        <p>The experimental setup consists of two main stages: dataset creation and fine-tuning of the base model Mistral 7B [8], as per the paper Iterative Reasoning Preference Optimization (IRPO) [9]. Our implementation involves a single iteration, comprising both dataset generation and the actual fine-tuning. Additionally, we conduct a basic fine-tuning process where we train Mistral to directly complete prompts with the correct answer in a specified format:</p>
        <p>La risposta corretta è 2...</p>
      </sec>
      <sec id="sec-3-1">
        <p>We refer to this as the SFT (Supervised Fine-Tuning) model later in the discussion. This approach allows us to compare the effectiveness of IRPO against a more straightforward fine-tuning method. We now provide more details about the two stages of our experimental setup.</p>
        <p>The extraction of verbal frames consists of applying frame-like structures to sentences. Once a suitable frame is identified, each element of the structure is mapped to an element of the sentence. To start our experiment, we attempted to extract the frame directly from the neural model (LLM), relying on the fact that LLMs are pretrained on large amounts of text and that their language modeling capabilities have reached unprecedented levels of maturity in the last three years. Although promising, this approach proved insufficient, since the model struggled with the correct subcategorization of the verb before extracting the appropriate frame. In a way, it appeared that the selection was compromised by the nondeterministic nature of LLM inference. Consequently, we split the task into two separate phases: 1) frame identification, i.e. T-PAS subcategorization, and 2) frame extraction, i.e. frame-sentence mapping.</p>
        <p>We found that the baseline model performed poorly
on the subcategorization task, achieving only 59.8%
accuracy. For this reason, we decided to fine-tune the baseline
model on the task of identifying verbal frames, which
proved to be key for the subsequent task of extracting
these frames. This approach was inspired by [11], where
the authors set up a framework for verb sense
disambiguation by providing the model with the frame that
describes the sense the verb can take. This allows us to
treat this task as a linguistic and semantic task rather
than a simple categorization task. The idea is to provide
the model with a prompt that includes the frames, based
on the hypothesis that supplying the model with as much
information as possible might be beneficial. This paper
will only cover the subcategorization task. To do this, we
created a fine-tuning dataset based on the T-PAS resource,
containing both the necessary information and a large
number of examples to build upon.</p>
        <p>The first phase involves gathering sentences and
structuring prompts. The prompts consist of questions to the
model, where we ask which of the listed senses is the
correct one for the sentence we provide. We use a subset
of the T-PAS dataset, comprising approximately 5,324
examples out of the total 26,652 elements (around 19.9%
of the full dataset). The sentences are randomly picked
from this subset, using at most two examples for each
verb to avoid any bias towards one specific pattern or
predicate. This approach ensures a diverse
representation while maintaining a manageable dataset size for our
experiments. The possible senses the verb can acquire
are constructed from the T-PAS dataset. We maintain the
original order of the senses as listed in T-PAS to facilitate
both the dataset generation and the evaluation processes.</p>
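The per-verb sampling described above (at most two randomly picked examples per verb, to avoid bias towards one specific pattern or predicate) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the representation of an example as a dict with a "verb" key is our own assumption:

```python
import random
from collections import defaultdict

def sample_subset(examples, per_verb=2, seed=0):
    """Randomly keep at most `per_verb` examples for each verb, so that
    no single pattern or predicate dominates the resulting dataset."""
    rng = random.Random(seed)
    by_verb = defaultdict(list)
    for ex in examples:
        by_verb[ex["verb"]].append(ex)
    subset = []
    for verb, items in by_verb.items():
        rng.shuffle(items)          # random pick within each verb
        subset.extend(items[:per_verb])
    return subset
```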
        <p>Our preliminary tests indicated that this decision doesn't significantly affect performance. We provide an example of a prompt in the Appendix to illustrate the structure and content of our queries to the model. After building the prompts, we query the Mistral 7B model API 30 times with temperature set to the highest value, to let the model explore its internal latent space as much as possible when providing a response.</p>
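A prompt of the kind described here (and shown in full in the Appendix) can be assembled as in the following sketch. The helper name and the sense representation are our own illustration; the wording mirrors the Appendix example:

```python
def build_prompt(sentence, verb, senses):
    """Assemble a subcategorization prompt in the style of the Appendix:
    the sentence, the task instruction, and the numbered T-PAS senses
    (Forma/Significato pairs) in their original resource order."""
    lines = [f'Frase da analizzare: "{sentence}"', '']
    lines.append('Task: Ragionando sempre in italiano, determina il '
                 f'significato del verbo "{verb}" in base al contesto '
                 'della frase. Considera i seguenti sensi:')
    lines.append('')
    for i, (forma, significato) in enumerate(senses, start=1):
        lines.append(f'{i}:')
        lines.append(f'Forma: {forma}')
        lines.append(f'Significato: {significato}')
    return '\n'.join(lines)
```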
      </sec>
      <sec id="sec-3-2">
        <title>4.1. Dataset Creation</title>
        <p>The first stage, dataset creation, involves the following steps:</p>
        <p>1. We collect 30 responses from the base Mistral model with a high temperature for each sentence.
2. Using these responses, we build a dataset containing (x, c<sub>w</sub>, y<sub>w</sub>, c<sub>l</sub>, y<sub>l</sub>) tuples, where:
• x is the prompt used in step 1 to generate the responses;
• (c<sub>w</sub>, y<sub>w</sub>) is the winning response (i.e., the one that leads to a correct answer);
• (c<sub>l</sub>, y<sub>l</sub>) is the losing response (i.e., the wrong one).</p>
        <p>This second phase results in 30 responses per prompt. We then compile a dataset of both correct and incorrect responses. The prompt instructs the model to answer in a specific format. Even if a response is semantically correct but doesn't adhere to the required format, we classify it as incorrect and include it with the wrong responses. This approach aligns with the methodology of the base paper and serves multiple purposes in our tuning process. By enforcing a specific format, we're not only training the model to provide correct answers but also to follow instructions precisely, which provides us with a standardized format that ensures consistency across responses, crucial for large-scale evaluation and comparison. This phase results in an unbalanced dataset of wrong and right responses for each prompt, summing up to 30.</p>
        <sec id="sec-3-2-1">
          <title>4.2. Fine-Tune</title>
          <p>We use a single L4 GPU with 24GB of VRAM available. As such, we can only have a batch size of 1 and use gradient accumulation of 2 to simulate a batch size of 2. We set max steps to 4,500 but actually stop the fine-tuning after 3,000 steps, since there seems to be a plateau in the reward accuracy (see Figure 1).</p>
        </sec>
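The format-based labeling described in Section 4.1, where a response counts as winning only if it both follows the required answer format and selects the correct sense, can be sketched as follows. The helper is illustrative, not the authors' code; the "Risposta finale: N" format it checks is the one requested in the prompts shown in the Appendix:

```python
import re

def is_valid_response(text, correct_answer):
    """Return True only if the response uses the required
    'Risposta finale: N' format AND picks the correct sense number."""
    m = re.search(r"Risposta finale:\s*(\d+)", text)
    if m is None:
        # wrong format counts as incorrect, even if the reasoning is right
        return False
    return int(m.group(1)) == correct_answer
```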
        <p>The second step of this pipeline involves transforming the intermediate dataset into the final dataset. Following the approach of the IRPO authors, we combine chosen and rejected responses to create a balanced dataset, ensuring that each response is processed at least once during fine-tuning. For example, if the number of elements in one of the two stacks – chosen and rejected – is less than the other, we reuse elements from the stack with fewer items multiple times to achieve balance.</p>
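The balancing step just described can be sketched as follows; this is an illustrative reconstruction under the assumption that each row carries prompt, chosen and rejected fields, matching the dataset columns reported below:

```python
import itertools

def build_preference_pairs(prompt, chosen, rejected):
    """Pair every chosen response with a rejected one, cycling the
    shorter stack so each response is used at least once."""
    if not chosen or not rejected:
        return []  # a prompt needs at least one of each to form pairs
    n = max(len(chosen), len(rejected))
    c = list(itertools.islice(itertools.cycle(chosen), n))
    r = list(itertools.islice(itertools.cycle(rejected), n))
    return [{"prompt": prompt, "chosen": cw, "rejected": rl}
            for cw, rl in zip(c, r)]
```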
        <p>The result is a dataset consisting of 17,863 rows with columns prompt, chosen, rejected, which we make available on Huggingface.<sup>2</sup></p>
        <sec id="sec-3-2-2">
          <title>4.3. Differences from the IRPO paper</title>
          <p>The main differences from the original IRPO setup are as follows:</p>
          <p>1. As starting model M<sub>0</sub> we use Mistral 7B: in other words, a different model with 10x fewer parameters, whereas the authors of the original paper use Llama-2 70B, a model with a different architecture and possibly a different dataset used in the pretraining.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <p>2. We apply IRPO to a linguistic task instead of a logic or math reasoning task.</p>
      </sec>
      <sec id="sec-3-4">
        <p>3. We use a subset of verbs for training and observe generalization on different verbs during testing. This approach differs from the IRPO authors, who utilize standardized datasets such as GSM8K, MATH, and ARC-Challenge. While these datasets allow for direct comparison across different models and techniques, they don't provide the opportunity to assess generalization to unseen problem types in the same way our verb-based approach does.</p>
      </sec>
      <sec id="sec-3-5">
        <p>The loss function we employ in the second stage, the fine-tuning, consists of two components: one handling the Direct Preference Optimization (DPO) rewards [12], and another that positively affects the Negative Log Likelihood (NLL) of the correct answer. This approach has similar effects to those described by the authors of IRPO [9].</p>
        <p>Using the dataset built as described in the previous section, we proceed with the fine-tuning. We also build the custom loss function as described (but not implemented) by the authors of IRPO. Our implementation of the replicated loss function will be made publicly available. The LoRA [13] configuration is as follows:</p>
        <p>rank=16,
lora_alpha=16,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']</p>
        <sec id="sec-3-5-1">
          <title>5. Results and Discussion</title>
          <p>Our final results are summarized in Table 1. We observe that the multilingual baseline model, although including Italian, is not sufficiently accurate in performing the selection task. Furthermore, when the model's temperature is increased, it does not remain consistent with a specific answer but rather explores multiple response options, selecting a different choice randomly each time. This could be explained in multiple ways: one is that the model knows it needs to select an answer but does not arrive at the correct one through a thorough analysis, rather following a pick-one strategy with the explanation coming as a consequence. The other possible explanation is that the model just tries to give an answer, not actually connecting pieces of the given possibilities to the sentence but rather picking random parts of the sentence where they are more likely to reside for that particular part of the frame – e.g. the subject usually heads the sentence. Unfortunately, these are only speculations.</p>
          <p>The unchanged performance on common-sense tasks indicates that our fine-tuning enhanced the model's capabilities on our specific task without compromising its general language understanding. This result aligns with the versatility of large language models, capable of maintaining proficiency across multiple NLP tasks simultaneously, and suggests potential for developing specialized AI systems without sacrificing broader capabilities.</p>
        </sec>
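The two-component loss described in this section, a DPO preference term plus a term that pushes up the likelihood of the correct answer, can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the authors' implementation: the argument names and the alpha weighting are our own, and details such as length normalization may differ:

```python
import torch
import torch.nn.functional as F

def irpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
              beta=0.1, alpha=1.0):
    """Sketch of an IRPO-style objective: a DPO preference term plus an
    NLL term on the chosen (correct) answer. Each argument is the summed
    log-probability of a sequence under the policy or reference model."""
    # DPO term: -log sigmoid(beta * implicit reward margin)
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    dpo = -F.logsigmoid(beta * margin)
    # NLL term: raise the probability of the correct answer directly
    nll = -logp_chosen
    return (dpo + alpha * nll).mean()
```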
      </sec>
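For reference, the LoRA configuration listed above maps onto the Hugging Face peft API as in the following sketch; this is an illustration of the quoted settings, not the authors' released training code:

```python
# Sketch: the LoRA settings quoted in the text, expressed with peft.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                      # "rank=16" in the text
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj',
                    'up_proj', 'q_proj', 'o_proj', 'down_proj'],
)
```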
      <sec id="sec-3-6">
        <p>Another significant result derives from the fact that the subset of verbs used for fine-tuning differs from the verb subset used for testing. This means that we not only avoid using the same sentences from the training phase but also employ verbs that were not present during training, and yet we obtain performance improvements. This demonstrates some degree of generalization. While a thorough study would be needed to draw more accurate conclusions, we can still offer some speculations:</p>
        <p>1. The model has generalized to understand which thought patterns are rewarded during the DPO (Direct Preference Optimization) phase. It now attempts to reproduce these patterns regardless of the specific sentence or verb used. This can be seen clearly from Figure 1, where the accuracy in rewarding the right choices improves over training steps, a key indicator that the model learns to discern between right and wrong answers.</p>
        <p>2. Another speculation is that the logits have changed, and the behaviour is too random, or at least too noisy, to be taken into consideration alone. This means that the model's better performance could just be associated with random guessing, as the baseline models do, but with new logits. Future work might clarify and explain better what happens.</p>
      </sec>
      <sec id="sec-3-7">
        <p>But we also find that after using the IRPO technique the model modifies its behaviour, improving its accuracy. In other words, the model seems to acquire some competence in this task by being fine-tuned with a double signal consisting of the DPO plus the NLL losses. The first signal teaches the model to distinguish between the right and the wrong answer. The second signal pushes the correct answer further up in probability space. It is remarkable that the collection of the dataset for the second iteration proved to be quite a hard task, since the model was performing well enough to give only a reduced amount of wrong answers, both in an absolute sense – i.e. for a given sentence the model returns 30 correct answers – and in a relative sense – i.e. the number of wrong answers is small: 2–3.</p>
        <p>We assessed the model's performance on basic Common Sense [14] tasks to probe the effects of our fine-tuning. Interestingly, we found no change in performance across these tasks. This outcome is particularly noteworthy when we analyze how the different outcomes might have been speculated to have happened. A deterioration in performance could have suggested catastrophic forgetting, a common issue in neural networks where new learning irremediably replaces the previous knowledge. However, our use of Low-Rank Adaptation (LoRA) likely mitigated this effect by updating only task-specific parameters.</p>
        <p>We now briefly illustrate the impact of fine-tuning on the Mistral 7B model's performance in subcategorization tasks, presenting a case study focusing on the Italian verb "scherzare" (to joke). This example, while not necessarily representative of all improvements, provides at least some insight into the changes in the model's reasoning process. The sentence under consideration is: "Non si può scherzare con la realtà virtuale." (One cannot joke with virtual reality.) The model was tasked with selecting the correct sense from a predefined set of options:</p>
        <p>1. [Human] non fa o dice qualcosa sul serio (doesn't do or say something seriously)
2. [Human] fa ironia su|sopra|con [Abstract Entity] (makes irony about [Abstract Entity])
3. [Human] ha un comportamento leggero e divertito (has a light and amused behavior)
4. [Human] fa sul serio, non sottovaluta, prende sul serio ([Anything]) (takes seriously, doesn't underestimate [Anything])
5. [Human] si espone imprudentemente a un rischio (imprudently exposes oneself to a risk)</p>
      </sec>
      <sec id="sec-3-8">
        <p>6. [Human] prende alla leggera qualcosa (takes something lightly)</p>
      </sec>
      <sec id="sec-3-9">
        <p>Before fine-tuning, the model incorrectly classified the usage as sense 1, providing the following reasoning: "Nella frase data, 'scherzare' non è seguito da un oggetto concreto o astratto, quindi non si applicano i sensi 1, 2, 4, 5 e 6. Il contesto della frase indica che 'scherzare' non è inteso come un comportamento leggero e divertito (senso 3), ma piuttosto come non fare o dire qualcosa in maniera irrispettosa o non seria (senso 1)." This reasoning demonstrates a lack of sensitivity to the syntactic structure and context of the sentence. Also, during the reasoning process, option 1 was inconsistently excluded but was later chosen as the final answer.</p>
        <p>After fine-tuning, the model correctly identified sense 4 as the appropriate choice: "Nella frase data, 'scherzare' non è usato attivo, ma negativo, quindi non si può applicare direttamente i sensi 1, 3, 5 e 6, che si riferiscono a azioni o comportamenti specifici. Invece, il verbo 'si può' implica un senso di non poter fare qualcosa, quindi il senso appropriato è quello di sottovalutare o prendere in giro qualcosa, che corrisponde al senso 4." This reasoning shows multiple enhancements:
1. Recognition of the negative construction "Non si può scherzare"
2. Consideration of the phrase "con la realtà virtuale" as crucial context
3. More nuanced interpretation, considering multiple senses before making a decision</p>
        <p>In conclusion, we can say that small multilingual baseline models such as Mistral 7B perform poorly on semantic analysis of Italian sentences. We observe that the poor behaviour is due to the model's inability to discern the correct answer, either because it lacks the linguistic knowledge, therefore mostly resorting to random guesses, or because it follows an incorrect explanation for the answer it is about to give. However, our research also demonstrates that the model can be significantly improved using IRPO techniques without affecting the baseline performance on common sense and reasoning tasks. Notably, we observe the ability to generalize across predicates, likely due to underlying linguistic skills, though further investigation is needed to fully understand this phenomenon.</p>
        <p>The production of small open language models is rapidly evolving, approaching the level of the huge closed models which were available on the cloud a couple of years ago. At present, Italian monolingual models have room for improvement in terms of performance levels,<sup>3</sup> while multilingual models, e.g. the recently released Gemma 2 [15], show increasing proficiency in our language, probably due to transfer learning effects. Our research shows the potential of leveraging such models in combination with high-quality lexical resources to develop a new class of task-specific models for the Italian language. These models, while small in scale, are expected to exhibit remarkable proficiency in executing complex analytical tasks, such as those related to verbs.</p>
      </sec>
      <sec id="sec-3-10">
        <p>While this single example cannot be generalized to the model's overall performance, it suggests that fine-tuning may have enhanced the model's ability to parse complex syntactic structures and integrate contextual information in subcategorization tasks. Further comprehensive analysis across a wide range of verbs and constructions would be necessary to draw broader conclusions about the model's improved capabilities, as well as to identify new means to further enhance accuracy and performance.</p>
        <p>With this in mind, our future work is aimed, on the one hand, at enriching lexicographic resources and refining the ways to obtain training material from them, and on the other hand at continuously evaluating the improvements brought about by the progress of general-purpose open models. One promising application is the use of a verbal subcategorization and frame extraction system to extract content from specialist documents, such as legal [16] or medical texts [17]. Furthermore, the ability to analyze the complex argument structure of verbs has potential for use in language learning systems [18], e.g. providing support for immigrants to learn Italian affordably. Finally, we made our fine-tuned model publicly available on Huggingface<sup>4</sup> along with a visual report on wandb.<sup>5</sup></p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Table 1. Comparison between various fine-tune methods</title>
      <p>Model: Test Accuracy (%)
Iterative RPO (Iteration 1): 75.6
SFT (PST CoT): 65.6
Mistral baseline (Zero-shot CoT): 59.8</p>
      <sec id="sec-4-1">
        <p><sup>3</sup>See for instance Hugging Face's INVALSI Leaderboard, https://huggingface.co/spaces/CrispUnimib/INVALSIbenchmark</p>
      </sec>
      <sec id="sec-4-2">
        <p><sup>4</sup>https://huggingface.co/theGhoul21/srl-base-irpo-080524-16bit-v0.3</p>
      </sec>
      <sec id="sec-4-3">
        <p><sup>5</sup>https://shorturl.at/4jmPq</p>
        <p>among the 5 (in this case) possible choices. This is the sentence translated in English: "Those islanders, jealous of the small properties they had acquired at a very hard cost, intended with extreme penalties to remove the danger of seeing them taken away from them."</p>
        <p>Frase da analizzare: "Quegli isolani, gelosi delle piccole proprietà, a prezzo sudatissimo procurate, intendevano con estreme pene allontanare il pericolo di vedersele sottratte."

Task: Ragionando sempre in italiano, determina il significato del verbo "allontanare" in base al contesto della frase. Considera i seguenti sensi:

1:
Forma: [Human1] allontanare [Human2] da [Location] | da [Human Group]
Significato: [Human1] espelle, caccia [Human2] da [Location] | da [Human Group]
2:
Forma: [Human] allontanarsi (da [Physical Entity] | da [Human Group])
Significato: [Human] fa in modo di essere lontano (da [Physical Entity] | da [Human Group])
3:
Forma: [Human] allontanarsi da [Location]
Significato: [Human] si assenta da [Location]
4:
Forma: [Human] allontanare [Abstract Entity]
Significato: [Human] rimuove, respinge, tiene lontano [Abstract Entity]
5:
Forma: [Eventuality] allontanarsi
Significato: [Eventuality] si distanzia, diventa difficilmente raggiungibile

Domanda: Quale dei sensi sopra elencati meglio descrive l'uso del verbo "allontanare" nella frase data? Spiega il tuo ragionamento passo per passo prima di rispondere e quando sei pronto a rispondere usa il formato "Risposta finale: 1,2,3,4,5 ecc".

Soluzione:</p>
        <p>A.2. The right answer</p>
        <p>Prima di procedere con l'analisi, letteralmente, la frase significa "those islanders, jealous of their little properties, obtained at a great price, intended with extreme pain to keep the danger of having them taken away at a distance." Given the context, we can deduce that the "danger" referred to in the phrase is a potential threat that may cause the "islanders" to lose their "little properties." Therefore, in order to protect these possessions, the islanders are taking deliberate actions to prevent the danger from approaching. Given this context, option 1 is the least likely to fit, as there is no explicit indication that anyone is being expelled or cast out. Option 5 also seems unlikely, as the islanders are acting intentionally to keep the danger at bay, not distancing themselves from it. Options 2, 3, and 4 all have potential, but let's consider each in turn. In option 2, the verb "allontanarsi" does not appear, and the rest of the sentence suggests a collective rather than individual action, which makes it less likely. Option 3 seems plausible at first, but it doesn't fully capture the meaning of the text. While the islanders may be physically separating themselves from the danger, the primary intent of their actions is to keep the danger away from their properties, not just to be far from it. On the other hand, option 4 seems to fit the context best. The islanders are taking deliberate steps to remove or keep the danger at a distance from their possessions, effectively "removing" or "keeping at a distance" the potential threat to their property.</p>
        <p>Risposta finale: 4.</p>
      </sec>
      <sec id="sec-4-4">
        <p>This is one of the example responses the baseline model gives; it is correct and will be used in the training as winning.</p>
        <p>A.3. The wrong answer</p>
        <p>On the other hand, another response given by the model to the same prompt says that the islanders are pushing someone away, which is not explicitly stated in the sentence. But the islanders are removing the danger, which is sense 4, that is, keeping an Abstract Entity at a distance. This answer is obviously wrong:</p>
        <p>Considerando il contesto della frase, "quegli isolani intendevano con estreme pene allontanare il pericolo di vederle sottratte," il senso secondo (3) "Forma: [Human] allontanarsi da [Location]" e (4) "Forma: [Human] allontanare [Abstract Entity]" non sembrano appropriati. Infatti, il primo senso implica un'azione volontaria e attiva di uno soggetto verso un'altra persona, mentre nel contesto di questa frase si intende un'azione collettiva compiuta dai "quegli isolani" per proteggere "le piccole proprietà" dalle minacce esterne.</p>
        <p>Il senso quinto invece non è applicabile in quanto non si tratta di una entità astratta a essere allontanata, bensì di un pericolo da mantenere lontano.</p>
        <p>Perciò, il senso appropriato del verbo "allontanare" in questa frase è il primo (1): "Forma: [Human1] allontanare [Human2] da [Location] | da [Human Group]".</p>
        <p>In questo caso, "quegli isolani" stanno cacciando o espellendo un pericolo (che è non specificato nella frase) lontano dai loro piccoli propriety.</p>
        <p>Risposta finale: 1.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] K. K. Schuler, VerbNet: A broad-coverage, comprehensive verb lexicon, Ph.D. thesis, University of Pennsylvania, 2005.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] C. F. Baker, C. J. Fillmore, J. B. Lowe, The Berkeley FrameNet project, in: Proceedings of the 17th International Conference on Computational Linguistics, Volume 1, 1998, pp. 86-90.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] M. Palmer, D. Gildea, P. Kingsbury, The Proposition Bank: An annotated corpus of semantic roles, Computational Linguistics 31 (2005).</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[12] R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, et al., Direct preference optimization: Your language model is secretly a reward model, in: Thirty-seventh Conference on Neural Information Processing Systems, 2023.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics, Republic of Korea, 2022, pp. 2188-2198. URL: https://aclanthology.org/2022.coling-1.191.
          <source>in: Processing Systems</source>
          ,
          <year>2023</year>
          . URL: https://openreview.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Computational</given-names>
            <surname>Linguistics</surname>
          </string-name>
          , volume
          <volume>31</volume>
          , MIT Press, net/forum?id=
          <fpage>HPuSIXJaa9</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <year>2005</year>
          , pp.
          <fpage>71</fpage>
          -
          <lpage>106</lpage>
          . [13]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          , [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          , Putting pieces together: com- Y. Li,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , Lora: Low-
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>semantic parsing</article-title>
          ,
          <source>in: International Conference on ArXiv abs/2106</source>
          .09685 (
          <year>2021</year>
          ). URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Intelligent</given-names>
            <surname>Text Processing</surname>
          </string-name>
          and Computational Lin- //api.semanticscholar.org/CorpusID:235458009.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          guistics, Springer, Mexico City,
          <year>2005</year>
          , pp.
          <fpage>100</fpage>
          -
          <lpage>111</lpage>
          . [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Abbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          , S. Black, [5]
          <string-name>
            <surname>A.-M. Giuglea</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Moschitti</surname>
          </string-name>
          ,
          <article-title>Semantic role labeling A</article-title>
          . DiPofi,
          <string-name>
            <given-names>C.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Golding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Le Noac</surname>
          </string-name>
          <article-title>'h,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>ings of the 21st International Conference on Com- J</source>
          . Phang,
          <string-name>
            <given-names>L.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schoelkopf</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Skowron,
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <article-title>putational Linguistics and the 44th annual meeting L</article-title>
          .
          <string-name>
            <surname>Sutawika</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Thite</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Association for Computational</surname>
            <given-names>Linguistics</given-names>
          </string-name>
          , Sydney, evaluation,
          <year>2023</year>
          . URL: https://zenodo.org/records/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>NSW</surname>
          </string-name>
          ,
          <year>2006</year>
          , pp.
          <fpage>929</fpage>
          -
          <lpage>936</lpage>
          . 10256836. doi:
          <volume>10</volume>
          .5281/zenodo.10256836. [6]
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Brown</surname>
          </string-name>
          , J. Bonn,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kazeminejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaenen</surname>
          </string-name>
          , [15]
          <string-name>
            <surname>G. T.</surname>
          </string-name>
          et al.,
          <source>Gemma: Open models based on gemini</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Pustejovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmer</surname>
          </string-name>
          , Semantic representations research and technology,
          <year>2024</year>
          . URL: https://arxiv.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <article-title>for nlp using verbnet and the generative lexicon</article-title>
          , org/abs/2403.08295. arXiv:
          <volume>2403</volume>
          .
          <fpage>08295</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>Frontiers in Artificial Intelligence</source>
          <volume>5</volume>
          (
          <year>2022</year>
          )
          <fpage>821697</fpage>
          . [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <article-title>Enhancing legal compliance and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>doi:10</source>
          .3389/frai.
          <year>2022</year>
          .
          <volume>821697</volume>
          .
          <article-title>regulation analysis with large language mod</article-title>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jezek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feltracco</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Bianchini, els,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2404.17522.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>Popescu</surname>
          </string-name>
          , T-pas:
          <article-title>A resource of typed predicate</article-title>
          arXiv:
          <volume>2404</volume>
          .
          <fpage>17522</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>argument structures for linguistic analysis</article-title>
          and se- [17]
          <string-name>
            <given-names>U.</given-names>
            <surname>Mumtaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mumtaz</surname>
          </string-name>
          , Llms-healthcare
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>and Evaluation (LREC'14)</source>
          , European Language Re- ties,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2311.12882.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>sources Association (ELRA)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>890</fpage>
          -
          <lpage>895</lpage>
          . arXiv:
          <volume>2311</volume>
          .
          <fpage>12882</fpage>
          . [8]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          , C. Bam- [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Haristiani</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence (ai) chatbot as</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Lengyel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Saulnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Lavaud</surname>
          </string-name>
          , M.- of
          <source>Physics: Conference Series</source>
          <volume>1387</volume>
          (
          <year>2019</year>
          )
          <fpage>012020</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          , URL: https://dx.doi.org/10.1088/
          <fpage>1742</fpage>
          -6596/1387/1/
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Sayed</surname>
          </string-name>
          , Mistral 7b,
          <year>2023</year>
          . URL: https: 012020. doi:
          <volume>10</volume>
          .1088/
          <fpage>1742</fpage>
          - 6596/1387/1/012020.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          //arxiv.org/abs/2310.06825. arXiv:
          <volume>2310</volume>
          .
          <fpage>06825</fpage>
          . [9]
          <string-name>
            <given-names>R. Y.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          , S. Sukhbaatar,
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          mization,
          <source>ArXiv abs/2404</source>
          .19733 (
          <year>2024</year>
          ).
          <article-title>URL: https: A. A complete example</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          //api.semanticscholar.org/CorpusID:269457506. [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hanks</surname>
          </string-name>
          ,
          <article-title>Lexical analysis: Norms and exploitations, A.1</article-title>
          . Prompt example
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          Mit Press,
          <year>2013</year>
          .
          <article-title>This is an example of a prompt. The predicate is</article-title>
          ”allon[11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Z.</surname>
          </string-name>
          <article-title>Wang, tanare” that in English can be translated based on the</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>