<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Can LLMs Help Recollect and Elaborate On Our Personal Experiences?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriel Roccabruna</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olha Khomyn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Yin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Riccardi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Signals and Interactive Systems Lab, Department of Information Engineering and Computer Science, University of Trento</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Work done while he was working at the University of Trento</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In the act of narration, speakers engage with others, communicate findings, and share personal facts and knowledge. This act involves recollecting and reasoning about thoughts and events. Individuals need to plan and organize events and associated emotions in a temporal and logical order. These recollection processes are cognitively demanding and emotion-laden. In this work, we investigate whether Large Language Models (LLMs) may help and support the process of personal narration, i.e. in elaborating on the unfolding events, participants, and emotions. For this, we test LLMs' abilities on a novel task called Automatic NarraTive Elicitation (ANTE). We have crowdsourced a corpus of elicitation responses in the Italian language using a pre-existing dataset of personal narratives. We used this dataset to evaluate a set of closed and open-source LLMs with automatic and human-evaluation metrics. The human evaluation results show that GPT-4 achieves performance similar to humans', while smaller open-source LLMs struggle with this task. We investigate whether fine-tuning smaller open-source LLMs improves performance by experimenting with mixing crowd-sourced and synthetic data.</p>
      </abstract>
      <kwd-group>
        <kwd>Personal Narrative</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Elicitation</kwd>
        <kwd>Emotions</kwd>
        <kwd>Conversational Agent</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The act of narration manifests in written or spoken
conversations. It is generally used to communicate facts,
knowledge and personal events. This act involves
recollecting and reasoning about thoughts and events. Indeed,
the narrative has been widely used in journalism [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
education [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and economics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In psychology, the analysis
of personal narratives is a research tool used in many
fields such as rehabilitation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], managing psychosis [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
investigating language dysfunctions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and monitoring
the variation of the emotional state during psychotherapy
[
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. A Personal Narrative (PN) is a series of unfolding
events recounting the social interactions, emotions,
experiences and others lived by the narrator [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In this sense,
a PN is a way to observe the interpretation of the world
from the narrator’s perspective [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ].
      </p>
      <p>
        Currently, the collection of personal narratives is
mainly based on textual stimuli or interviews. In the
textual stimuli approach, the narrators recount or write
down, in complete isolation, an event [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] recalled through a
crafted eliciting prompt based on valence-charged words
(e.g. friendship or death) or questions [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ].
      </p>
      <p>[Figure 1: An example of the elicitation of a personal narrative. An eliciting prompt ("What did you do this weekend?") is followed by the narrator's turns (recounting an encounter with a bear while hiking with the dog Poppy) and by the agent's empathetic eliciting responses, e.g. "Oh, that sounds scary! How did you get around the bear to reach the cottage?" and "That was smart! Did Poppy look scared on the way to the cottage?".]</p>
      <p>
        However, the act of narration may be a cognitively demanding and
emotionally intense process, leading some individuals to
get stuck with the narration or to recount overgeneralized
memories, overlooking important details of the story
[
        <xref ref-type="bibr" rid="ref12 ref15">15, 12</xref>
        ]. While human-human conversation has been
shown to alleviate these issues [
        <xref ref-type="bibr" rid="ref16 ref17 ref18">16, 17, 18</xref>
        ], the potential
role of Large Language Models (LLMs) in supporting this
process remains underexplored. Indeed, the recently suggested
improvements in safety, biases and toxicity
[
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ] and in natural language fluency [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] make these
models suitable candidates for this task.
      </p>
      <p>
        To help narrators recollect and elaborate on personal
events, LLMs must understand the unfolding events,
participants, and emotions encompassed in the Personal
Narrative (PN). In this work, we investigate whether
LLMs have these capabilities by evaluating their
performance on a novel task called Automatic NarraTive
Elicitation (ANTE). In this task, to support the elaboration
of personal events, the model is tasked to generate
empathetic eliciting responses pointing to a specific aspect
of the recount. We crowdsource a corpus of more than
500 eliciting responses in the Italian language, starting
from a pre-existing dataset of PNs. On this, we evaluate
5 open and closed-source LLMs with in-context learning.
The human evaluation has shown that while GPT-4 [22]
achieves on-par performance with the human reference,
all the open-source models lag behind. As closed-source
LLMs may have privacy issues and may not be affordable
over the long run, we explore whether fine-tuning small
open-source LLMs can reduce the gap. For this, we
augment the training set with a partition generated by
GPT-4. We then experiment with different combinations of
partitions (crowd-sourced vs synthetic data) during
fine-tuning. The results show that fine-tuning with synthetic
data improves the performance of all models, closing the
gap with the human reference.
      </p>
      <p>Our contributions can be summarized as follows:</p>
      <sec id="sec-1-1">
        <title>We envision a hybrid methodology for eliciting Personal</title>
        <p>Narratives (PN), which joins the benefits of textual
stimuli and interview approaches. The elicitation, depicted in
Figure 1, starts with an eliciting prompt such as a crafted
textual stimulus. Then, once the narrator finishes the first
part of the recount an agent asks a follow-up response
• Definition of a novel LLM skill for supporting that helps continue the narration by elaborating on some
personal narrations; aspect of the story. These exchanges go on till a certain
• Proposed guidelines and procedure for collect- criterion is met, depending on the application (e.g. based
ing the Automatic Narrative Elicitation (ANTE) on the narrative length), or the narrator explicitly wants
corpus; to stop.</p>
        <p>Formally, a prompt  elicits the main event of
• Automatic and human evaluation of 5 LLMs fol- the PN. This is followed by a sequence  =
lowing in-context learning and fine-tuning strate- [(1, 1), ..., (, )], where  is a narrative turn
gies; at time  and  is the corresponding eliciting response.
• Human evaluation protocol with two task-  consists of feedback and an eliciting question. The
specific metrics for the ANTE task; feedback must show active listening and be aligned with
the expressed narrator’s emotions. Furthermore, the
eliciting response must focus on relevant events mentioned
in  (  1  ) without significantly altering the
lfow of the narration.</p>
        <p>The Automatic NarraTive Elicitation (ANTE) task is
defined as:</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <sec id="sec-2-1">
        <title>Question Generation Question Generation (QG) is a natural language processing task in which a model is tasked to generate a question given a context and a target answer [23]. Automatic NarraTive Elicitation (ANTE) is</title>
        <p>Definition 3.1. Given the sequence [(1, 1), ..., ],
the model generates a  such that  elicits the narrator
to continue with the story by yielding a +1.</p>
      </sec>
      <sec id="sec-2-2">
        <title>This task implicitly requires an emotional and semantic understanding of the narrative. Furthermore, it implicitly requires the ability to select the events that might be valuable to support the continuation of the narration.</title>
      </sec>
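      <p>To make the turn structure concrete, the following is a minimal Python sketch of the elicitation loop in Definition 3.1; the Dialogue class and the model.generate interface are our illustrative assumptions, not part of the ANTE specification.</p>
      <preformat># Illustrative sketch of the ANTE interaction loop (Definition 3.1).
from dataclasses import dataclass, field

@dataclass
class Dialogue:
    prompt: str                                 # eliciting prompt p
    turns: list = field(default_factory=list)   # [(n_1, e_1), ..., (n_t, e_t)]

def elicit(model, dialogue, n_t):
    """Given [(n_1, e_1), ..., n_t], generate e_t so that the narrator
    is elicited to continue the story by yielding n_{t+1}."""
    history = dialogue.turns + [(n_t, None)]
    e_t = model.generate(dialogue.prompt, history)  # feedback + eliciting question
    dialogue.turns.append((n_t, e_t))
    return e_t</preformat>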
    </sec>
    <sec id="sec-3">
      <title>4. Data Collection</title>
      <sec id="sec-3-1">
        <title>We have experimented with 5 closed-sourced and open</title>
        <p>source LLMs, namely GPT-4, Llama3 8B [39], Vicuna
13B [40], LLaMAntino 13B [41], and IT5 [42]. The</p>
      </sec>
      <sec id="sec-3-2">
        <title>Similarly, we have included the description of undesir</title>
        <p>able properties, such as asking for personal opinions, and
hypothetical events, giving suggestions or shifting the
1https://github.com/sislab-unitn/ANTE
focus of the conversation away from the narrated event. 2https://www.prolific.com/
Furthermore, to help the annotator focus the question 3We used gpt-4-turbo</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Evaluation</title>
      <sec id="sec-4-1">
        <title>6.1. Metrics</title>
        <p>
          selection of the models has only considered LLMs
supporting the Italian language i.e. the language of the ANTE
dataset. IT5 is pre-trained on the Italian dataset, while
LLaMAntino 13B based on Llama2 [
          <xref ref-type="bibr" rid="ref25">33</xref>
          ] is fine-tuned on
the Italian language using LoRa [43]. Instead, Llama3
8B and Vicuna 13B are pre-trained on a multi-lingual
dataset.
        </p>
        <sec id="sec-4-1-1">
          <title>We have evaluated the models on the ANTE task both</title>
          <p>
            with automatic and human evaluation metrics. We have
used the automatic metric to have a proxy for
performance estimates during the development of the models,
i.e. before the resource-demanding human evaluation. As
5.1. In-Context Learning an automatic evaluation metric, we have used the BLEU
In-context learning, or few-shot learning, is a technique 1 score [44]. Regarding the human evaluation, we have
in which the model can learn from a few examples pro- adopted a human evaluation protocol developed for
evalvided in the context [
            <xref ref-type="bibr" rid="ref23">31</xref>
            ]. In our case, five pairs (5-shot) uating dialogue models in a reproducible and comparable
of narratives and corresponding eliciting responses are way [45]. From this, we have used the Appropriateness,
given to the model. In particular, we have used the same Contextualization and Correctness metrics5. Each metric
examples written in the guidelines for collecting the is translated into a question to which the annotators can
dataset. answer Yes, No, or I don’t know. Furthermore, the
annotaThe input to the model is formalized as: tors can provide explanations for a negative answer for
some metrics. For contextualization, the annotators can
 ⊕ { 11, 11 ⊕ ... ⊕ 15, 15} ⊕  justify their negative answer with wrong or no references
to the grounding context representing hallucination and
where I are the instructions for the model, ⊕ is the genericness, respectively.
concatenation with the new line (\n),  1, 1 are i-shot While the proposed metrics are enough for evaluating
example of the narrative and the corresponding eliciting generic dialogue models, we need specific criteria for
response at the first turn of the dialogue, N is the input better evaluating the models on our task. Specifically,
narrative that the model should generate the response to. we introduced Efectiveness and Compliance. Efectiveness
The beginning of the narrative and the response are indi- evaluates whether the response is efective in helping the
cated with two marker tokens, namely “Narrative:” and narrator continue with the narration naturally. The two
“Response:” 4. We have also experimented with adding possible explanations for being an inefective response
the annotation guidelines before the instructions for the are that the question is either generic (generic question) or
model, but observed only an increase in inference time complex (complex question), which means the narrators
and not in performance. will have dificulties in answering that question.
Diferent from the genericness in contextualization, a generic
5.2. Fine-tuning response can still be efective when the context is not
enough for asking a more specific question. Compliance
In training, the input sequences consist of a narrative evaluates whether the response is compliant with the
and the corresponding eliciting response, concatenated annotation guidelines, i.e. it has the properties listed in
with the new line (\n). Additionally, we add two marker Section 4.
tokens to the input prompt to indicate the beginning of Additionally, in the HE, we have added ground truth
the narrative and the response, respectively. eliciting responses along with those generated as a point
Formally, the input sequence is: of reference and an additional control step [45].
Moreover, as for the data collection, we have split the
evalua  :  ⊕  :  tions into batches of five narratives. Each batch has been
annotated by five crowd workers hired via Prolific and
where N is the narrative, ⊕ is the concatenation with paid £9 per hour. Furthermore, we used an overlap of 20%
the new line and R is the corresponding eliciting response. to compute the agreement, whose overall score is 0.34
In fine-tuning the open-source LLMs, the input of the measured with Fleiss’ [
            <xref ref-type="bibr" rid="ref29">46</xref>
            ], showing a fair agreement.
autoregressive models is as described above, while for
the sequence-to-sequence IT5 model, the input to the
encoder and decoder is narrative and eliciting response,
respectively. All the hyperparameters used to fine-tune
and test the models are reported in Appendix A.
          </p>
        </sec>
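        <p>As a worked example, the two input formats above can be sketched as follows; the instruction text and the examples are placeholders, not the actual collection guidelines.</p>
        <preformat># Sketch of the 5-shot ICL prompt and of the fine-tuning sequence.
SEP = "\n"  # the concatenation operator is a new line

def icl_prompt(instructions, shots, narrative):
    """I, then n_1^1, e_1^1, ..., n_1^5, e_1^5, then N, with marker tokens."""
    parts = [instructions]
    for n_i, e_i in shots:  # five (narrative, eliciting response) pairs
        parts += ["Narrative: " + n_i, "Response: " + e_i]
    parts += ["Narrative: " + narrative, "Response:"]  # model continues here
    return SEP.join(parts)

def finetuning_sequence(narrative, response):
    """Training input: Narrative: N, new line, Response: R."""
    return "Narrative: " + narrative + SEP + "Response: " + response</preformat>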
        <sec id="sec-4-1-2">
          <title>4An example of a real prompt is reported in Appendix A in Table 5.</title>
        </sec>
        <sec id="sec-4-1-3">
          <title>5Appropriateness whether the response makes sense w.r.t the dia</title>
          <p>logue history; Contextualization whether the response contains
references to the dialogue context; Correct whether the response is
grammatically and syntactically correct.
6.2. Automatic Evaluation thermore, while fine-tuning the models on the merged
and synthetic datasets always degrades the performance
Table 2 reports the BLEU 1 score for each model at- measured on the gold test set, it generally increases the
tained with in-context learning and fine-tuning on crowd- scores on the silver test set. Finally, Llama3 8B fine-tuned
sourced, merged and synthetic datasets. As ground truth, on the synthetic dataset achieves the best BLEU score on
we use both gold and silver eliciting responses coming the silver test set.
from the crowdsourced and synthetic test sets, respec- According to these results, Llama3 8B and IT5 should
tively. have similar performance on the ANTE task.
Notwith</p>
          <p>
            From the results of the in-context learning experi- standing, recent studies have shown that automatic
ments, we observe that GPT-4 outperforms all the other metrics are poorly correlated with human judgement
models by efectively leveraging the provided examples [
            <xref ref-type="bibr" rid="ref30 ref31">47, 48, 45</xref>
            ]. For this reason, we have used human
evaluawith few shots. Fine-tuned on the crowdsourced dataset, tion to have a more realistic representation of the LLMs’
Vicuna 13B and IT5 outperform GPT-4 with ICL, achiev- performance.
ing the highest results on the gold test set overall.
Fur
          </p>
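        <p>For reference, BLEU-1 can be computed as below; NLTK is our choice of implementation, as the paper does not name one.</p>
        <preformat># BLEU-1 between a generated eliciting response and the gold reference.
from nltk.translate.bleu_score import sentence_bleu

def bleu1(reference, hypothesis):
    # weights=(1, 0, 0, 0) keeps only unigram precision, i.e. BLEU-1
    return sentence_bleu([reference.split()], hypothesis.split(),
                         weights=(1.0, 0.0, 0.0, 0.0))</preformat>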
        <p>[Table 3: Human evaluation results (Appropriateness, Contextualization, Correctness, Compliance, Effectiveness) for ICL and for fine-tuning on the crowdsourced, merged, and synthetic datasets.]</p>
      </sec>
      <sec id="sec-4-2">
        <title>6.3. Human Evaluation</title>
        <sec id="sec-4-2-1">
          <title>The results of the human evaluation are presented in</title>
          <p>Table 3. Similarly to the automatic evaluation, the
table shows the results achieved with ICL and fine-tuning
on crowdsourced, merged and synthetic datasets. The
values represent the percentage of eliciting responses
that received a positive evaluation for the corresponding
metric. Considering the limited size of the test set (57
examples) and the unavoidable subjectivity and ambiguity
in the evaluation process, the results are compared with a
coarse margin that we empirically set to ± 5. Along with
manual inspection, this is also supported by the
percentage of “I don’t know” options, catching the ambiguous
cases, which ranges from 3.5% for human reference to
9.1% for Vicuna 13B on average.</p>
          <p>The results in the ICL setting show that the ANTE task
is challenging also for crowd workers (human reference)
who in some cases could not refrain from giving
suggestions or asking for personal information (e.g. What’s
the name of your kid?). Moreover, GPT-4 achieves
on-par performance with human annotators on all metrics
but compliance since the model gave suggestions
similar to the human reference. Given the overall positive
scores, we have used GPT-4 to generate the synthetic
data. Regarding the other models, the gap with human
reference is overall large. Only LLaMAntino 13B and
Vicuna 13B achieve decent performance on the two
task-specific metrics, compliance and effectiveness. Moreover,
the scores on correctness suggest that only LLaMAntino
13B and GPT-4 can properly handle the Italian language
in this task without fine-tuning.</p>
        <p>Fine-tuning especially boosts the performance of
IT5 and Llama3 8B, while more contained improvements
are observed for LLaMAntino 13B and Vicuna 13B.
Moreover, LLaMAntino 13B and Llama3 8B achieve their best
results when fine-tuned on the synthetic dataset, whilst
IT5 and Vicuna 13B perform best when fine-tuned
on the merged dataset. In particular, Llama3 8B
fine-tuned on the synthetic dataset attains an improvement of
35% on average w.r.t. the ICL results, outperforming all the
other open-source LLMs and matching the performance
of human annotators and GPT-4 on the task-specific metrics.
Despite a lower performance gain (10% on
average), LLaMAntino 13B is the second-best model on the
ANTE task, matching GPT-4's performance on effectiveness
and correctness. Regarding the correctness metric, we can
observe that IT5 always achieves the lowest score, except
on the merged dataset, despite being pre-trained on a
corpus in the Italian language.</p>
        <p>All in all, fine-tuning with synthetic data (either
the merged or the synthetic dataset) improves the performance
of almost all the models. Indeed, the scores on the
task-specific metrics achieved by fine-tuning the models on
the crowdsourced dataset are lower on average than
those achieved with the merged and synthetic datasets. A
possible explanation for these improvements is that the
merged dataset is larger; therefore, a small model such
as IT5 (220M parameters) benefits from this.</p>
      </sec>
      <sec id="sec-4-3">
        <title>6.4. Error Analysis</title>
        <sec id="sec-4-3-1">
          <title>Since the human evaluation has shown that GPT-4 matches the Human Reference’s (HR) performance, we have run some analysis to characterize the similarities and diferences better. We have started by manually com</title>
          <p>
            paring the eliciting responses of GPT-4 and HR. In this, that the cases of hallucination and genericness on the
we observed that GPT-4 tends to use paraphrased parts synthetic dataset are minimized compared to fine-tuning
of the narrative in the feedback and question parts of on the crowdsourced dataset. The improvement is even
the eliciting response. Indeed, the Jaccard similarity [
            <xref ref-type="bibr" rid="ref32">49</xref>
            ] more evident comparing the errors of IT5 fine-tuned on
between the narrative and the eliciting response6 on av- crowdsourced and merged datasets, where the number of
erage is 13% for GPT-4 and 7% for HR. After that, we generic questions is halved, and the hallucination cases
investigate whether there is a challenging set of exam- decrease by 11%. All in all, we can observe that the major
ples on which both models make errors by considering source of errors for contextualization and efectiveness is
an eliciting response wrong when it received negative due to either hallucination or genericness, regardless of
feedback on at least one metric. The intersection of the the dataset used during fine-tuning.
errors is only the 7% of the narrative, while the cases We have investigated whether the performance gap
in which HR is correct and GPT-4 is wrong are 20% and between fine-tuning on crowdsourced and synthetic
vice versa are 13%. By analysing all these errors man- datasets is due to a diference in the learning
complexually, we observed that in some cases GPT-4 deducted ity. In other words, learning from synthetic data may be
the context wrongly such as “I was having a cofee with easier than learning from human-generated data. Our
a colleague and we were talking about Christmas when...” rationale is that the distribution learned by LLMs, during
and the model asked7 “Have you already decided what to pre-training, is more similar to the distribution of
syngift for Christmas?”. Overall, one of the main issues is thetic data than that of human-generated data. This is
due to suggestions or requests for personal information because LLMs are based on similar architectures, and the
negatively afecting the performance on appropriateness relative pre-training datasets may overlap. For this, we
and compliance. have used the entrainment statistic because of the
dif
          </p>
          <p>
            The distributions of the explanations that annotators ferent vocabularies, making measuring the distribution
gave to justify their negative evaluations for the metrics distance challenging. Entrainment is the phenomenon
contextualization (wrong and no references) and efective- in which, during a conversation, a speaker reuses the
ness (complex or generic questions) are depicted in Figure terms of the other interlocutor [
            <xref ref-type="bibr" rid="ref33">50</xref>
            ]. This phenomenon
2. HR and GPT-4 errors are reported as references in may also be seen during the training process, where a
all plots. We can observe that HR is penalized on con- model learns to use the same language as the training set.
textualization and efectiveness due to genericness in the We have measured the entrainment using the formula
responses. On the GPT-4 side, the negative score on efec- proposed by Hirschberg et al. [
            <xref ref-type="bibr" rid="ref34">51</xref>
            ], which is:
tiveness is mainly due to complex questions. Furthermore,
the percentage of errors classified as wrong references is   () = − ∑∑︀︀∈∈ ||11 (())−+22 (())|| (1)
zero for both HR and GPT-4, meaning that GPT-4 does not
hallucinate in this task. The opposite is observed in the
ICL experiments where Llama3 8B has been penalized on
contextualization mainly due to wrong references, i.e., the
model hallucinated some part of the eliciting response.
          </p>
          <p>Moreover, for the same model, the efectiveness score is
negatively afected by many generic questions. As for
human evaluation, the distributions of the errors show
that fine-tuning the models improves the performance,
especially with synthetic data. In this, we can observe
where  is a target word class and  is the
frequency of the word  used by the model 1 and the test
set responses 2. The resulting score ranges between
0 (perfect match) and -1 (mismatch). We used the 100
most frequent words computed on the joint responses
generated by 1 and 2.</p>
          <p>Specifically, as 1, we have used the responses
generated by either Llama3 8B or LLaMAntino 13B8 fine-tuned
on crowdsourced (FTC) and synthetic (FTS) datasets. As
2, we have used the responses either in the
crowdsourced (CT) or the synthetic (ST) test sets. From Table</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>6From both, we removed the stopwords and lemmatized the rest.</title>
          <p>7In this case, the model wrongly inferred that Christmas is yet to
come, which is impossible to say by looking at the context only.
The model should have focused on other parts of the narrative.</p>
        </sec>
        <sec id="sec-4-3-3">
          <title>8The two best-performing models.</title>
        </sec>
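        <p>A minimal sketch of the entrainment computation in Equation (1) follows; whitespace tokenization and the normalization of counts into relative frequencies are our assumptions.</p>
        <preformat># Entrainment (Eq. 1) over the 100 most frequent words of the joint responses.
from collections import Counter

def entrainment(responses_1, responses_2, k=100):
    tokens_1 = [w for r in responses_1 for w in r.split()]
    tokens_2 = [w for r in responses_2 for w in r.split()]
    f1, f2 = Counter(tokens_1), Counter(tokens_2)
    n1, n2 = sum(f1.values()), sum(f2.values())
    # target word class c: the k most frequent words in the joint responses
    c = [w for w, _ in Counter(tokens_1 + tokens_2).most_common(k)]
    # 0 means a perfect frequency match; more negative means mismatch
    return -sum(abs(f1[w] / n1 - f2[w] / n2) / (f1[w] / n1 + f2[w] / n2)
                for w in c)</preformat>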
        <sec id="sec-4-3-4">
          <title>To test the models in a real-case scenario, we have de</title>
          <p>
            veloped a Virtual Reality (VR) system for the collection
of personal narratives. The collection follows the same
procedure as depicted in Figure 1, which starts with an
eliciting prompt and is followed by a conversation
between a narrator and an embodied conversational agent.
The system consists of an automatic speech recognition
[
            <xref ref-type="bibr" rid="ref35">52</xref>
            ] model, a conversational agent based on our
bestperforming LLM (Llama3 8B), which generates eliciting
responses, and a text-to-speech model. To connect these
components, we have utilized an adaptation of the
architecture proposed by Yin et al. [
            <xref ref-type="bibr" rid="ref36">53</xref>
            ], which also employs
a strategy of input segmentation to minimize response
latency. After some internal tests, we have observed that
the dialogue is efective and the system’s response latency
is not a major issue. However, the turn-taking strategy
is rule-based and, therefore, studying a more efective
approach would make the conversation smoother9.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>8. Conclusions</title>
      <sec id="sec-5-1">
        <title>In this work, we evaluated 5 LLMs on the Automatic</title>
        <p>NarraTive Elicitation (ANTE) task to investigate whether
the models can help us elaborate and recollect personal
events. To do this, we collected and created three corpora,
namely crowdsourced, merged, and synthetic. Then, we
evaluated closed and open-source models with in-context
learning and fine-tuning on the ANTE task. The results
show that closed-source LLMs can perform similarly to
human annotators and that fine-tuned open-source LLMs
on synthetic data can achieve similar performance. This
suggests that LLMs may be used to support individuals
in recollecting and elaborating on personal events.</p>
        <p>A future work is to study the efectiveness of LLMs
in collecting personal narratives compared to standard
techniques such as textual stimuli or interviews in a
random controlled trial setting. Another is to study how to
instruct the model to steer the conversation toward
specific events relevant to the researchers or professionals
collecting the narratives.</p>
      </sec>
      <sec id="sec-5-2">
        <title>9A demo of this system can be found at https://www.youtube.com/</title>
        <p>watch?v=ozpuoEKsTjs
Acknowledgments</p>
      </sec>
      <sec id="sec-5-3">
        <title>4, we can observe that the entrainment scores computed</title>
        <p>between FTC and CT are lower than those computed
between FTS and ST. Thus, the fine-tuned models are more We acknowledge the support of the MUR PNRR project
aligned with the language of the synthetic dataset than FAIR - Future AI Research (PE00000013) and the MUR
the natural language found in the crowdsourced dataset, PNRR project iNEST- Interconnected Nord-Est
Innovasuggesting that learning from the synthetic data is easier. tion Ecosystem (ECS00000043) funded by the European
Union under NextGenerationEU. Views and opinions
expressed are however those of the author(s) only and do
7. Personal Narratives in VR not necessarily reflect those of the European Union or
The European Research Executive Agency. Neither the
European Union nor the granting authority can be held
responsible for them.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>A. Appendix</title>
      <sec id="sec-6-1">
        <title>A.1. Hyperparameters</title>
        <sec id="sec-6-1-1">
          <title>We used a batch size of 8 for the fine-tuning. The models</title>
          <p>
            were fine-tuned for 10 epochs with early stopping based
on the perplexity computed on the development set. We
have trained the autoregressive models, Vicuna 13B,
LLaMAntino 13B, Llama3 8B, in an auto-regressive manner
with Adam [
            <xref ref-type="bibr" rid="ref37">54</xref>
            ] optimizer. The models were fine-tuned
using Low-Rank Adaptation (LoRA) [43], i.e. a method
for fine-tuning large-scale LLMs, which reduces the
number of trainable parameters. We set the learning rate to
Declaration on Generative AI
During the preparation of this work, the author(s) used ChatGPT (OpenAI) and Grammarly in order
to: Paraphrase and reword and Grammar and spelling check. After using these tool(s)/service(s), the
author(s) reviewed and edited the content as needed and take(s) full responsibility for the
publication’s content.
          </p>
        </sec>
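        <p>A minimal sketch of this LoRA setup with the Hugging Face peft library follows; the library choice and the adapter values are illustrative, since the text does not report them.</p>
        <preformat># LoRA fine-tuning setup sketched with Hugging Face peft (our choice of
# library); the adapter hyperparameters below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Meta-Llama-3-8B"  # one of the fine-tuned models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      task_type="CAUSAL_LM")  # illustrative values
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapters are trained</preformat>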
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT (OpenAI) and Grammarly in order
to: paraphrase and reword, and grammar and spelling check. After using these tool(s)/service(s), the
author(s) reviewed and edited the content as needed and take(s) full responsibility for the
publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Connery</surname>
          </string-name>
          ,
          <article-title>A sourcebook of american literary journalism: representative writers in an emerging genre (</article-title>
          <year>1992</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hobbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Davis</surname>
          </string-name>
          , Narrative pedagogies in science,
          <source>mathematics and technology, Res. Sci. Educ</source>
          .
          <volume>43</volume>
          (
          <year>2013</year>
          )
          <fpage>1289</fpage>
          -
          <lpage>1305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Shiller</surname>
          </string-name>
          ,
          <article-title>Narrative economics: How stories go viral and drive major economic events</article-title>
          , Princeton University Press,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>K. D'Cruz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Douglas</surname>
          </string-name>
          , T. Serry,
          <article-title>Personal narrative approaches in rehabilitation following traumatic brain injury: A synthesis of qualitative research</article-title>
          ,
          <source>Neuropsychological Rehabilitation</source>
          <volume>29</volume>
          (
          <year>2019</year>
          )
          <fpage>985</fpage>
          -
          <lpage>1004</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C. N.</given-names>
            <surname>Wiesepape</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Lysaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Queller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Lysaker</surname>
          </string-name>
          ,
          <article-title>Personal narratives and the pursuit of purpose and possibility in psychosis: directions for developing recovery-oriented treatments</article-title>
          ,
          <source>Expert Review of Neurotherapeutics</source>
          <volume>23</volume>
          (
          <year>2023</year>
          )
          <fpage>525</fpage>
          -
          <lpage>534</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Botting</surname>
          </string-name>
          ,
          <article-title>Narrative as a tool for the assessment of linguistic and pragmatic impairments, Child language teaching</article-title>
          and therapy
          <volume>18</volume>
          (
          <year>2002</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Danieli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ciulli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Mousavi</surname>
          </string-name>
          , G. Silvestri,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barbato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Di</given-names>
            <surname>Natale</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Riccardi, Assessing the impact of conversational artificial intelligence in the treatment of stress and anxiety in aging adults: randomized controlled trial</article-title>
          ,
          <source>JMIR mental health 9</source>
          (
          <year>2022</year>
          )
          <article-title>e38067</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Roccabruna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Mousavi</surname>
          </string-name>
          , G. Riccardi,
          <article-title>Understanding emotion valence is a joint deep learning task</article-title>
          ,
          <source>in: Proceedings of the 13th Workshop on Computational Approaches</source>
          to Subjectivity, Sentiment, &amp;
          <string-name>
            <surname>Social Media Analysis</surname>
          </string-name>
          ,
          <year>2023</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tammewar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cervone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.-M.</given-names>
            <surname>Messner</surname>
          </string-name>
          , G. Riccardi,
          <article-title>Annotation of emotion carriers in personal narratives</article-title>
          ,
          <source>in: Proceedings of the Twelfth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>1517</fpage>
          -
          <lpage>1525</lpage>
          . URL: https://aclanthology.org/ URL: https://aclanthology.org/
          <year>2024</year>
          .
          <article-title>naacl-long</article-title>
          .
          <year>341</year>
          .
          <year>2020</year>
          .lrec-
          <volume>1</volume>
          .189. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2024</year>
          .
          <article-title>naacl-long</article-title>
          .
          <volume>341</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Sarbin</surname>
          </string-name>
          ,
          <article-title>The narrative as a root metaphor for</article-title>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Achiam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , I. Akkaya, psychology, Narrative psychology: The storied
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Aleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Altenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>Altnature of human conduct (</article-title>
          <year>1986</year>
          )
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          . man,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anadkat</surname>
          </string-name>
          , et al.,
          <source>Gpt-4 technical report,</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>U.</given-names>
            <surname>Neisser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fivush</surname>
          </string-name>
          ,
          <source>The remembering self: Con- arXiv preprint arXiv:2303.08774</source>
          (
          <year>2023</year>
          ).
          <article-title>struction and accuracy in the self-narrative, 6</article-title>
          , Cam- [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xiong</surname>
          </string-name>
          , Generating highly relevant quesbridge University Press,
          <year>1994</year>
          . tions,
          <source>in: Proceedings of the 2019 Conference</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Mills</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>D'Mello, On the validity of the auto- on Empirical Methods in Natural Language Probiographical emotional memory task for emotion cessing and the 9th International Joint Conferinduction</article-title>
          ,
          <source>PloS one 9</source>
          (
          <year>2014</year>
          )
          <article-title>e95837. ence on Natural Language Processing (EMNLP-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>J. M. Williams</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Broadbent</surname>
          </string-name>
          ,
          <string-name>
            <surname>Autobiographical</surname>
            <given-names>IJCNLP</given-names>
          </string-name>
          ),
          <article-title>Association for Computational Linguismemory in suicide attempters</article-title>
          .,
          <source>Journal of abnormal tics, Hong Kong, China</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>5983</fpage>
          -
          <lpage>5987</lpage>
          . URL: psychology
          <volume>95</volume>
          (
          <year>1986</year>
          )
          <article-title>144</article-title>
          . https://aclanthology.org/D19-1614. doi:
          <volume>10</volume>
          .18653/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <article-title>Remembering our past: Studies in auto-</article-title>
          v1/
          <fpage>D19</fpage>
          -1614. biographical memory, Cambridge University Press, [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <year>1999</year>
          . I.
          <string-name>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <source>Language models are unsuper-</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>R. J. McNally</surname>
            ,
            <given-names>N. B.</given-names>
          </string-name>
          <string-name>
            <surname>Lasko</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          <string-name>
            <surname>Macklin</surname>
          </string-name>
          , R. K. Pit- vised multitask learners,
          <source>OpenAI blog 1</source>
          (
          <year>2019</year>
          )
          <article-title>9</article-title>
          . man, Autobiographical memory disturbance in [25]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rosset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ai</surname>
          </string-name>
          , combat
          <article-title>-related posttraumatic stress disorder, Be- Zero-shot clarifying question generation for conhaviour research</article-title>
          and therapy
          <volume>33</volume>
          (
          <year>1995</year>
          )
          <fpage>619</fpage>
          -
          <lpage>630</lpage>
          . versational search,
          <source>in: Proceedings of the ACM</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Borrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dall'Ora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Della</given-names>
            <surname>Sala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marinelli</surname>
          </string-name>
          ,
          <source>Web Conference</source>
          <year>2023</year>
          ,
          <year>2023</year>
          , pp.
          <fpage>3288</fpage>
          -
          <lpage>3298</lpage>
          . H. Spinnler,
          <article-title>Autobiographical memory</article-title>
          . sensitiv- [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Ghazvininejad, ity to age and education of a standardized enquiry, A</article-title>
          .
          <string-name>
            <surname>Mohamed</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Stoyanov</surname>
          </string-name>
          , L.
          <source>ZettlePsychological Medicine</source>
          <volume>19</volume>
          (
          <year>1989</year>
          )
          <fpage>215</fpage>
          -
          <lpage>224</lpage>
          . moyer, BART:
          <article-title>Denoising sequence-to-sequence</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>M. D. Kopelman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>A. D.</given-names>
          </string-name>
          <string-name>
            <surname>Baddeley</surname>
          </string-name>
          ,
          <article-title>The pre-training for natural language generation, transautobiographical memory interview: a new assess- lation, and comprehension, in: Proceedings of the ment of autobiographical and personal semantic 58th Annual Meeting of the Association for Commemory in amnesic patients</article-title>
          ,
          <source>Journal of clinical putational Linguistics</source>
          ,
          <source>Association for Computaand experimental neuropsychology 11</source>
          (
          <year>1989</year>
          )
          <fpage>724</fpage>
          - tional Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>7871</fpage>
          -
          <lpage>7880</lpage>
          . URL:
          <volume>744</volume>
          . https://aclanthology.org/
          <year>2020</year>
          .acl-main.
          <volume>703</volume>
          . doi:10.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Svoboda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Hay</surname>
          </string-name>
          , G. Winocur,
          <volume>18653</volume>
          /v1/
          <year>2020</year>
          .acl-main.703.
          <string-name>
            <surname>M. Moscovitch</surname>
            , Aging and autobiographical mem- [27]
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Hou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
          </string-name>
          , X. Ma, ory
          <article-title>: dissociating episodic from semantic retrieval</article-title>
          .,
          <source>Educational question generation of children stoPsychology and aging 17</source>
          (
          <year>2002</year>
          )
          <article-title>677. rybooks via question type distribution learning</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19] H. Inan, K. Upasani, J. Chi, R. Rungta, K. Iyer, Y. Mao, M. Tontchev, Q. Hu, B. Fuller, D. Testuggine, et al., Llama guard: Llm-based input-output safeguard for human-ai conversations, arXiv preprint arXiv:2312.06674 (2023).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20] T. Rebedea, R. Dinu, M. N. Sreedhar, C. Parisien, J. Cohen, Nemo guardrails: A toolkit for controllable and safe llm applications with programmable rails, in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2023, pp. 431-445.
        </mixed-citation>
      </ref>
      <ref id="ref-28">
        <mixed-citation>
          [28] S. Gupta, A. Agarwal, M. Gaur, K. Roy, V. Narayanan, P. Kumaraguru, A. Sheth, Learning to automate follow-up question generation using process knowledge for depression triage on reddit posts, in: Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology, 2022, p. 137.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21] J. Ou, J. Lu, C. Liu, Y. Tang, F. Zhang, D. Zhang, K. Gai, DialogBench: Evaluating LLMs as human-like dialogue systems, in: K. Duh, H. Gomez, S. Bethard (Eds.), Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Association for Computational Linguistics, Mexico City, Mexico, 2024, pp. 6137-6170.
        </mixed-citation>
      </ref>
      <ref id="ref-29">
        <mixed-citation>
          [29] C. Whitehouse, M. Choudhury, A. F. Aji, LLM-powered data augmentation for enhanced cross-lingual performance, in: H. Bouamor, J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Singapore, 2023, pp. 671-686. URL: https://aclanthology.org/2023.emnlp-main.44. doi:10.18653/v1/2023.emnlp-main.44.
        </mixed-citation>
      </ref>
      <ref id="ref-37">
        <mixed-citation>
          [37] G. Roccabruna, A. Cervone, G. Riccardi, Multifunctional iso standard dialogue act tagging in italian, in: CLiC-it, 2020.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [30] Z. Li, W. Chen, S. Li, H. Wang, J. Qian, X. Yan, Controllable dialogue simulation with in-context learning, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 4330-4347. URL: https://aclanthology.org/2022.findings-emnlp.318. doi:10.18653/v1/2022.findings-emnlp.318.
        </mixed-citation>
      </ref>
      <ref id="ref-38">
        <mixed-citation>
          [38] J. F. Kelley, An iterative design methodology for user-friendly natural language office information applications, ACM Trans. Inf. Syst. 2 (1984) 26-41. URL: https://api.semanticscholar.org/CorpusID:207660078.
        </mixed-citation>
      </ref>
      <ref id="ref-39">
        <mixed-citation>
          [39] A. Grattafiori, A. Dubey, A. Jauhri, et al., The llama 3 herd of models, 2024. URL: https://arxiv.org/abs/2407.21783. arXiv:2407.21783.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [31] T. Brown, B. Mann, N. Ryder, et al., Language models are few-shot learners, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems, volume 33, Curran Associates, Inc., 2020, pp. 1877-1901. URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref-40">
        <mixed-citation>
          [40] W.-L. Chiang, Z. Li, Z. Lin, Y. Sheng, Z. Wu, H. Zhang, L. Zheng, S. Zhuang, Y. Zhuang, J. E. Gonzalez, I. Stoica, E. P. Xing, Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, 2023. URL: https://lmsys.org/blog/2023-03-30-vicuna/.
        </mixed-citation>
      </ref>
      <ref id="ref-41">
        <mixed-citation>
          [41] P. Basile, E. Musacchio, M. Polignano, L. Siciliani, G. Fiameni, G. Semeraro, Llamantino: Llama 2 models for effective text generation in italian language, 2023. URL: https://arxiv.org/abs/2312.09993. arXiv:2312.09993.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [32] N. Ashok Kumar, A. Lan, Improving socratic question generation using data augmentation and preference optimization, in: E. Kochmar, M. Bexte, J. Burstein, A. Horbach, R. Laarmann-Quante, A. Tack, V. Yaneva, Z. Yuan (Eds.), Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), Association for Computational Linguistics, Mexico City, Mexico, 2024, pp. 108-118. URL: https://aclanthology.org/2024.bea-1.10.
        </mixed-citation>
      </ref>
      <ref id="ref-42">
        <mixed-citation>
          [42] G. Sarti, M. Nissim, IT5: Text-to-text pretraining for Italian language understanding and generation, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 9422-9433. URL: https://aclanthology.org/2024.lrec-main.823.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [33] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al., Llama 2: Open foundation and fine-tuned chat models, arXiv preprint arXiv:2307.09288 (2023).
        </mixed-citation>
      </ref>
      <ref id="ref-43">
        <mixed-citation>
          [43] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, Lora: Low-rank adaptation of large language models, 2021. URL: https://arxiv.org/abs/2106.09685. arXiv:2106.09685.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [34] S. M. Mousavi, A. Cervone, M. Danieli, G. Riccardi, Would you like to tell me more? generating a corpus of psychotherapy dialogues, in: Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations, Association for Computational Linguistics, Online, 2021, pp. 1-9. URL: https://aclanthology.org/2021.nlpmc-1.1. doi:10.18653/v1/2021.nlpmc-1.1.
        </mixed-citation>
      </ref>
      <ref id="ref-44">
        <mixed-citation>
          [44] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation of machine translation, in: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, Association for Computational Linguistics, USA, 2002, pp. 311-318. URL: https://doi.org/10.3115/1073083.1073135. doi:10.3115/1073083.1073135.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [35] S. M. Mousavi, G. Roccabruna, A. Tammewar, S. Azzolin, G. Riccardi, Can emotion carriers explain automatic sentiment prediction? a study on personal narratives, in: Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment &amp; Social Media Analysis, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 62-70. URL: https://aclanthology.org/2022.wassa-1.6. doi:10.18653/v1/2022.wassa-1.6.
        </mixed-citation>
      </ref>
      <ref id="ref-45">
        <mixed-citation>
          [45] S. M. Mousavi, G. Roccabruna, M. Lorandi, S. Caldarella, G. Riccardi, Evaluation of response generation models: Shouldn't it be shareable and replicable?, in: A. Bosselut, K. Chandu, K. Dhole, V. Gangal, S. Gehrmann, Y. Jernite, J. Novikova, L. Perez-Beltrachini (Eds.), Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 2022, pp. 136-147. URL: https://aclanthology.org/2022.gem-1.12. doi:10.18653/v1/2022.gem-1.12.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [36] H. Bunt, V. Petukhova, D. Traum, J. Alexandersson, Dialogue act annotation with the iso 24617-2 standard, in: Multimodal interaction with W3C standards, Springer, 2017, pp. 109-135.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [46] J. L. Fleiss, Measuring nominal scale agreement among many raters., Psychological bulletin 76 (1971) 378.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [47] A. Belz, S. Mille, D. M. Howcroft, Disentangling the properties of human evaluation methods: A classification system to support comparability, meta-evaluation and reproducibility testing, in: Proceedings of the 13th International Conference on Natural Language Generation, Association for Computational Linguistics, Dublin, Ireland, 2020, pp. 183-194. URL: https://aclanthology.org/2020.inlg-1.24.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [48] A. B. Sai, A. K. Mohankumar, M. M. Khapra, A survey of evaluation metrics used for nlg systems, ACM Computing Surveys (CSUR) 55 (2022) 1-39.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [49] P. Jaccard, Nouvelles recherches sur la distribution florale, Bull. Soc. Vaud. Sci. Nat. 44 (1908) 223-270.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [50] S. E. Brennan, et al., Lexical entrainment in spontaneous dialog, Proceedings of ISSD 96 (1996) 41-44.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [51] J. B. Hirschberg, A. Nenkova, A. Gravano, High frequency word entrainment in spoken dialogue (2008).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [52] J. Grosman, Fine-tuned XLSR-53 large model for speech recognition in Italian, https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-italian, 2021.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [53] M. Yin, G. Roccabruna, A. Azad, G. Riccardi, Let's give a voice to conversational agents in virtual reality, in: Proceedings of Interspeech 2023, Dublin, Ireland, 2023, pp. 5247-5248.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <year>2017</year>
          . URL: https://arxiv.org/abs/1412. 6980. arXiv:
          <volume>1412</volume>
          .
          <fpage>6980</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [55] N. Shazeer, M. Stern, Adafactor: Adaptive learning rates with sublinear memory cost, in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 4596-4604. URL: https://proceedings.mlr.press/v80/shazeer18a.html.
        </mixed-citation>
      </ref>
    </ref-list>
    <sec id="sec-appendix">
      <title>Appendix: Fine-tuning and inference details</title>
      <p>We set the learning rate to 1e-5 and the rank and alpha parameters to 128. We have used the top-k sampling strategy to generate the new tokens with k set to 10. The IT5 model was fully fine-tuned with the Adafactor [55] optimizer. We have used beam search with four beams as the decoding strategy. To run our experiments, we used a machine with two Nvidia 3090 GPUs with 24GB and an Nvidia A100 with 80GB. Overall, the training time for each experiment was less than 30 minutes, and the inference time was less than 15 minutes.</p>
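      <p>For concreteness, the following is a minimal sketch, not the authors' released code, of how this configuration could be reproduced with the Hugging Face transformers and peft libraries. The checkpoint name and the prompt are illustrative placeholders; only the rank/alpha of 128, the learning rate of 1e-5, top-k sampling with k=10, and four-beam search for IT5 come from the text above.</p>
      <preformat>
# Minimal sketch of the fine-tuning and decoding setup described above.
# Assumed stack: transformers + peft; checkpoint and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# LoRA adapters with rank and alpha set to 128, as in the appendix text;
# training would then run with a learning rate of 1e-5 (e.g. via Trainer).
model = get_peft_model(model, LoraConfig(r=128, lora_alpha=128, task_type="CAUSAL_LM"))

# Decoding for the LoRA-tuned causal LMs: top-k sampling with k = 10.
prompt = "Raccontami di piu su questo evento."  # illustrative elicitation prompt
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, do_sample=True, top_k=10, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# For IT5 (a seq2seq model fully fine-tuned with Adafactor [55]), decoding
# would instead use beam search with four beams:
#   out = it5_model.generate(**inputs, num_beams=4, do_sample=False)
      </preformat>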
    </sec>
  </back>
</article>