<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ExtremITA at EVALITA 2023: Multi-Task Sustainable Scaling to Large Language Models at its Extreme</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Claudiu D. Hromei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Croce</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Basili</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Roma Tor Vergata</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università di Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper explores the potential application of a monolithic neural model to all tasks in EVALITA 2023. We evaluated two models: extremIT5, an Encoder-Decoder model, and extremITLLaMA, an instruction-tuned Decoder-only Large Language Model, specifically designed for handling Italian instructions. Our approach revolves around representing tasks in natural language, where we provide instructions to the model using prompts that define the expected responses. Remarkably, our best-performing model achieved first place in 41% of the subtasks and showcased top-three performance in 64%. These subtasks encompass various semantic dimensions, including Affect Detection, Authorship Analysis, Computational Ethics, Named Entity Recognition, Information Extraction, and Discourse Coherence.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>The adopted LLMs (especially the LLaMA-based one) strongly support the viability and high performance of a single monolithic architecture, as the proposed solution only requires modeling the tasks in natural language using prompts. This approach has been further reinforced by recent work [24], which indicates the same direction.</p>
      <p>In the rest of the paper, Section 2 describes the adopted LLMs. Section 3 provides the results, accompanied by a brief error analysis. Finally, Section 4 derives the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Multi-task prompting in ExtremITA</title>
      <p>The Transformer architecture [25] can be divided into two main components, each giving rise to distinct families of models. The encoder, exemplified by BERT [26], RoBERTa [27], and DeBERTa [28], is responsible for encoding input sequences and generating meaningful representations (embeddings) using the self-attention mechanism. On the other hand, the decoder, represented by models like GPT [5], GPT3 [6], and LLaMA [7], generates output sequences in an auto-regressive manner based on the input and previously generated output tokens. Additionally, another family of models, the Encoder-Decoder models, such as T5 [1] and BART [29], combines the strengths of both encoder and decoder components. These models maintain the integration of the two aforementioned blocks and are usually used in tasks like machine translation, summarization, and question answering, where both complex input understanding and transduction are required.</p>
      <p>A first effective application of an Encoder-Decoder architecture in a multi-task scenario is presented in [1]: in particular, the pre-training process of the so-called T5 involves training the model on a large corpus of diverse text data, which consists of a wide range of sources such as books, articles, and websites, but also texts involved in machine translation, classification and regression tasks. During pre-training, T5 utilizes a denoising objective, similar to other popular Transformer-based models like BERT and GPT. The model is trained to reconstruct masked or corrupted input text, which helps it learn meaningful representations and capture contextual information. One of the key strengths of T5 is its versatility. By casting various NLP tasks into a text-to-text format, it can be fine-tuned on a specific task simply by providing a prefix that serves as a description of the task and appropriate input-output pairs during fine-tuning. In practice, such an architecture can be triggered by concatenating the name of the task it is trained on with an input text, and it generates in output the expected solution to the task, e.g., a class label in a classification task or a text span that answers a question. This flexibility eliminates the need for task-specific architectures or modifications, making it easier to apply T5 to different scenarios. Recently, this model was applied to hundreds of tasks in [24], while in [4] a systematic pre-training at large scale demonstrates its effectiveness within "zero-shot" or "few-shot" learning scenarios. In this paper, the first approach we adopted is based on T5, pre-trained on Italian texts, namely IT5 [3].</p>
      <p>On the other hand, Decoder-only models are typically trained to be triggered by text, such as a natural language request or a piece of text intended for processing. These models generate text one word at a time, producing an output that can be an answer to a question or a solution to the given tasks or requests. Such models have the ability to essentially follow instructions, as exemplified by the recent release of ChatGPT. This characteristic holds a greater appeal, as tasks can be linguistically described using prompts, where the input sentence serves as contextual information. InstructGPT [30] is an extension of the GPT [6] language model explicitly designed to excel in multi-task scenarios when used with prompts. It combines the power of language models with the ability to follow instructions provided in the form of natural language prompts. Unlike conventional language models that generate text freely, InstructGPT is fine-tuned using human feedback to understand and generate text based on a given prompt and to select the best sequence that humans would have preferred. Another language model that adopts this instruction-tuning technique is Alpaca [31], which builds upon the LLaMA [7] foundational models. In the case of Alpaca, the authors created 175 sets of instructions, input sentences, and corresponding outputs. These were then used to generate variations using GPT 3.5, resulting in a collection of approximately 52,000 instruction examples. The LLaMA model was further fine-tuned using this extensive dataset, a process referred to as instruction-tuning. The outcome of this effort was the Stanford Alpaca [31] instruction-following LLaMA model. More recently, an Italian counterpart called Camoscio [32] has undergone an instruction-tuning similar to Alpaca but on Italian data, essentially serving as the Italian equivalent. It is based on the same LLaMA model and it was instruction-tuned on the 52,000 instructions automatically translated into Italian using ChatGPT, as in [32]. As the size of these models continues to grow, reaching trillions of parameters, there is a need for a way to fine-tune them effectively using modest GPU resources. The technique adopted in this paper is called Low-Rank Adaptation (LoRA [33]). LoRA involves freezing the weights of the pre-trained model and introducing trainable rank decomposition matrices into each layer of the Transformer architecture. This approach significantly reduces the number of trainable parameters for downstream tasks while avoiding additional inference latency (a configuration sketch is given at the end of this section).</p>
      <p>To summarize, the ExtremITA approach for the EVALITA challenge focuses on efficiently modeling all available tasks using a single monolithic architecture, based on two independently tested models:</p>
      <p>• extremIT5, an Encoder-Decoder model based on IT5¹, consisting of approximately 110 million parameters. This model is trained by concatenating the name of the task and the input sentence/paragraph in the input texts, each representing an example from a generic EVALITA task. Its purpose is to generate a piece of text that solves the target task.</p>
      <p>• extremITLLaMA, an instruction-tuned Decoder-only model, built upon the LLaMA foundational models², with a total of 7 billion parameters. The initial model was trained using the LoRA technique on Italian translations³ of the Alpaca instruction data. This training enables the model to comprehend instructions in Italian. After training the adapters, they are merged into the original model to create an instruction-based model (using the "merge" procedure from [33]). Finally, this model is further fine-tuned using LoRA on instructions that reflect the EVALITA tasks. For each example from EVALITA, an input text is paired with a manually crafted question that simulates an instruction to be solved, accurately representing the specific task.</p>
      <p>Footnotes: ¹ https://huggingface.co/it5/it5-efficient-small-el32 ² https://huggingface.co/decapoda-research/llama-7b-hf ³ https://github.com/teelinsan/camoscio/tree/main/data</p>
      <p>The next section describes how the 22 subtasks in EVALITA are encoded as prompts to fine-tune the above architectures.</p>
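      <p>To make the LoRA recipe above concrete, the following is a minimal configuration sketch using the Hugging Face peft library. It is illustrative rather than the exact released training script: the dropout value and the specific target modules are assumptions, while the rank and scaling values actually adopted are reported in Section 3.</p>
      <preformat>
# Minimal LoRA setup sketch (illustrative, not the exact released script).
# Assumes: transformers and peft are installed.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

# Freeze the pre-trained weights and attach trainable rank-decomposition
# matrices (rank r, scaled by lora_alpha) to the attention projections.
config = LoraConfig(
    r=8,                 # rank of the decomposition matrices
    lora_alpha=16,       # scaling factor applied to the LoRA update
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    lora_dropout=0.05,   # illustrative value
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# After fine-tuning, the adapters can be folded back into the base weights,
# so no extra inference latency is introduced:
merged = model.merge_and_unload()
      </preformat>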
      <sec id="sec-1-2">
        <title>1https://huggingface.co/it5/it5-eficient-small-el32 2https://huggingface.co/decapoda-research/llama-7b-hf 3https://github.com/teelinsan/camoscio/tree/main/data</title>
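        <p>As an illustration, the sketch below builds the input string for each model. It is a simplified rendering: the "Risposta:" suffix and the exact Alpaca-style template are assumptions, and the released code may differ.</p>
        <preformat>
# Sketch of the two input encodings (illustrative; the exact templates
# used in the released code may differ).

def encode_extremit5(task_name, text):
    # extremIT5: the task name is simply prepended to the input text.
    return f"{task_name}: {text}"

def encode_extremitllama(instruction, text):
    # extremITLLaMA: a natural language instruction describes the task and
    # the sentence to be evaluated is appended to it (Alpaca-style prompt;
    # the "Risposta:" cue is an assumption).
    return f"{instruction}\n{text}\nRisposta:"

example = "Hanno votato tutti obbligo vaccinale, green pass, persecuzioni varie"
print(encode_extremit5("ACTI", example))
print(encode_extremitllama(
    "In questo testo si parla di una cospirazione? Rispondi sì o no.",
    example,
))
        </preformat>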
        <p>Table 2. Natural language instructions used to prompt extremITLLaMA (task name, followed by the instruction):</p>
        <p>EMit A: "Quali emozioni sono espresse in questo testo? Puoi scegliere una o più emozioni tra 'rabbia', 'anticipazione', 'disgusto', 'paura', 'gioia', 'amore', 'tristezza', 'sorpresa', 'fiducia', o 'neutro'."</p>
        <p>EMit B: "Di cosa parla il testo, tra 'direzione', 'argomento', 'entrambi', 'non specificato'?"</p>
        <p>EmotivITA: "Scrivi quanta valenza è espressa in questo testo su una scala da 1 a 5, seguito da quanto stimolo è espresso in questo testo su una scala da 1 a 5, seguito da quanto controllo è espresso in questo testo su una scala da 1 a 5."</p>
        <p>PoliticIT: "Scrivi se l'autore del testo è 'uomo' o 'donna', seguito dalla sua appartenenza politica tra 'destra', 'sinistra', 'centrodestra', 'centrosinistra'."</p>
        <p>GeoLingIt: "Scrivi la regione di appartenenza di chi ha scritto questo testo, seguito dalla latitudine, seguita dalla longitudine."</p>
        <p>LangLearn: "Questi due testi separati da [SEP] sono presentati nell'ordine in cui sono stati scritti? Rispondi sì o no."</p>
        <p>HaSpeeDe 3: "In questo testo si esprime odio? Rispondi sì o no."</p>
        <p>HODI A: "In questo testo si esprime odio omotransfobico? Rispondi sì o no."</p>
        <p>HODI B: "Con quali parole l'autore del testo precedente esprime odio omotransfobico? Separa le sequenze di parole con [gap]."</p>
        <p>MULTI-Fake-DetectiVE: "L'evento riportato nel testo è 'certamente vero', 'probabilmente vero', 'probabilmente falso', o 'certamente falso'?"</p>
        <p>ACTI A: "In questo testo si parla di una cospirazione? Rispondi sì o no."</p>
        <p>ACTI B: "Di quale teoria cospirazionista parla questo testo, tra 'Covid', 'Qanon', 'Terrapiattista', 'Russia'?"</p>
        <p>NERMuD: "Scrivi le menzioni di entità nel testo, indicandone il tipo: [PER] (persona), [LOC] (luogo), [ORG] (organizzazione)."</p>
        <p>CLinkaRT: "Trova i risultati dei test e delle misurazioni nel testo. Per ogni risultato, scrivi '[BREL]', seguito dal risultato seguito da '[SEP]', seguito dal test, seguito da '[EREL]'. Se non trovi nessun risultato, scrivi '[NOREL]'."</p>
        <p>WiC-ITA: "La parola compresa tra [TGTS] e [TGTE] ha lo stesso significato in entrambe le frasi? Rispondi sì o no."</p>
        <p>DisCoTEX 1: "Le due frasi precedenti, separate da '[SEP]', sono coerenti tra loro? Rispondi sì o no."</p>
        <p>DisCoTEX 2: "Quanto è coerente questa frase, su una scala da 0 a 5?"</p>
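        <p>To show how such structured outputs map back to task predictions, here is a small illustrative decoding sketch. It is not the official scorer, and the released code may parse these strings differently.</p>
        <preformat>
# Sketch: decoding generated strings back into task predictions
# (illustrative; the released code may differ).

def decode_clinkart(output):
    # "[BREL] 2 [SEP] PSA [EREL] [BREL] 62 ng/ml [SEP] PSA [EREL]"
    # becomes [("2", "PSA"), ("62 ng/ml", "PSA")]; "[NOREL]" means none.
    if "[NOREL]" in output:
        return []
    relations = []
    for chunk in output.split("[BREL]")[1:]:
        body = chunk.split("[EREL]")[0]
        result, _, test = body.partition("[SEP]")
        relations.append((result.strip(), test.strip()))
    return relations

def decode_geolingit(output):
    # "Lazio 41.8984164 12.54514535" becomes a region plus coordinates.
    region, lat, lon = output.rsplit(" ", 2)  # regions may contain spaces
    return region, float(lat), float(lon)

assert decode_clinkart(
    "[BREL] 2 [SEP] PSA [EREL] [BREL] 62 ng/ml [SEP] PSA [EREL]"
) == [("2", "PSA"), ("62 ng/ml", "PSA")]
assert decode_geolingit("Lazio 41.8984164 12.54514535")[0] == "Lazio"
        </preformat>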
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <p>Experimental Setup. Models were trained using PyTorch, the Huggingface library and the Peft package to implement the LoRA technique. Both models were trained on the unified dataset of all the tasks of EVALITA. Generally, one example in an EVALITA task corresponds to an example in our learning setting. Below are some exceptions. The dataset for the ACTI task was expanded by incorporating some⁴ sentences from dataset B and vice versa, resulting in an increase in the number of examples from 460 to 1,909 for ACTI A and from 300 to 777 for ACTI B.</p>
      <p>Since in CLinkaRT only (long) documents were made available, these medical reports were segmented into smaller parts with a minimum of 50 characters and a maximum of 30 words using the Spacy library, respecting sentence boundaries. Moreover, we augmented this dataset with examples derived from the dataset made available in TESTLINK@IberLEF 2023⁵, which contains medical reports in Spanish: although in a different language, these texts contain similar phenomena about events and measures that are generally language invariant and were useful to augment the dataset. This process significantly augmented the dataset, expanding it from 83 large documents to 3,903 shorter sentences. In general, this process recovered more than 95% of the annotated relations. In the case of EMit, the dataset underwent a transformation where emoji representations were converted into textual descriptions, enhancing its compatibility with language models. GeoLingIt was modified to solve task A and task B simultaneously, enabling a single prediction for both tasks. For HODI B, only sentences expressing homotransphobia were considered, resulting in a reduction from 5,000 to 1,914 examples. The dataset of the LangLearn task was truncated into sentences with a maximum of 100 tokens, and additional examples were created with inverted sentence pairs (by flipping the label from positive to negative and vice versa), augmenting the dataset from 3,377 to 6,438 examples. In MULTI-Fake-DetectiVE we neglected images, and duplicate examples were removed (i.e., same text and different image), leading to a decrease from 1,058 to 860 examples. NERMuD was transformed into a sequence-to-sequence task from its original token classification format. In PoliticIT, each text was divided into sentences with a maximum length of 200 tokens, enabling more manageable input for language models. At classification time, a voting strategy was applied to select the final class for gender and political ideas, grouping all sentences written by the same author, as sketched below. Lastly, the WiC-ITA dataset was expanded by including examples⁶ with inverted sentence pairs while preserving the same label, resulting in an increase from 5,610 to 6,600 examples. Overall, the entire dataset is composed of a total of 134,018 examples.</p>
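      <p>Two of the manipulations above can be made concrete with a short sketch; field names such as sent_a, sent_b and author are hypothetical, and the released preprocessing code may differ.</p>
      <preformat>
# Sketch of two data manipulations described above (illustrative; the
# dictionary keys are hypothetical).
from collections import Counter

def invert_pairs(examples, flip_label, only_positive):
    # Add a copy of each (selected) pair with the two sentences inverted.
    augmented = list(examples)
    for ex in examples:
        if only_positive and ex["label"] != "positive":
            continue
        new_label = ex["label"]
        if flip_label:
            new_label = "negative" if ex["label"] == "positive" else "positive"
        augmented.append({"sent_a": ex["sent_b"], "sent_b": ex["sent_a"],
                          "label": new_label})
    return augmented

# LangLearn: invert pairs and flip the label.
#   langlearn_aug = invert_pairs(langlearn, flip_label=True, only_positive=False)
# WiC-ITA: invert only the positive pairs, keeping the label unchanged.
#   wic_aug = invert_pairs(wic, flip_label=False, only_positive=True)

def vote_by_author(predictions):
    # predictions: iterable of (author, predicted_label) pairs, one per
    # sentence; the final class per author is the most frequent label.
    by_author = {}
    for author, label in predictions:
        by_author.setdefault(author, Counter())[label] += 1
    return {a: c.most_common(1)[0][0] for a, c in by_author.items()}
      </preformat>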
        <p>The extremIT5 model underwent 10 epochs of training with a learning rate of 2·10⁻⁵, while the extremITLLaMA model underwent 2 epochs of training with a learning rate of 3·10⁻⁴. The models employed a batch size of 64 for extremIT5 and 32 for extremITLLaMA. To optimize the models' performance, a linear scheduler with warmup was applied, utilizing a warmup ratio of 0.1. The extremITLLaMA training process utilized LoRA to refine the query, key, value and output projection modules of the transformer (for more details please refer to the original paper [33]), incorporating a matrix rank r = 8 and a scaling parameter α = 16 for the LoRA matrices. The decoding strategy in the generation phase used a beam search of size 4, a temperature of 0.2, and a top probability of 0.75 amongst the first 40 candidates. Two Tesla T4 GPUs with 16GB of memory each were used in parallel. This was particularly beneficial for the extremITLLaMA model, as its training duration exceeded 144 hours. The training data was initially divided into a 95% training set and a 5% validation set for hyper-parameter optimization. We release the source code on GitHub⁷ for reproducing the experiments and the dataset generation.</p>
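        <p>For reference, the reported hyper-parameters map onto the Hugging Face training and generation APIs roughly as follows. This is a sketch under stated assumptions rather than the released configuration; note that in the transformers library the sampling parameters (temperature, top_p, top_k) only take effect when do_sample=True, while num_beams=4 enables beam search.</p>
        <preformat>
# Sketch: mapping the reported hyper-parameters onto Hugging Face APIs
# (illustrative; the released training scripts may differ).
from transformers import TrainingArguments

args_extremit5 = TrainingArguments(
    output_dir="extremIT5",
    learning_rate=2e-5,
    num_train_epochs=10,
    per_device_train_batch_size=64,
    lr_scheduler_type="linear",  # linear scheduler with warmup
    warmup_ratio=0.1,
)

# Decoding settings reported for the generation phase.
generation_kwargs = dict(
    num_beams=4,
    temperature=0.2,
    top_p=0.75,
    top_k=40,
    max_new_tokens=128,  # illustrative value
)
# outputs = model.generate(**inputs, **generation_kwargs)
        </preformat>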
        <p>Results Discussion. The experimental results are reported in Table 3. We present the tasks categorized by sub-task, followed by the Evaluation Metric and the scores and ranks achieved by our extremIT5 model, our extremITLLaMA model, and the best competitor. The best-performing method for each subtask is highlighted in bold. Our systems, particularly extremITLLaMA, ranked first in 9 out of 22 subtasks (i.e., 41% of the subtasks) in EVALITA 2023 [8]. Additionally, it ranks in the top-three positions in 14 subtasks, i.e., 64% of all tasks.</p>
        <p>However, we faced challenges in tasks such as GeoLingIt, LangLearn, and WiC-ITA, where our monolithic architecture demonstrated its limitations. These tasks specifically require a system to detect and analyze changes in the author's writing style or the contextual meaning of words. Our models are primarily designed for sentence classification or for rewriting spans of the input text to justify previous decisions (e.g., HODI).</p>
        <p>There are also important considerations regarding the computational cost of both training and inference. Training extremIT5 (made of "only" 110 million parameters) required approximately 12 hours on the entire EVALITA dataset, while extremITLLaMA (made of 7 billion parameters) took over 144 hours. In terms of inference, extremITLLaMA processes only 2 or 3 sentences per second, whereas extremIT5 handles almost one hundred sentences per second. This significant difference in processing speed makes the extremITLLaMA model less practical, despite its superior performance across a wide range of tasks. Additionally, the number of parameters of the two models differs by almost two orders of magnitude, with extremITLLaMA having 7 billion parameters compared to extremIT5's 110 million.</p>
        <p>Overall, the above results are quite impressive, especially when considering that no task-specific architectural designs were applied. Instead, a single LLM was utilized, demonstrating competitive performance across almost all tasks. The key to achieving such results seems to lie in properly prompting the model with natural language requests or employing task-specific encoding techniques for the outputs. We can expect higher results to be achieved using larger LLMs such as LLaMA 65B. To conduct a more comprehensive evaluation and optimization, it would have been beneficial to explore a broader range of architectures and to thoroughly investigate all the hyper-parameters of the models. The estimation of these parameters was done hastily due to the time constraints imposed by the EVALITA deadlines and the extensive commitment required for the parallel completion of all 13 tasks.</p>
        <p>Notes: ⁴ Only the positive examples, i.e. the ones that involve any conspiracy theory, are added from dataset A to B or vice versa. ⁵ https://e3c.fbk.eu/testlinkiberlef ⁶ Only the positive examples underwent sentence order flipping, in order to rebalance the class distribution. ⁷ https://github.com/crux82/ExtremITA</p>
        <p>Error Analysis. Since our team participated in all the tasks, it would be unfeasible to provide a deeper analysis of each individual result in this report. However, in order to gain some insight into the inner workings of the two models we employed, here we present some error analysis carried out on two tasks. We selected a task where our systems ranked very high, and one where they ranked very low. In the EMit task A, extremITLLaMA ranked first in the official ranking, and extremIT5 was second. The task is a multi-label classification problem, where the labels are eight emotions defined by Plutchik [34] plus "love" and a label for neutral texts. Table 4 reports the performance of the two ExtremITA systems broken down by label. It is interesting to notice that the advantage shown by extremITLLaMA on the aggregated result comes from a skewed distribution over the labels. In particular, extremIT5 is hardly capable of modeling Fear, which is also the least represented label in the test set. An inverse correlation between the number of positive instances in the test set and the gain in performance of extremITLLaMA with respect to extremIT5 is indeed present. This indicates that extremITLLaMA is better suited than extremIT5 for the classification of sparser phenomena. Moreover, extremITLLaMA shows a superior capability in modeling and correctly predicting every emotion, besides "Trust", where extremIT5 results in a better performance.</p>
        <p>Table 4. Positive instances per label in the EMit A test set: Anger 56, Anticipation 85, Disgust 165, Fear 13, Joy 100, Love 103, Neutral 210, Sadness 95, Surprise 102, Trust 272.</p>
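        <p>The claimed inverse correlation can be checked directly with a rank correlation. In the sketch below, the per-label supports come from Table 4, while the per-label F1 scores (not reproduced here) would be filled in from the same table.</p>
        <preformat>
# Sketch: testing the inverse correlation between label support and the
# per-label gain of extremITLLaMA over extremIT5 (supports from Table 4;
# the F1 dictionaries are placeholders to be filled in from the table).
from scipy.stats import spearmanr

support = {"Anger": 56, "Anticipation": 85, "Disgust": 165, "Fear": 13,
           "Joy": 100, "Love": 103, "Neutral": 210, "Sadness": 95,
           "Surprise": 102, "Trust": 272}
f1_llama = {}  # per-label F1 of extremITLLaMA, from Table 4
f1_it5 = {}    # per-label F1 of extremIT5, from Table 4

if f1_llama and f1_it5:
    labels = sorted(support)
    gain = [f1_llama[l] - f1_it5[l] for l in labels]
    rho, p = spearmanr([support[l] for l in labels], gain)
    print(rho)  # an inverse correlation corresponds to rho below zero
        </preformat>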
        <p>In the LangLearn task, our systems ranked quite low, respectively 8th place for extremITLLaMA and 10th place for extremIT5. LangLearn is a text pair classification task where the most informative features are expected to be stylistic, rather than semantic, in order to capture the development in language learning of the author of the texts. With this premise, we were anticipating a sub-par performance by our transformer-based models from the beginning. However, another relevant characteristic of this task is the length of the texts. For computational reasons, we had to cut the texts to 100 tokens or less, therefore leaving out a significant portion of the data: we retained exactly 24.6% of the tokens from the two training sets combined. We checked the impact of the text size on the accuracy of the prediction, under the hypothesis that longer texts in the test set (which were cut by our systems to a greater extent) are penalized. The plot in Figure 1 shows the accuracy of our systems against portions of the test set where the texts were filtered by size. The number on the horizontal axis is a threshold on the minimum size in terms of characters of the two texts forming an instance of the test set. Indeed, the downward trend indicates that the predictions of our systems are more accurate on shorter pairs of texts, while more and more errors are made by both systems on longer texts.</p>
        <p>Figure 1: Accuracy of our systems on the LangLearn test set, with texts removed that are longer than an increasing threshold (horizontal axis).</p>
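        <p>The analysis behind Figure 1 can be expressed in a few lines. The sketch below is illustrative: it assumes per-instance records with the two text lengths, gold labels and predictions (hypothetical field names), and adopts the interpretation that the threshold applies to the minimum character length of the two texts.</p>
        <preformat>
# Sketch of the Figure 1 analysis (illustrative): accuracy over test
# subsets obtained by keeping only instances whose shorter text is at
# least `threshold` characters long.

def accuracy_by_min_length(records, thresholds):
    # records: list of dicts with hypothetical keys "len_a", "len_b"
    # (character lengths of the two texts), "gold" and "pred".
    curve = {}
    for t in thresholds:
        kept = [r for r in records if min(r["len_a"], r["len_b"]) >= t]
        if kept:
            hits = sum(1 for r in kept if r["gold"] == r["pred"])
            curve[t] = hits / len(kept)
    return curve
        </preformat>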
    </sec>
    <sec id="sec-2">
      <title>4. Conclusions</title>
      <sec id="sec-2-1">
        <title>In a recent position paper with a provocative title, Basile</title>
        <p>[35] asks himself “is EVALITA done?”, referring to the
mounting trend of LLMs and zero-shot approaches in
NLP and their impact on the evaluation campaign.
Judging by the results presented in this report, the answer
is still the same as the original paper, i.e., no. The
variety and challenge ofered by the tasks of EVALITA
continue to represent a fundamental resource to understand
and develop language resources and tools for the
Italian language, as shown, for instance, by the variability
of the ranking obtained by our transformer-based
modFigure 1: Accuracy of our systems on the LangLearn test els. However, the raw performance of extremIT5 and
set, with texts removed that are longer than an increasing extremITLLaMA, with minimal adaptations and tuning,
threshold (horizontal axis). is undoubtedly pushing the limits of some tasks,
especially text classification tasks with roots in text semantics.</p>
        <p>
          In any case, these results once again confirm the huge
ifcation task where the most informative features are potential of LLMs and their applicability in real-world
expected to be stylistic, rather than semantic, to capture scenarios. It is important to note that this experiment,
the development in language learning of the author of while not conclusive, used the smallest available models
the texts. With this premise, we were anticipating a sub- due to their size limitations. Additionally, it would be
par performance by our transformer-based models from worthwhile, from a sustainability standpoint, to explore
the beginning. However, another relevant characteristic the results that can be achieved by significantly reducing
of this task is the length of the texts. For computational the amount of annotated data available through zero or
reasons, we had to cut the texts to 100 tokens or less, few-shot learning approaches.
therefore leaving out a significant portion of the data
— we retained exactly 24.6% of the tokens from the two
training sets combined. We checked the impact of the text Acknowledgments
size on the accuracy of the prediction, under the
hypothesis that longer texts in the test set (which were cut by We would like to thank the “Istituto di Analisi dei Sistemi
our systems to a greater extent) are penalized. The plot ed Informatica - Antonio Ruberti” (IASI) for supporting
in Figure 1 shows the accuracy of our systems against the experimentations. Claudiu Daniel Hromei is a Ph.D.
portions of the test set where the texts were filtered by student enrolled in the National Ph.D. in Artificial
Intelsize. The number on the horizontal axis is a threshold on ligence, XXXVII cycle, course on Health and life sciences,
the minimum size in terms of characters of the two texts organized by the Università Campus Bio-Medico di Roma.
forming an instance of the test set. Indeed, the downward
trend indicates that the predictions of our systems are
more accurate on shorter pairs of texts, while more and
more errors are made by both systems on longer texts.
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          of the Eighth Evaluation Campaign of Natural Lan[1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rafel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>Narang, guage Processing and Speech Tools for Italian</article-title>
          . Final
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          , Explor- Workshop (EVALITA
          <year>2023</year>
          ), CEUR.org, Parma, Italy,
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>ing the limits of transfer learning with a unified 2023.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>text-to-text transformer</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>21</volume>
          [9]
          <string-name>
            <given-names>O.</given-names>
            <surname>Araque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Frenda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          (
          <year>2020</year>
          )
          <volume>140</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>140</lpage>
          :
          <fpage>67</fpage>
          . URL: http://jmlr.org/papers/ V. Patti, EMit at EVALITA 2023:
          <article-title>Overview of the</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          v21/
          <fpage>20</fpage>
          -
          <lpage>074</lpage>
          .html.
          <source>Categorical Emotion Detection in Italian Social Me</source>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kale</surname>
          </string-name>
          , R. Al- dia
          <string-name>
            <surname>Task</surname>
          </string-name>
          , in: Proceedings of the Eighth Evalua-
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Rfou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Siddhant</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Barua</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Rafel, mt5: A mas- tion Campaign of Natural Language Processing</article-title>
          and
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          former, in: K.
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rumshisky</surname>
          </string-name>
          , L. Zettle- 2023), CEUR.org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>moyer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Hakkani-Tür</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Beltagy</surname>
            , S. Bethard, [10]
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Gafà</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Cutugno</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Venuti</surname>
          </string-name>
          , Emotivita at
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotterell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          , Y. Zhou (Eds.), Pro- EVALITA2023:
          <article-title>Overview of the dimensional and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <article-title>ceedings of the NAACL-HLT 2021, Online, June multidimensional emotion analysis task</article-title>
          , in: Pro-
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          6-
          <fpage>11</fpage>
          ,
          <year>2021</year>
          ,
          <article-title>Association for Computational Linguis- ceedings of the Eighth Evaluation Campaign of Nat-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>tics</surname>
          </string-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>483</fpage>
          -
          <lpage>498</lpage>
          . URL: https://doi.org/10. ural Language Processing and Speech Tools for Ital-
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <volume>18653</volume>
          /v1/
          <year>2021</year>
          .naacl-main.
          <volume>41</volume>
          . doi:
          <volume>10</volume>
          .18653/v1/ ian. Final Workshop (EVALITA
          <year>2023</year>
          ), CEUR.org,
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          2021.naacl-main.
          <volume>41</volume>
          .
          <string-name>
            <surname>Parma</surname>
          </string-name>
          , Italy,
          <year>2023</year>
          . [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sarti</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Nissim, IT5: large-scale text-to-text pre-</article-title>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. A</surname>
          </string-name>
          . García-
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          eration,
          <source>CoRR abs/2203</source>
          .03759 (
          <year>2022</year>
          ). URL: https:// López, R. Valencia-García, Overview of
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          doi.org/10.48550/arXiv.2203.03759. doi:
          <volume>10</volume>
          .48550/ PoliticIT2023@EVALITA: Political Ideology
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          arXiv.
          <volume>2203</volume>
          .03759. arXiv:
          <volume>2203</volume>
          .03759. Detection in Italian Texts,
          <year>2023</year>
          . [4]
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Longpre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          , [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramponi</surname>
          </string-name>
          , C. Casula, GeoLingIt at EVALITA 2023:
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          (
          <article-title>EVALITA 2023), CEUR</article-title>
          .org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <surname>Scaling</surname>
            instruction-finetuned language [13]
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Alzetta</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Brunato</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Delll'Orletta</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Miaschi,
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          models,
          <source>CoRR abs/2210</source>
          .11416 (
          <year>2022</year>
          ). URL: https:// K. Sagae,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Sánchez-Gutiérrez</surname>
          </string-name>
          , G. Venturi, Lan-
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          doi.org/10.48550/arXiv.2210.11416. doi:
          <volume>10</volume>
          .48550/ glearn at evalita 2023:
          <article-title>Overview of the language</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          arXiv.
          <volume>2210</volume>
          .11416. arXiv:
          <volume>2210</volume>
          .11416.
          <article-title>learning development task</article-title>
          , in: Proceedings of the [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sutskever</surname>
          </string-name>
          , Eighth Evaluation Campaign of Natural Language
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>erative</surname>
          </string-name>
          pre-training (
          <year>2018</year>
          ).
          <source>shop (EVALITA</source>
          <year>2023</year>
          ), CEUR.org, Parma, Italy,
          <year>2023</year>
          . [6]
          <string-name>
            <surname>T. B. Brown</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Mann</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Ryder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Subbiah</surname>
            , [14]
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Celli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Ramponi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Tonelli</surname>
          </string-name>
          , C. Bosco,
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          , Haspeede3 at evalita 2023:
          <article-title>Overview of the</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Sigler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          , Italian. Final Workshop (EVALITA
          <year>2023</year>
          ), CEUR.org,
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sutskever</surname>
          </string-name>
          , Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Cignarella</surname>
          </string-name>
          , G. Damo, T. Caselli,
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          CoRR abs/
          <year>2005</year>
          .14165 (
          <year>2020</year>
          ). URL: https://arxiv.org/ V. Patti, HODI at EVALITA 2023:
          <article-title>Overview of</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          abs/
          <year>2005</year>
          .14165. arXiv:
          <year>2005</year>
          .
          <article-title>14165. the Homotransphobia Detection in Italian Task</article-title>
          , in: [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          , M.
          <article-title>-A. Proceedings of the Eighth Evaluation Campaign of</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>bro</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Azhar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave, Italian. Final Workshop (EVALITA
          <year>2023</year>
          ), CEUR.org,
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          , Llama: Open and eficient foundation Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <source>language models</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2302</volume>
          .
          <fpage>13971</fpage>
          . [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bondielli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dell'Oglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Marcelloni</surname>
          </string-name>
          , [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprug- L. C. Passaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabbatini</surname>
          </string-name>
          ,
          <string-name>
            <surname>Multi-</surname>
          </string-name>
          fake-detective
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>noli</surname>
          </string-name>
          , G. Venturi,
          <year>Evalita 2023</year>
          :
          <article-title>Overview of the 8th at evalita 2023: Overview of the multimodal fake</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>