<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IIR</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Fauno: The Italian Large Language Model that will leave you senza parole!</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Bacciu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Trappolini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Santilli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuele Rodolà</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Silvestri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sapienza University - Computer Science Department</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sapienza University - Department of Computer, Control and Management Engineering</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>13</volume>
      <abstract>
        <p>This paper presents Fauno, the first and largest open-source Italian conversational Large Language Model (LLM). Our goal with Fauno is to democratize the study of LLMs in Italian, demonstrating that obtaining a fine-tuned conversational bot with a single GPU is possible. In addition, we release a collection of datasets for conversational AI in Italian. The datasets on which we fine-tuned Fauno cover various topics, including general question answering, computer science, and medical questions. We release our code and datasets at https://github.com/RSTLess-research/Fauno-Italian-LLM.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>Conversational AI</kwd>
        <kwd>Multilingual Models</kwd>
        <kwd>ChatGPT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recently, there has been significant progress in enhancing the capabilities of natural language
processing (NLP) methods. Notably, ChatGPT [<xref ref-type="bibr" rid="ref1">1</xref>] and GPT-4
[<xref ref-type="bibr" rid="ref2">2</xref>] have emerged as highly proficient chat models. These
models exhibit an impressive aptitude for comprehending and producing human-like responses across
various domains. Consequently, they have gained popularity in customer support, virtual assistants,
and social media moderation applications. However, these models are often only accessible through
restricted APIs, which hampers new research and development. Additionally, the limited availability
of chat models creates challenges for researchers and practitioners, impeding the growth of the NLP
community. This problem is further aggravated by the scarcity of publicly accessible, high-quality
chat datasets for analyzing and improving these models. Recently, Baize
[<xref ref-type="bibr" rid="ref3">3</xref>], whose depiction can be seen in Figure 3, has been
introduced to tackle these challenges effectively, utilizing the capabilities of ChatGPT to
automatically generate a high-quality chat dataset comprising multiple conversational turns. Their
approach involves employing ChatGPT to engage in a simulated dialogue with itself, mimicking both
user and AI responses. This generated dataset is a valuable resource for training and evaluating
chat models in the context of multi-turn conversations. Additionally, by specifying an initial
dataset, they extract samples from specific domains and fine-tune chat models to specialize in
technology or healthcare. To fine-tune Large Language Models (LLMs) with limited resources, they
adopt an efficient tuning technique that maximizes the utilization of available computational
capabilities. This strategy allows state-of-the-art language models to adapt and perform well in
resource-constrained scenarios. Their primary objective is to enhance an open-source LLM, LLaMA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which they believe has the potential
to serve as an accessible alternative to proprietary chat models. By fine-tuning LLaMA with
their generated chat dataset, they introduce a new model called Baize (Bái zé), inspired by a
legendary creature from Chinese folklore known for its linguistic prowess and vast knowledge.
Baize reaches incredible performance; however, as with many other models in this category,
its performance heavily depends on the language considered [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In particular, these models
tend to prefer the English language since this is the language they have seen the most during
their training. Their performance in other languages, such as Italian, is vastly inferior, as it
can be easily checked empirically by merely using these tools. For this reason, inspired by
Latin mythology, in this paper, we introduce Fauno, the first and the largest open-source Italian
conversational LLM. To build Fauno, we translated several datasets originally used by Baize
in Italian. We use them to fine-tune and train our model, reaching good generation scores,
as highlighted in later sections of this paper. Our goal is twofold. First, we want to provide
the Italian (and the international) research community with an LLM that is highly proficient
in Italian. We hope this will enhance and enable new research on this particular language
by surpassing the current state-of-the-art capabilities of Italian. Second, we want to provide
practitioners with a valid conversational AI able to deal with the Italian language. We provide an
example for this latter case to hint at this tool’s potential. Italy has one of the largest populations
of elderly people: in the last twenty years, the number of people older than eighty has doubled,
according to the Italian national statistics institute (ISTAT). Furthermore, roughly 1.2 million people are
older than 65 and live alone. These people are known in the medical literature to suffer from
loneliness, which aggravates their health issues. With tools like Fauno, however, one could
offer a conversational AI as a telehealth service. While not a substitute for a human, this
could improve the quality of life of millions of people and reduce the burden on the national
health system. This is just one example application, and we hope our model can spark many
such projects and research efforts. Fauno can run on accessible hardware, thus expanding
accessibility to a broader range of practitioners and researchers.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Language Models for Chat</title>
        <p>
          Transformers [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], introduced initially as a machine translation system, have had an arguably
unprecedented impact on the AI world. Transformers and deep learning models, in general,
are now used not only for NLP [
          <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">7, 8, 9, 10</xref>
          ], but also in other fields such as image processing [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ],
audio [
          <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
          ], 3D data [
          <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
          ], and recommendation systems [
          <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
          ]. Language Models
(LMs), based on the transformer architecture, have gained significant attention in Natural
Language Processing due to their ability to produce coherent human-like text. Radford et al.
[
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] showed that LMs could learn a variety of tasks without direct human supervision but using
self-supervision on large, noisy web corpora. Several works have developed this idea, further
scaling the data and model size [
          <xref ref-type="bibr" rid="ref21 ref22 ref23">21, 22, 23</xref>
          ]. However, none of these models is explicitly designed
with a conversational objective or training curriculum. Zhang et al. [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] introduced DialoGPT,
a version of GPT-2 fine-tuned on Reddit data; Adiwardana et al. [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] proposed Meena,
a 2.6B model trained with conversational data from social media; [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] proposed LaMDA, a 137B
model trained on public dialog data. Recently, OpenAI released ChatGPT [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], a model optimized
for conversational interaction, trained with Reinforcement Learning from Human Feedback [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
With GPT-4 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], this approach was further improved with additional reasoning capabilities.
Recently, Xu et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] introduced Baize as a fine-tuned version of LLaMA [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for conversation by
training the model on dialogs generated by ChatGPT. Our approach stems from this work and
extends it to the Italian language.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Language Models for Italian</title>
        <p>
          To the best of our knowledge, no LMs are trained explicitly with a conversational objective or
training curriculum for the Italian language. The currently available LMs for Italian include
GePpeTto [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], a GPT-2 model fine-tuned on Italian Wikipedia and the itWaC corpus [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]; IT5
[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], an Italian version of T5 trained on a cleaned version of the Italian mC4 corpus [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]; BART-IT
[
          <xref ref-type="bibr" rid="ref32">32</xref>
          ], an Italian version of BART [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] trained on the same data mixture as IT5; Camoscio [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], an
Italian version of Stanford Alpaca [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] trained on the Alpaca dataset translated to Italian.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Parameter-Efficient Tuning</title>
        <p>
          Typical fine-tuning involves training all parameters of an LLM, which becomes inefficient as
the number of parameters increases. Various techniques have been proposed to address this
issue under the umbrella term of Parameter-Efficient Fine-Tuning (PEFT). These techniques
include Adapters [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ], which add a trainable layer inside the Transformer while freezing the other layers; BitFit [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ], which only tunes bias terms in the linear layers; Prompting approaches
like Prefix Tuning [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ], P-Tuning [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ] or Prompt Tuning [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ], which learn optimal
prompt parameters for a task; or LoRA [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ], which trains low-rank matrices while keeping all
the remaining parameters frozen. This is a very active research area, and we refer the reader
to Zhuang et al. [42] for an up-to-date overview. Recently, the community has leveraged these
techniques to train low-cost versions of Stanford Alpaca [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] under the name Alpaca-LoRA (https://github.com/tloen/alpaca-lora).
        </p>
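        <p>
          To make the low-rank idea concrete, the following is a minimal sketch of a LoRA-style linear layer in PyTorch. The class and parameter names are ours, and the default rank and scaling values are illustrative assumptions, not the settings used to train Fauno.
        </p>
        <preformat>
# Minimal LoRA-style linear layer (illustrative sketch, not Fauno's training code).
# Only the low-rank factors A and B are trained; the pretrained weight W stays
# frozen, so the effective weight is W + (alpha / r) * B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained projection
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        </preformat>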
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Conversational AI Datasets in Italian</title>
      <p>
        In this section, we report the data we use to fine-tune Fauno and the data generation and
translation process. We train Fauno on extensive Italian synthetic datasets, encompassing a
wide range of fields such as medical data (MedQuAD [43]), technical content from StackOverflow,
Quora discussions, and Alpaca [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] data translated into Italian, with 47,000, 57,000, 54,000, and
22,000 dialogues, respectively. The synthetic generation of the data is described in Section 3.1.
      </p>
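      <p>
        The translated datasets can be loaded directly from the Hugging Face hub with the datasets library; the repository names below are taken from the links given in Section 3.2.
      </p>
      <preformat>
# Loading the released Italian conversational datasets (see Section 3.2 for the links).
from datasets import load_dataset

medquad_it = load_dataset("andreabac3/MedQuaAD-Italian-Fauno-Baize")
stackoverflow_it = load_dataset("andreabac3/StackOverflow-Italian-Fauno-Baize")
quora_it = load_dataset("andreabac3/Quora-Italian-Fauno-Baize")
alpaca_it = load_dataset("teelinsan/camoscio_cleaned")
      </preformat>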
      <sec id="sec-3-1">
        <title>3.1. Data Generation</title>
        <sec id="sec-3-1-1">
          <title>1https://github.com/tloen/alpaca-lora</title>
        <p>
          To produce conversational AI datasets, we adopt the dataset generation methodology of Xu et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. This methodology consists of letting ChatGPT engage in self-chat to produce a conversation between a user and a helpful AI assistant. The self-chat process is initiated with a question (or seed) from real users. For example, [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] takes seed questions from Quora (https://huggingface.co/datasets/quora) and Stack Overflow (https://huggingface.co/datasets/pacovaldez/stackoverflow-questions). The generation methodology is shown in Figure 3.
        </p>
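        <p>
          The following is a minimal sketch of this self-chat step using the OpenAI API available at the time of writing; the prompt wording and the helper function are our own illustration, not the exact template of [3].
        </p>
        <preformat>
# Illustrative self-chat generation (hypothetical prompt, not Baize's exact template).
import openai  # legacy 0.x SDK

def self_chat(seed_question: str, max_turns: int = 4) -> str:
    prompt = (
        "The following is a conversation between a human and an AI assistant. "
        f"They take turns chatting about the topic: '{seed_question}'. "
        "Write the whole conversation, marking turns with [Human] and [AI], "
        f"and stop after {max_turns} rounds."
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return response["choices"][0]["message"]["content"]
        </preformat>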
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Translation</title>
        <sec id="sec-3-2-1">
          <title>To translate Fauno’s data, we use two diferent trans</title>
          <p>lation systems to maximize translation accuracy and
quality. Jiao et al. (2023) [44] have demonstrated that</p>
          <p>Google Translate is state-of-the-art among commercial
translation systems. Nevertheless, Google Translate encounters significant issues when
handling text incorporating natural language utterances and programming code. For instance,
when translating a code snippet like "for i in range(10)" into Italian, Google Translate would
incorrectly render it as "per i in intervallo(10)", which disrupts the syntax and functionality
of the original code. This is due to Google Translate’s inability to discern between natural
language and programming code. To address this limitation, we opted to utilize ChatGPT 3.5
to translate datasets containing programming code, as it is better suited for preserving the
integrity of the code’s syntax and structure. Hence, we translate Alpaca and StackOverflow
with ChatGPT 3.5 and the remaining datasets with Google Translate. We release the translated
datasets of StackOverflow (https://huggingface.co/datasets/andreabac3/StackOverflow-Italian-Fauno-Baize),
MedQuAD (https://huggingface.co/datasets/andreabac3/MedQuaAD-Italian-Fauno-Baize),
Quora (https://huggingface.co/datasets/andreabac3/Quora-Italian-Fauno-Baize), and
Alpaca (https://huggingface.co/datasets/teelinsan/camoscio_cleaned) on the Hugging Face hub. The cost
of translating the StackOverflow dataset using the ChatGPT 3.5 API is $80, while the Alpaca
dataset translation costs around $30. For the other datasets, we utilize Google's free translation
API (https://pypi.org/project/googletrans/).</p>
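          <p>
            A sketch of the two translation paths described above: the free googletrans package for plain-text datasets, and ChatGPT 3.5 for code-bearing ones so that snippets such as "for i in range(10)" are left untouched. The instruction given to ChatGPT is our own paraphrase of the approach, not a quoted prompt.
          </p>
          <preformat>
# Two-path translation (illustrative sketch).
import openai  # legacy 0.x SDK
from googletrans import Translator

translator = Translator()

def translate_plain(text: str) -> str:
    # Suitable for natural language only; would mangle code tokens.
    return translator.translate(text, src="en", dest="it").text

def translate_preserving_code(text: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Translate the following text into Italian, "
                       "leaving any source code exactly as it is:\n\n" + text,
        }],
    )
    return response["choices"][0]["message"]["content"]
          </preformat>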
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Train setup</title>
      <p>
        We use the same hyper-parameters as [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with a single epoch and a batch size of 128 examples.
We perform our experiments on a workstation equipped with an Intel Core i9-10940X (a 14-core
CPU running at 3.3GHz), 256GB of RAM, and a single Nvidia RTX A6000 with 48GB of
VRAM. The training times for the 7B and 13B models are 19 and 35 hours, respectively.
      </p>
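      <p>
        For concreteness, the setup above can be expressed with the Hugging Face Trainer as follows. The single epoch and the effective batch size of 128 are the values stated in the text, while the micro-batch split and learning rate are assumptions typical of LoRA fine-tuning on a single GPU.
      </p>
      <preformat>
# Sketch of the training configuration (values beyond those stated above are assumptions).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="fauno-7b",
    num_train_epochs=1,              # a single epoch, as stated above
    per_device_train_batch_size=8,   # assumption: micro-batch fitting one RTX A6000
    gradient_accumulation_steps=16,  # 8 x 16 = effective batch size of 128
    learning_rate=2e-4,              # assumption: common for LoRA fine-tuning
    fp16=True,
)
      </preformat>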
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>
        As done by Xu et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Camoscio [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], we rely on a qualitative analysis of Fauno’s
outputs. In Table 1, we compare ChatGPT 3.5, Camoscio, and Fauno.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Qualitative comparison of answers generated by ChatGPT 3.5, Camoscio, and Fauno.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Model</th>
              <th>Answer</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Question</td>
              <td>Qual è il senso della vita?</td>
            </tr>
            <tr>
              <td>ChatGPT</td>
              <td>La domanda sul senso della vita è stata oggetto di riflessione filosofica e teologica per secoli, e non esiste una risposta definitiva o concorde. Tuttavia, molti individui trovano significato e scopo nella vita attraverso diverse vie. Alcuni cercano il senso nella connessione umana e nelle relazioni significative, altri nella realizzazione personale e nello sviluppo delle proprie passioni. Altri ancora si rivolgono alla spiritualità o alla fede per trovare un senso più profondo. In generale, il senso della vita può essere un’esperienza altamente individuale e soggettiva, e ognuno potrebbe trovare una risposta diversa a questa domanda.</td>
            </tr>
            <tr>
              <td>Camoscio</td>
              <td>Il senso della vita è trovare la felicità e la soddisfazione nella vita.</td>
            </tr>
            <tr>
              <td>Fauno</td>
              <td>Il senso della vita è una domanda popolare e complessa che ha suscitato dibattito in molte culture e generazioni. Alla fine, la comprensione di qual è il senso della vita dipende dalla prospettiva e dalle convinzioni personali di ogni singolo individuo. Alcune persone possono vedere il senso della vita in termini di un significato unico e globale, mentre altri possono sforzarsi di risolvere il mistero della vita trovando autentico benessere e piacere nei loro giorni quotidiani.</td>
            </tr>
            <tr>
              <td>Question</td>
              <td>Come posso copiare un file da un server tramite ssh?</td>
            </tr>
            <tr>
              <td>ChatGPT</td>
              <td>Per copiare un file da un server tramite SSH, puoi utilizzare il comando "scp" (Secure Copy). Ecco come puoi farlo: Apri una finestra di terminale sul tuo computer locale. Assicurati di avere il client SSH installato sul tuo sistema. Se stai utilizzando un sistema operativo basato su Unix (ad esempio Linux o macOS), lo avrai probabilmente già installato. Se stai utilizzando Windows, puoi utilizzare un client SSH come PuTTY o installare un’implementazione di SSH come OpenSSH. Utilizza il seguente comando "scp" per copiare il file dal server al tuo computer locale: scp username@indirizzo_server:percorso_del_file_locale percorso_di_destinazion</td>
            </tr>
            <tr>
              <td>Camoscio</td>
              <td>Per copiare un file da un server tramite ssh, è necessario utilizzare l’opzione "scp" dell’utente ssh. Per esempio, per copiare un file dal server "server.example.com" all’utente "utente.example.com", l’utente dovrebbe utilizzare l’opzione "scp server.example.com:utente.example.com".</td>
            </tr>
            <tr>
              <td>Fauno</td>
              <td>Per copiare un file da un server tramite ssh, puoi utilizzare il seguente comando: scp username@ip_dello_server:percorso/della/sorgente/del/file /percorso/della/destinazione/del/file Sostituisci i dettagli con il tuo nome utente, il nome dell’host, il percorso della sorgente e il percorso finale del file e clicca su "Avanti". Il file verrà copiato dal server alla tua macchina. C’è qualcos’altro di cui hai bisogno?</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This work shows that it is possible to democratize the research on LLMs in multiple languages
with limited resources. Furthermore, we believe that the release of the Italian datasets for
conversational AI will help researchers develop further Large Language Models.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by projects FAIR (PE0000013) and SERICS (PE00000014) under
the MUR National Recovery and Resilience Plan funded by the European Union -
NextGenerationEU, by ERC Starting Grant No. 802554 (SPECGEO), and by PRIN 2020 project n. 2020TA3K9N
"LEGO.AI". Also supported by the ERC Advanced Grant 788893 AMDROMA, EC H2020 RIA project
"SoBigData++" (871042), and PNRR MUR project IR0000013-SoBigData.it.</p>
      <p>[42] B. Zhuang, J. Liu, Z. Pan, H. He, Y. Weng, C. Shen, A survey on efficient training of
transformers, arXiv abs/2302.01107 (2023).
[43] A. Ben Abacha, D. Demner-Fushman, A question-entailment approach to question
answering, BMC Bioinformatics 20 (2019) 1-23.
[44] W. Jiao, W. Wang, J.-t. Huang, X. Wang, Z. Tu, Is ChatGPT a good translator? Yes with
GPT-4 as the engine, 2023.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>[1] OpenAI, ChatGPT: Optimizing language models for dialogue, 2023</article-title>
          . URL: https://openai.com/blog/chatgpt/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] OpenAI,
          <source>GPT-4 technical report</source>
          ,
          <year>2023</year>
          . arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. McAuley</surname>
          </string-name>
          ,
          <article-title>Baize: An open-source chat model with parameter-efficient tuning on self-chat data</article-title>
          ,
          <source>arXiv preprint arXiv:2304.01196</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Azhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave, G. Lample,
          <article-title>LLaMA: Open and efficient foundation language models</article-title>
          ,
          <source>arXiv preprint arXiv:2302.13971</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ducel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fort</surname>
          </string-name>
          , G. Lejeune,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lepage</surname>
          </string-name>
          ,
          <article-title>Do we name the languages we study? The #BenderRule in LREC and ACL articles</article-title>
          ,
          <source>in: Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>564</fpage>
          -
          <lpage>573</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>5485</fpage>
          -
          <lpage>5551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Trappolini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Santilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rodolà</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <article-title>Multimodal neural databases</article-title>
          ,
          <source>in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Conia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <article-title>Unifying cross-lingual semantic role labeling with heterogeneous linguistic resources</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>338</fpage>
          -
          <lpage>351</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kolesnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weissenborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Unterthiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Minderer</surname>
          </string-name>
          , G. Heigold,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          , et al.,
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          , arXiv preprint arXiv:2010.11929 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Payne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sutskever</surname>
          </string-name>
          ,
          <article-title>Jukebox: A generative model for music</article-title>
          , arXiv preprint arXiv:2005.00341 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Agostinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. I.</given-names>
            <surname>Denk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Borsos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Verzetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Caillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tagliasacchi</surname>
          </string-name>
          , et al.,
          <article-title>Musiclm: Generating music from text</article-title>
          ,
          <source>arXiv preprint arXiv:2301.11325</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Barnabò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Trappolini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lastilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Campagnano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <article-title>Cycledrums: automatic drum arrangement for bass lines using cyclegan</article-title>
          ,
          <source>Discover Artificial Intelligence</source>
          <volume>3</volume>
          (
          <year>2023</year>
          )
          <article-title>4</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Koltun</surname>
          </string-name>
          ,
          <article-title>Point transformer</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF international conference on computer vision</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>16259</fpage>
          -
          <lpage>16268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Trappolini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cosmo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Moschella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Melzi</surname>
          </string-name>
          , E. Rodolà,
          <article-title>Shape registration in the time of transformers</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>34</volume>
          (
          <year>2021</year>
          )
          <fpage>5731</fpage>
          -
          <lpage>5744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>O.</given-names>
            <surname>Halimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Imanuel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Litany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Trappolini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rodolà</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guibas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kimmel</surname>
          </string-name>
          ,
          <article-title>Towards precise completion of deformable shapes</article-title>
          , in: Computer Vision-ECCV
          <year>2020</year>
          : 16th European Conference, Glasgow, UK,
          <year>August</year>
          23-
          <issue>28</issue>
          ,
          <year>2020</year>
          , Proceedings,
          <source>Part XXIV 16</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>359</fpage>
          -
          <lpage>377</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>W.-C. Kang</surname>
            ,
            <given-names>J. McAuley</given-names>
          </string-name>
          ,
          <article-title>Self-attentive sequential recommendation, in: 2018 IEEE international conference on data mining (ICDM)</article-title>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>197</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Siciliano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <article-title>Integrating item relevance in training loss for sequential recommender systems</article-title>
          ,
          <source>arXiv preprint arXiv:2305.10824</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog 1</source>
          (
          <year>2019</year>
          )
          <article-title>9</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Hudson</surname>
          </string-name>
          , E. Adeli,
          <string-name>
            <given-names>R.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          , S. von Arx,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bohg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosselut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brunskill</surname>
          </string-name>
          , et al.,
          <article-title>On the opportunities and risks of foundation models</article-title>
          ,
          <source>arXiv preprint arXiv:2108.07258</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgeaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          , E. Buchatskaya,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rutherford</surname>
          </string-name>
          , D. d. L.
          <string-name>
            <surname>Casas</surname>
            ,
            <given-names>L. A.</given-names>
          </string-name>
          <string-name>
            <surname>Hendricks</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Welbl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Clark</surname>
          </string-name>
          , et al.,
          <article-title>Training compute-optimal large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2203.15556</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Sun,
          <string-name>
            <given-names>M.</given-names>
            <surname>Galley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Brockett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan</surname>
          </string-name>
          , Dialogpt:
          <article-title>Large-scale generative pre-training for conversational response generation</article-title>
          , arXiv preprint arXiv:1911.00536 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D.</given-names>
            <surname>Adiwardana</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fiedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Thoppilan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kulshreshtha</surname>
          </string-name>
          , G. Nemade,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          , et al.,
          <article-title>Towards a human-like open-domain chatbot</article-title>
          , arXiv preprint arXiv:2001.09977 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>R.</given-names>
            <surname>Thoppilan</surname>
          </string-name>
          ,
          <string-name>
            <surname>D. De Freitas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kulshreshtha</surname>
            , H.-
            <given-names>T.</given-names>
            Cheng, A. Jin, T.
          </string-name>
          <string-name>
            <surname>Bos</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Du</surname>
          </string-name>
          , et al.,
          <article-title>Lamda: Language models for dialog applications</article-title>
          ,
          <source>arXiv preprint arXiv:2201.08239</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wainwright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Agarwal,
          <string-name>
            <given-names>K.</given-names>
            <surname>Slama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ray</surname>
          </string-name>
          , et al.,
          <article-title>Training language models to follow instructions with human feedback</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>27730</fpage>
          -
          <lpage>27744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Mattei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cafagna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guerini</surname>
          </string-name>
          ,
          <article-title>GePpeTto carves Italian into a language model</article-title>
          , in: J.
          <string-name>
            <surname>Monti</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Tamburini</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, March 1-3, 2021</source>
          , volume
          <volume>2769</volume>
          of
          <source>CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          . URL: https://ceur-ws.org/Vol-2769/paper_46.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>M.</given-names>
            <surname>Baroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bernardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferraresi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zanchetta</surname>
          </string-name>
          ,
          <article-title>The WaCky wide web: a collection of very large linguistically processed web-crawled corpora</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          <volume>43</volume>
          (
          <year>2009</year>
          )
          <fpage>209</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <article-title>IT5: Large-scale text-to-text pretraining for Italian language understanding and generation</article-title>
          ,
          <year>2022</year>
          . arXiv:2203.03759
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siddhant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <article-title>mT5: A massively multilingual pre-trained text-to-text transformer</article-title>
          , in:
          <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>483</fpage>
          -
          <lpage>498</lpage>
          . URL: https://aclanthology.org/2021.naacl-main.41. doi:10.18653/v1/2021.naacl-main.41
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>La Quatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagliero</surname>
          </string-name>
          ,
          <article-title>BART-IT: An efficient sequence-to-sequence model for Italian text summarization</article-title>
          ,
          <source>Future Internet</source>
          <volume>15</volume>
          (
          <year>2023</year>
          ). URL: https://www.mdpi.com/1999-5903/15/1/15. doi:10.3390/fi15010015.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          , in:
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics
          , Online,
          <year>2020</year>
          , pp.
          <fpage>7871</fpage>
          -
          <lpage>7880</lpage>
          . URL: https://aclanthology.org/2020.acl-main.703. doi:10.18653/v1/2020.acl-main.703
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Santilli</surname>
          </string-name>
          ,
          <article-title>Camoscio: An Italian instruction-tuned LLaMA</article-title>
          , https://github.com/teelinsan/camoscio,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>R.</given-names>
            <surname>Taori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gulrajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dubois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <article-title>Stanford Alpaca: An instruction-following LLaMA model</article-title>
          , https://github.com/tatsu-lab/stanford_alpaca,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>N.</given-names>
            <surname>Houlsby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giurgiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jastrzebski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Morrone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>de Laroussilhe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gesmundo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Attariyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          ,
          <article-title>Parameter-efficient transfer learning for NLP</article-title>
          , in: ICML, volume
          <volume>97</volume>
          of
          <source>Proceedings of Machine Learning Research</source>
          , PMLR
          ,
          <year>2019</year>
          , pp.
          <fpage>2790</fpage>
          -
          <lpage>2799</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>E. B.</given-names>
            <surname>Zaken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravfogel</surname>
          </string-name>
          ,
          <article-title>BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models</article-title>
          , in: ACL, Association for Computational Linguistics,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Prefix-tuning: Optimizing continuous prompts for generation</article-title>
          , in: ACL-IJCNLP, Association for Computational Linguistics
          ,
          <year>2021</year>
          , pp.
          <fpage>4582</fpage>
          -
          <lpage>4597</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks</article-title>
          , in:
          <source>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          ,
          <source>Association for Computational Linguistics</source>
          , Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>68</lpage>
          . URL: https://aclanthology.org/2022.acl-short.8. doi:10.18653/v1/2022.acl-short.8
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <article-title>The power of scale for parameter-efficient prompt tuning</article-title>
          ,
          <source>arXiv preprint arXiv:2104.08691</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>LoRA: Low-rank adaptation of large language models</article-title>
          , in: ICLR, OpenReview.net,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>