<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Sixth Workshop on Natural Language for Artificial Intelligence, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Is EVALITA Done? On the Impact of Prompting on the Italian NLP Evaluation Campaign.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Turin</institution>
          ,
          <addr-line>C.so Svizzera 185, 10147</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>30</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Prompt-based learning is a recent paradigm in NLP that leverages large pre-trained language models to perform a variety of tasks. With this technique, it is possible to build classifiers that do not need training data (zero-shot). In this paper, we assess the status of prompt-based learning applied to several text classification tasks in the Italian language. The results indicate that the performance gap towards current supervised methods is still relevant. However, the difference in performance between pre-trained models, together with the ability of prompt-based classifiers to operate in a zero-shot fashion, opens a discussion regarding the next generation of evaluation campaigns for NLP.</p>
      </abstract>
      <kwd-group>
        <kwd>Prompt-based learning</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Benchmarking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Many languages other than English have their own NLP evaluation campaigns, such as GermEval4 for German or IberLEF (previously known
as IberEval)5 for Spanish and other Iberian languages.</p>
      <p>
        EVALITA is the “periodic evaluation campaign of Natural Language Processing (NLP) and
speech tools for the Italian language”6. Started in 2007, EVALITA was held seven times in
2007, 2009, 2011, 2014, 2016, 2018, and 2020, and its eighth edition is scheduled for 2023. The
retrospective article by Passaro et al.[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] describes a healthy community, reflected by a growing
number of shared tasks proposed at each edition, culminating with the 14 tasks at EVALITA
2020 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. At the same time, more interestingly for this paper, the number of classification tasks
has consistently grown over the years. This phenomenon became apparent in the 2018
edition of EVALITA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], where a single system was submitted to four different tasks (ABSITA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
GxG [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], HaSpeeDe [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and IronITA [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) and ranked first in most of the individual subtasks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
This system achieved very high results on all the tasks by leveraging multi-task
learning. While this advancement was rightly praised, it also spurred a discussion about the
format of the shared tasks organized at EVALITA: if many tasks follow the same format
(text classification), then the evaluation campaign may be shifting its focus towards learning
models, with less regard for the underlying language phenomena.
      </p>
      <p>
        The latest edition of EVALITA in 2020 confirmed this trend, with at least four “pure” text
classification tasks (AMI [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], SARDISTANCE [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], HaSpeeDe 2 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and TAG-it [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]) and
a few more where classification plays an important role (DANKMEMES [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and
ATE_ABSITA [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]).
      </p>
      <p>In this paper, we revisit a number of tasks from the past editions of EVALITA in the light of
the newest technologies available for NLP. We focus on classification tasks (Section 4), although
in principle the experiment could be extended to other forms of inference over textual data. In
particular, we consider the recently proposed paradigm of prompt-based learning (Section 2),
which makes use of large pre-trained language models (Section 3) to perform classification
in a zero-shot fashion. With the right combination of parameters, prompt-based zero-shot
classifiers often perform surprisingly well, raising important questions about the
future of evaluation in NLP:</p>
      <p>R1: Is supervised learning becoming obsolete in NLP, along with the need for
training data?
If pre-trained language models can provide acceptable predictions without training data, in
particular superior to those of classical, pre-neural machine learning models, then perhaps the
baseline methods typically associated with shared tasks should be rethought.</p>
      <p>R2: Should zero-shot methods become the new baseline for NLP tasks?
The rest of this paper presents an experiment where a number of language models are used in
combination with prompt-based learning and tested against benchmarks provided by EVALITA,
in order to answer these questions.</p>
      <sec id="sec-1-1">
        <p>4 https://germeval.github.io/</p>
        <p>5 https://sites.google.com/view/iberlef2022</p>
        <p>6 https://www.evalita.it</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        Prompt-based learning [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is a recent paradigm which gained enormous traction in the NLP
community, applied, among other tasks, to zero-shot classification. In a nutshell, prompt-based
classification makes use of large pre-trained language models to map labels to handcrafted or
automatically derived natural language expressions. The plausibility of the instance to classify,
augmented with the prompt, determines the label without the need for further training or
fine-tuning. Prompting for NLP is an active area of research. Solutions have been proposed for
automatically inducing prompts [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], to improve the learning process, e.g. with calibration [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
and to adapt the method to few-shot learning [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>In this paper, we propose an experiment of classification with prompts and pre-trained
models with purposely simplistic characteristics. For each binary classification task, we create
exactly two verbalizations, one for each label. The template for the verbalizations is fixed and
it belongs to one of two types, namely text classification and author profiling. Furthermore,
the templates provide exactly one slot, which is filled with exactly one word. Table 1 illustrates
the verbalizations associated with each label in our experiments. The verbalizations are manually
crafted, without any effort to optimize them or to tune any parameters.</p>
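      <p>The decision rule just described can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's code: the function names, and the pluggable mask_filler_prob scoring function standing in for a masked language model, are assumptions.</p>
      <preformat>
```python
# Zero-shot prompt-based classification, reduced to its decision rule:
# append the template to the text and pick the label whose filler word
# the masked language model finds most plausible at the [mask] position.

def classify(text, template, fillers, mask_filler_prob):
    prompt = f"{text} {template}"   # e.g. "... Questa frase è [mask]"
    scores = {label: mask_filler_prob(prompt, filler)
              for label, filler in fillers.items()}
    return max(scores, key=scores.get)

# Binary irony task, with exactly one filler word per label as in Table 1.
irony_fillers = {"ironic": "ironica", "not ironic": "normale"}
```
      </preformat>
      <p>In the actual experiments, the scoring function is a pre-trained model queried at the [mask] position; here it is deliberately left abstract.</p>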
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Templates and verbalizations (positive and negative fillers) associated with each label.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Label</th><th>Template</th><th>Positive filler</th><th>Negative filler</th></tr>
          </thead>
          <tbody>
            <tr><td>irony</td><td rowspan="7">Questa frase è [mask] (EN) This sentence is [mask]</td><td>ironica (EN) ironic</td><td>normale (EN) normal</td></tr>
            <tr><td>hate</td><td>offensiva (EN) offensive</td><td>normale (EN) normal</td></tr>
            <tr><td>subjective</td><td>soggettiva (EN) subjective</td><td>oggettiva (EN) objective</td></tr>
            <tr><td>positive</td><td>positiva (EN) positive</td><td>normale (EN) normal</td></tr>
            <tr><td>negative</td><td>negativa (EN) negative</td><td>normale (EN) normal</td></tr>
            <tr><td>misogyny</td><td>misogina (EN) misogynous</td><td>normale (EN) normal</td></tr>
            <tr><td>aggressiveness</td><td>aggressiva (EN) aggressive</td><td>normale (EN) normal</td></tr>
            <tr><td>man/woman</td><td>L’autore di questa frase è [mask] (EN) The author of this sentence is [mask]</td><td>uomo (EN) man</td><td>donna (EN) woman</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The experiment is implemented with OpenPrompt [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], a Python library that streamlines the
process of creating templates and verbalizers, up to the prediction of labels on textual data.7
      </p>
      <sec id="sec-2-1">
        <p>7 https://github.com/thunlp/OpenPrompt</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Models</title>
      <p>
        The classification power of prompt-based learning is only as good as the pre-trained model
that serves as the basis for the classification algorithm. In this section, we briefly describe
the three models used in the experiments presented in this paper. The models are based on
Bidirectional Encoder Representations from Transformers [21, BERT], a popular and
highperforming language model based on transformers [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>Two of the models used in this paper are monolingual and have been created specifically to
encode the properties of the Italian language. The third model is multilingual, i.e., trained on
text from multiple languages.</p>
      <sec id="sec-3-1">
        <title>3.1. AlBERTo</title>
        <p>
          The first neural language model that has been proposed for the Italian language is called
AlBERTo [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. AlBERTo is based on BERT and trained on a collection of 200 million posts from
Twitter from the corpus TWITA [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. The hyperparameter setting of AlBERTo mimics the original BERT
base model for English, with 12 hidden layers, 768-dimensional embeddings, and 12 attention
heads, for a total of 110 million parameters. AlBERTo is available from the Huggingface model
repository8 with the identifier:
m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0
        </p>
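        <p>As a sanity check of the figure above, the parameter count of a BERT-base-style architecture can be reproduced with back-of-the-envelope arithmetic. The vocabulary and maximum-position sizes below are the standard English BERT-base values, used here only as assumptions (AlBERTo has its own tokenizer).</p>
        <preformat>
```python
# Approximate parameter count of a BERT-base-style model:
# 12 layers, hidden size 768, feed-forward size 3072.
hidden, layers, ffn, vocab, max_pos = 768, 12, 3072, 30522, 512

embeddings = (vocab + max_pos + 2) * hidden      # word + position + segment tables
attention  = 4 * (hidden * hidden + hidden)      # Q, K, V and output projections
ffn_block  = 2 * hidden * ffn + ffn + hidden     # two linear layers with biases
per_layer  = attention + ffn_block + 4 * hidden  # plus two layer norms
total = embeddings + layers * per_layer          # roughly 109 million
```
        </preformat>
        <p>The result, about 109 million parameters, is consistent with the roughly 110 million reported for this family of models.</p>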
      </sec>
      <sec id="sec-3-2">
        <title>3.2. MDZ Italian BERT</title>
        <p>
          The MDZ Digital Library team at the Bavarian State Library published a set of BERT and
ELECTRA [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] models trained on a Wikipedia dump, the OPUS corpora collection [26], and the
Italian part of the OSCAR corpus [27], for a total of about 13 billion tokens. The architecture
of the network is for the most part the same as AlBERTo: 12 hidden layers, 768-dimensional
embeddings, and 12 attention heads. The Italian BERT model used for the experiments in this
paper is available from the Huggingface model repository with the identifier:
dbmdz/bert-base-italian-xxl-uncased
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Multilingual BERT</title>
        <p>
          The multilingual BERT, in its cased and uncased variants, is one of the first models released
together with the BERT architecture itself [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. It is trained on text in 102 languages from
Wikipedia with a masked language model goal. Although it has been surpassed in performance
for many NLP tasks, Multilingual BERT has been widely adopted, also because pre-trained
language models for languages other than English are often unavailable or smaller than their
English counterparts. The Multilingual BERT model used for the experiments in this paper is
available from the Huggingface model repository with the identifier:
bert-base-multilingual-uncased
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Tasks</title>
      <p>
        Six shared tasks have been selected from the past three editions of EVALITA, one from EVALITA
2016 [28], four from EVALITA 2018 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and one from EVALITA 2020 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. All tasks are
classification tasks, and more specifically binary classification tasks, i.e., where the label to predict for
each textual instance can have one of two possible values. Table 2 summarizes the tasks selected
for the experiments presented in this paper.
      </p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Tasks and labels selected for the experiments.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Task</th><th>Label</th></tr>
          </thead>
          <tbody>
            <tr><td>IronITA</td><td>irony</td></tr>
            <tr><td>HaSpeeDe (TW)</td><td>hate</td></tr>
            <tr><td>HaSpeeDe (FB)</td><td>hate</td></tr>
            <tr><td>HaSpeeDe 2</td><td>hate</td></tr>
            <tr><td rowspan="2">AMI</td><td>misogyny</td></tr>
            <tr><td>aggressiveness</td></tr>
            <tr><td rowspan="4">SENTIPOLC</td><td>subjective</td></tr>
            <tr><td>positive</td></tr>
            <tr><td>negative</td></tr>
            <tr><td>irony</td></tr>
            <tr><td>GxG (CH)</td><td>man/woman</td></tr>
            <tr><td>GxG (DI)</td><td>man/woman</td></tr>
            <tr><td>GxG (JO)</td><td>man/woman</td></tr>
            <tr><td>GxG (TW)</td><td>man/woman</td></tr>
            <tr><td>GxG (YT)</td><td>man/woman</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>For all the shared tasks, we downloaded the test set textual data and labels from the European
Language Grid9 (ELG) [29]. The ELG is a recently proposed platform for Language Technology
in Europe, funded by the Horizon 2020 scheme. The EVALITA4ELG project [30]10 aims to create an open and
shared linguistic benchmark for Italian on a large set of representative tasks: it integrated a large
number of datasets and other resources, including pre-trained
models and systems, from all editions of EVALITA to date into the ELG. It is therefore sufficient
to register an account on the platform, and the data can be accessed programmatically with the
official ELG Python library.</p>
      <sec id="sec-4-1">
        <title>4.1. IronITA</title>
        <p>The EVALITA 2018 Task on Irony Detection in Italian Tweets [8, IronITA] is a shared task
focused on the automatic detection of irony in Italian tweets. The shared task is articulated in
two subtasks with increasing levels of granularity. The first subtask is a binary classification of
tweets into ironic vs. non-ironic. The second subtask adds the level of sarcasm to the classification,
conditioned on the presence of irony in the tweets. For the experiments on this task, we only
consider the first subtask.</p>
        <sec id="sec-4-1-1">
          <p>9 https://live.european-language-grid.eu/</p>
          <p>10 https://live.european-language-grid.eu/meta-forum-2022/project-expo/evalita4elg</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. HaSpeeDe and HaSpeeDe 2</title>
        <p>
          Hate Speech Detection (HaSpeeDe) is a classification task that was run twice, at EVALITA
2018 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and 2020 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], with a similar scheme but an updated dataset from one edition to the
other. The task focuses on the classification of hateful, aggressive, and offensive content in
social media data from Twitter and Facebook. The first edition of HaSpeeDe features a binary
classification task (hate vs. not hate) and a cross-domain subtask. In this paper, we used the test
set of the first two subtasks, i.e., binary classification of hate on Twitter (TW) and Facebook
(FB). HaSpeeDe 2 proposed a couple of additional subtasks, namely stereotype detection and
the identification of nominal utterances linked to hateful content. For the purpose of this paper,
we only used the data and labels from the main subtask of HaSpeeDe 2.
        </p>
      </sec>
      <sec id="sec-4-ami">
        <title>4.3. AMI</title>
        <p>
          The Automatic Misogyny Identification shared task at EVALITA 2020 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] proposes a benchmark
for the classification of misogynistic and aggressive content towards women in Italian tweets.
The main task is a double binary classification, where systems are asked to label tweets with
two independent labels: misogynous vs. not misogynous and aggressive vs. not aggressive.
Furthermore, the second subtask of AMI introduces a synthetic dataset to measure the fairness
of misogyny classification models. In this paper, we only used the binary classification data
from the first subtask of AMI (misogyny and aggressiveness).
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.4. SENTIPOLC</title>
        <p>The Sentiment Polarity Classification task (SENTIPOLC) was organized at EVALITA 2014 [31]
and 2016 [32], with the second edition including the data used for the previous one plus a new
test set. The task is focused on sentiment analysis of Italian tweets, with three classification
tasks: subjectivity, polarity, and irony. The main task, the classification of polarity, is cast as a double
binary classification task, where systems must produce two independent labels for the positive and
negative sentiment found in the text. In this way, the SENTIPOLC annotation scheme is able
to encode positive and negative sentiment, as well as neutral (both the positive and negative
labels are absent) and mixed sentiment (both labels are present). For
the experiments in this paper, we use the test sets of all four binary classification tasks of
SENTIPOLC 2016.</p>
      </sec>
      <sec id="sec-4-gxg">
        <title>4.5. GxG</title>
        <p>The Cross-Genre Gender Prediction task [6, GxG] was organized at EVALITA 2018. The shared
task falls in the area of author profiling, asking participant systems to predict
whether the author of a short text is a man or a woman. The texts come from five different
sources: Twitter (TW), YouTube (YT), children's writings (CH), newspapers (JO, for journalism),
and personal diaries (DI). GxG places an emphasis on cross-dataset prediction, where a model
is trained on data from one domain (or source, in this case) and predictions are made on
data from a different one. For this paper, we use the five sets independently, since no training is
involved in our experiment. In this binary classification task, there is no natural negative or
positive label; therefore, we impose the arbitrary mapping man = negative label, woman = positive
label.</p>
      </sec>
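      <p>The two label encodings described in this section (the SENTIPOLC double binary scheme and the arbitrary GxG mapping) are compact enough to spell out in code; the helper names below are illustrative, not taken from the paper's implementation.</p>
      <preformat>
```python
# SENTIPOLC: two independent binary labels encode four sentiment classes.
def decode_sentipolc(positive, negative):
    if positive and negative:
        return "mixed"
    if positive:
        return "positive"
    if negative:
        return "negative"
    return "neutral"

# GxG: no natural polarity, so an arbitrary mapping is imposed.
GXG_LABELS = {"man": 0, "woman": 1}  # man = negative, woman = positive
```
      </preformat>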
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>In this section, we present the results of the experiment of prompt-based classification on
EVALITA tasks. The results are presented separately for each task, because evaluation metrics
may vary from one task to another: accuracy, F1-score of the positive class, and
macro-averaged F1-score are used. Moreover, we present the results, in Tables 4–7, along with the
baseline(s) and best systems according to the reports of the individual tasks.</p>
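      <p>For reference, the three metrics can be computed as follows for binary labels (0/1). This is a plain restatement of the standard definitions, not code from the evaluation campaigns.</p>
      <preformat>
```python
def accuracy(gold, pred):
    # Fraction of instances where the prediction matches the gold label.
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1(gold, pred, cls=1):
    # F1-score of one class (cls=1 gives the F1 of the positive class).
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def macro_f1(gold, pred):
    # Unweighted mean of the per-class F1-scores.
    return (f1(gold, pred, 0) + f1(gold, pred, 1)) / 2
```
      </preformat>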
      <table-wrap id="tab-res-irony-ami">
        <caption>
          <p>Results on IronITA task A and AMI task A: the three prompt-based classifiers, the task baselines, and the best participating system.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Task</th><th>System</th><th>Score</th></tr>
          </thead>
          <tbody>
            <tr><td rowspan="6">IronITA task A</td><td>Prompt-based</td><td>.419</td></tr>
            <tr><td>Prompt-based</td><td>.469</td></tr>
            <tr><td>Prompt-based</td><td>.573</td></tr>
            <tr><td>Baseline (most frequent class)</td><td>.334</td></tr>
            <tr><td>Baseline (random)</td><td>.505</td></tr>
            <tr><td>Best system (ItaliaNLP)</td><td>.731</td></tr>
            <tr><td rowspan="5">AMI task A</td><td>Prompt-based</td><td/></tr>
            <tr><td>Prompt-based</td><td/></tr>
            <tr><td>Prompt-based</td><td/></tr>
            <tr><td>Baseline (most frequent class)</td><td/></tr>
            <tr><td>Best system (jigsaw)</td><td/></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The results of this experiment show that prompt-based classification (at least, this simplified
version of it) usually beats trivial baselines, but otherwise underperforms with respect to
supervised models on benchmarks for the Italian language. This is expected, since the method
is fully zero-shot. The results on GxG, the only task related to author profiling, are closer to the
best performing systems of the shared task, indicating an expressive power of the language
models beyond the standing meaning of the text. Interestingly, the results vary widely between
pre-trained language models, with none of the three models being clearly superior to the others
across tasks.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion and Conclusion</title>
      <p>(Results on the GxG subsets CH, DI, JO, TW, and YT: for each subset, the three prompt-based classifiers are compared with the best participating system, ItaliaNLP for CH, DI, TW, and YT, and UniOR for JO.)</p>
      <p>Betteridge’s law of headlines11 states that “any headline that ends in a question mark can be
answered by the word no”. This paper is no exception: the answer to the question Is EVALITA
done? is certainly no. The prompt-based systems presented in this paper are far from
the classification performance of their supervised counterparts on the EVALITA benchmarks.
This result is in stark contrast to the results reported on English benchmarks12. Moreover, the
performance of the two Italian models and of the multilingual model tested in this paper is
unstable, with some models apparently more fit to certain tasks than others, raising the question of
whether the subpar performance is due to the method or to the underlying language-specific
pre-trained models. However, the results of the prompt-based models could be undermined by
the lack of optimization of verbalizers and templates. There is certainly space for improvement,
which was not the main focus of this paper, including an analysis of the disagreement between
verbalizers, and of the actual output of the prompt-based models.</p>
      <p>11 https://web.archive.org/web/20090226202006/http://www.technovia.co.uk/2009/02/techcrunch-irresponsible-journalism.html</p>
      <p>It is worth noting that this new technology allows us to create zero-shot classifiers for
rather abstract language classification problems. Recent literature indicates that a few
training instances (few-shot learning) are often sufficient to greatly increase the performance of
prompt-based classifiers [33]. Considering that the experiments in this paper make use only of
the most basic elements of prompt-based classification, this paradigm should be regarded as
a new frontier, not only for the advancement of text classification methodology, but also for
its evaluation. Supervised learning in NLP is perhaps not on its way to obsolescence (R1), but
the growing literature on zero-shot classification indicates at least that there is a new player
on the field. Would it make sense to organize a shared task, as part of an evaluation campaign
like EVALITA, where no training data is provided at all (R2)? The first results presented in this
paper seem to indicate that this is the case, paving the way for evaluation campaigns focused
on zero-shot learning for NLP.</p>
      <p>12 https://github.com/thunlp/OpenPrompt/tree/main/results/</p>
      <p>[25] K. Clark, M.-T. Luong, Q. V. Le, C. D. Manning, ELECTRA: Pre-training text encoders as discriminators rather than generators, in: International Conference on Learning Representations, 2020, pp. 1–18. URL: https://openreview.net/forum?id=r1xMH1BtvB.</p>
      <p>[26] J. Tiedemann, L. Nygaard, The OPUS corpus - parallel and free: http://logos.uio.no/opus, in: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), European Language Resources Association (ELRA), Lisbon, Portugal, 2004, pp. 1183–1186. URL: http://www.lrec-conf.org/proceedings/lrec2004/pdf/320.pdf.</p>
      <p>[27] J. Abadji, P. J. O. Suárez, L. Romary, B. Sagot, Ungoliant: An optimized pipeline for the generation of a very large-scale multilingual web corpus, in: H. Lüngen, M. Kupietz, P. Bański, A. Barbaresi, S. Clematide, I. Pisetta (Eds.), Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-9) 2021, Limerick, 12 July 2021 (Online-Event), Leibniz-Institut für Deutsche Sprache, Mannheim, 2021, pp. 1–9. URL: https://nbn-resolving.org/urn:nbn:de:bsz:mh39-104688. doi:10.14618/ids-pub-10468.</p>
      <p>[28] P. Basile, F. Cutugno, M. Nissim, V. Patti, R. Sprugnoli, et al., EVALITA 2016: Overview of the 5th evaluation campaign of natural language processing and speech tools for Italian, in: 3rd Italian Conference on Computational Linguistics, CLiC-it 2016, and 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, EVALITA 2016, volume 1749, CEUR-WS, 2016, pp. 1–4.</p>
      <p>[29] G. Rehm, M. Berger, E. Elsholz, S. Hegele, F. Kintzel, K. Marheinecke, S. Piperidis, M. Deligiannis, D. Galanis, K. Gkirtzou, P. Labropoulou, K. Bontcheva, D. Jones, I. Roberts, J. Hajič, J. Hamrlová, L. Kačena, K. Choukri, V. Arranz, A. Vasiļjevs, O. Anvari, A. Lagzdiņš, J. Melņika, G. Backfried, E. Dikici, M. Janosik, K. Prinz, C. Prinz, S. Stampler, D. Thomas-Aniola, J. M. Gómez-Pérez, A. Garcia Silva, C. Berrío, U. Germann, S. Renals, O. Klejch, European Language Grid: An overview, in: Proceedings of the Twelfth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 3366–3380. URL: https://aclanthology.org/2020.lrec-1.413.</p>
      <p>[30] V. Basile, C. Bosco, M. Fell, V. Patti, R. Varvara, Italian NLP for everyone: Resources and models from EVALITA to the European Language Grid, in: Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 174–180. URL: https://aclanthology.org/2022.lrec-1.19.</p>
      <p>[31] V. Basile, A. Bolioli, V. Patti, P. Rosso, M. Nissim, Overview of the Evalita 2014 SENTIment POLarity Classification task, 2014, pp. 50–57.</p>
      <p>[32] F. Barbieri, V. Basile, D. Croce, M. Nissim, N. Novielli, V. Patti, Overview of the Evalita 2016 sentiment polarity classification task, in: P. Basile, A. Corazza, F. Cutugno, S. Montemagni, M. Nissim, V. Patti, G. Semeraro, R. Sprugnoli (Eds.), Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Final Workshop (EVALITA 2016), Napoli, Italy, December 5-7, 2016, volume 1749 of CEUR Workshop Proceedings, CEUR-WS.org, 2016, pp. 1–11. URL: http://ceur-ws.org/Vol-1749/paper_026.pdf.</p>
      <p>[33] T. Schick, H. Schütze, Exploiting cloze-questions for few-shot text classification and natural language inference, in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, Online, 2021, pp. 255–269. URL: https://aclanthology.org/2021.eacl-main.20. doi:10.18653/v1/2021.eacl-main.20.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          , Preface to the
          <source>Sixth Workshop on Natural Language for Artificial Intelligence (NL4AI)</source>
          , in: D.
          <string-name>
            <surname>Nozza</surname>
            ,
            <given-names>L. C.</given-names>
          </string-name>
          <string-name>
            <surname>Passaro</surname>
          </string-name>
          , M. Polignano (Eds.),
          <source>Proceedings of the Sixth Workshop on Natural Language for Artificial Intelligence (NL4AI</source>
          <year>2022</year>
          )
          <article-title>co-located with 21th International Conference of the Italian Association for Artificial Intelligence (AI*IA</article-title>
          <year>2022</year>
          ), November 30,
          <year>2022</year>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Di</given-names>
            <surname>Maro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Croce</surname>
          </string-name>
          ,
          <article-title>Lessons learned from evalita 2020 and thirteen years of evaluation of italian language technology</article-title>
          ,
          <source>IJCoL. Italian Journal of Computational Linguistics</source>
          <volume>6</volume>
          (
          <year>2020</year>
          )
          <fpage>79</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Croce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Maro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,
          <string-name>
            <surname>EVALITA</surname>
          </string-name>
          <year>2020</year>
          :
          <article-title>Overview of the 7th evaluation campaign of natural language processing and speech tools for italian</article-title>
          , in: V.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Croce</surname>
            ,
            <given-names>M. D.</given-names>
          </string-name>
          <string-name>
            <surname>Maro</surname>
          </string-name>
          , L. C. Passaro (Eds.),
          <source>Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          <article-title>Online event</article-title>
          ,
          <year>December 17th</year>
          ,
          <year>2020</year>
          , volume
          <volume>2765</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . URL: http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2765</volume>
          /overview.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>T.</given-names> <surname>Caselli</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Novielli</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Patti</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Rosso</surname></string-name>,
          <article-title>EVALITA 2018: Overview on the 6th evaluation campaign of natural language processing and speech tools for Italian</article-title>,
          in: T. Caselli, N. Novielli, V. Patti, P. Rosso (Eds.),
          <source>Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy, December 12-13, 2018</source>,
          volume <volume>2263</volume> of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2018</year>, pp. <fpage>1</fpage>-<lpage>6</lpage>.
          URL: http://ceur-ws.org/Vol-2263/paper001.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>P.</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Croce</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Polignano</surname></string-name>,
          <article-title>Overview of the EVALITA 2018 aspect-based sentiment analysis task (ABSITA)</article-title>,
          in: T. Caselli, N. Novielli, V. Patti, P. Rosso (Eds.),
          <source>Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)</source>,
          volume <volume>2263</volume>, CEUR Workshop Proceedings (CEUR-WS.org), Torino,
          <year>2018</year>, pp. <fpage>10</fpage>-<lpage>16</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>F.</given-names> <surname>Dell'Orletta</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Nissim</surname></string-name>,
          <article-title>Overview of the EVALITA 2018 cross-genre gender prediction (GxG) task</article-title>,
          in: T. Caselli, N. Novielli, V. Patti, P. Rosso (Eds.),
          <source>Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)</source>,
          volume <volume>2263</volume>, CEUR Workshop Proceedings (CEUR-WS.org), Torino,
          <year>2018</year>, pp. <fpage>1</fpage>-<lpage>9</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>C.</given-names> <surname>Bosco</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Dell'Orletta</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Poletto</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Sanguinetti</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Tesconi</surname></string-name>,
          <article-title>Overview of the EVALITA 2018 hate speech detection task</article-title>,
          in: T. Caselli, N. Novielli, V. Patti, P. Rosso (Eds.),
          <source>Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)</source>,
          volume <volume>2263</volume>, CEUR Workshop Proceedings (CEUR-WS.org), Torino,
          <year>2018</year>, pp. <fpage>1</fpage>-<lpage>9</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>A. T.</given-names> <surname>Cignarella</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Frenda</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Bosco</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Patti</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Rosso</surname></string-name>,
          <article-title>Overview of the EVALITA 2018 task on Irony Detection in Italian Tweets (IRONITA)</article-title>,
          in: T. Caselli, N. Novielli, V. Patti, P. Rosso (Eds.),
          <source>Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)</source>,
          volume <volume>2263</volume>, CEUR Workshop Proceedings (CEUR-WS.org), Torino,
          <year>2018</year>, pp. <fpage>1</fpage>-<lpage>9</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>A.</given-names> <surname>Cimino</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>De Mattei</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Dell'Orletta</surname></string-name>,
          <article-title>Multi-task learning in deep neural networks at EVALITA 2018</article-title>,
          in: T. Caselli, N. Novielli, V. Patti, P. Rosso (Eds.),
          <source>Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy, December 12-13, 2018</source>,
          volume <volume>2263</volume> of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2018</year>, pp. <fpage>1</fpage>-<lpage>10</lpage>.
          URL: http://ceur-ws.org/Vol-2263/paper013.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>E.</given-names> <surname>Fersini</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Nozza</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Rosso</surname></string-name>,
          <article-title>AMI@EVALITA2020: Automatic misogyny identification</article-title>,
          in: V. Basile, D. Croce, M. Di Maro, L. C. Passaro (Eds.),
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)</source>,
          CEUR Workshop Proceedings (CEUR-WS.org), Online,
          <year>2020</year>, pp. <fpage>1</fpage>-<lpage>8</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>A. T.</given-names> <surname>Cignarella</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Lai</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Bosco</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Patti</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Rosso</surname></string-name>,
          <article-title>SardiStance@EVALITA2020: Overview of the Task on Stance Detection in Italian Tweets</article-title>,
          in: V. Basile, D. Croce, M. Di Maro, L. C. Passaro (Eds.),
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)</source>,
          CEUR Workshop Proceedings (CEUR-WS.org), Online,
          <year>2020</year>, pp. <fpage>1</fpage>-<lpage>10</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>M.</given-names> <surname>Sanguinetti</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Comandini</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Di Nuovo</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Frenda</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Stranisci</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Bosco</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Caselli</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Patti</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Russo</surname></string-name>,
          <article-title>HaSpeeDe 2@EVALITA2020: Overview of the EVALITA 2020 Hate Speech Detection Task</article-title>,
          in: V. Basile, D. Croce, M. Di Maro, L. C. Passaro (Eds.),
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)</source>,
          CEUR Workshop Proceedings (CEUR-WS.org), Online,
          <year>2020</year>, pp. <fpage>1</fpage>-<lpage>9</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>A.</given-names> <surname>Cimino</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Dell'Orletta</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Nissim</surname></string-name>,
          <article-title>TAG-it@EVALITA2020: Overview of the topic, age, and gender prediction task for Italian</article-title>,
          in: V. Basile, D. Croce, M. Di Maro, L. C. Passaro (Eds.),
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)</source>,
          CEUR Workshop Proceedings (CEUR-WS.org), Online,
          <year>2020</year>, pp. <fpage>1</fpage>-<lpage>9</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>M.</given-names> <surname>Miliani</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Giorgi</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Rama</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Anselmi</surname></string-name>,
          <string-name><given-names>G. E.</given-names> <surname>Lebani</surname></string-name>,
          <article-title>DANKMEMES@EVALITA2020: The memeing of life: memes, multimodality and politics</article-title>,
          in: V. Basile, D. Croce, M. Di Maro, L. C. Passaro (Eds.),
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)</source>,
          CEUR Workshop Proceedings (CEUR-WS.org), Online,
          <year>2020</year>, pp. <fpage>1</fpage>-<lpage>9</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>L.</given-names> <surname>De Mattei</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>De Martino</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Iovine</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Miaschi</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Polignano</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Rambelli</surname></string-name>,
          <article-title>ATE_ABSITA@EVALITA2020: Overview of the aspect term extraction and aspect-based sentiment analysis task</article-title>,
          in: V. Basile, D. Croce, M. Di Maro, L. C. Passaro (Eds.),
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)</source>,
          CEUR Workshop Proceedings (CEUR-WS.org), Online,
          <year>2020</year>, pp. <fpage>1</fpage>-<lpage>8</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name><given-names>P.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Yuan</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Fu</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Jiang</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Hayashi</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Neubig</surname></string-name>,
          <article-title>Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing</article-title>,
          <source>ACM Computing Surveys (CSUR)</source>
          (<year>2022</year>).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name><given-names>G.</given-names> <surname>Cui</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Hu</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Ding</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Huang</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>,
          <article-title>Prototypical verbalizer for prompt-based few-shot tuning</article-title>,
          in:
          <source>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>,
          Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>, pp. <fpage>7014</fpage>-<lpage>7024</lpage>.
          URL: https://aclanthology.org/2022.acl-long.483. doi:10.18653/v1/2022.acl-long.483.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name><given-names>Z.</given-names> <surname>Zhao</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Wallace</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Feng</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Klein</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Singh</surname></string-name>,
          <article-title>Calibrate before use: Improving few-shot performance of language models</article-title>,
          in: M. Meila, T. Zhang (Eds.),
          <source>Proceedings of the 38th International Conference on Machine Learning</source>,
          volume <volume>139</volume> of Proceedings of Machine Learning Research, PMLR,
          <year>2021</year>, pp. <fpage>12697</fpage>-<lpage>12706</lpage>.
          URL: https://proceedings.mlr.press/v139/zhao21c.html.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name><given-names>T.</given-names> <surname>Le Scao</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Rush</surname></string-name>,
          <article-title>How many data points is a prompt worth?</article-title>,
          in:
          <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>,
          Association for Computational Linguistics, Online,
          <year>2021</year>, pp. <fpage>2627</fpage>-<lpage>2636</lpage>.
          URL: https://aclanthology.org/2021.naacl-main.208. doi:10.18653/v1/2021.naacl-main.208.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name><given-names>N.</given-names> <surname>Ding</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Hu</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Zhao</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Zheng</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Sun</surname></string-name>,
          <article-title>OpenPrompt: An open-source framework for prompt-learning</article-title>,
          in:
          <source>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>,
          Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>, pp. <fpage>105</fpage>-<lpage>113</lpage>.
          URL: https://aclanthology.org/2022.acl-demo.10. doi:10.18653/v1/2022.acl-demo.10.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name><given-names>J.</given-names> <surname>Devlin</surname></string-name>,
          <string-name><given-names>M.-W.</given-names> <surname>Chang</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Toutanova</surname></string-name>,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>,
          in:
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>,
          Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>, pp. <fpage>4171</fpage>-<lpage>4186</lpage>.
          URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name><given-names>A.</given-names> <surname>Vaswani</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Shazeer</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Parmar</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Uszkoreit</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Jones</surname></string-name>,
          <string-name><given-names>A. N.</given-names> <surname>Gomez</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Kaiser</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Polosukhin</surname></string-name>,
          <article-title>Attention is all you need</article-title>,
          in:
          <source>NIPS'17</source>,
          Curran Associates Inc., Red Hook, NY, USA,
          <year>2017</year>, pp. <fpage>6000</fpage>-<lpage>6010</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name><given-names>M.</given-names> <surname>Polignano</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>de Gemmis</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Semeraro</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Basile</surname></string-name>,
          <article-title>AlBERTo: Italian BERT language understanding model for NLP challenging tasks based on tweets</article-title>,
          in: R. Bernardi, R. Navigli, G. Semeraro (Eds.),
          <source>Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy, November 13-15, 2019</source>,
          volume <volume>2481</volume> of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2019</year>, pp. <fpage>1</fpage>-<lpage>6</lpage>.
          URL: http://ceur-ws.org/Vol-2481/paper57.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name><given-names>V.</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Lai</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Sanguinetti</surname></string-name>,
          <article-title>Long-term social media data collection at the University of Turin</article-title>,
          in: E. Cabrio, A. Mazzei, F. Tamburini (Eds.),
          <source>Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, December 10-12, 2018</source>,
          volume <volume>2253</volume> of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2018</year>, pp. <fpage>1</fpage>-<lpage>6</lpage>.
          URL: http://ceur-ws.org/Vol-2253/paper48.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name><given-names>K.</given-names> <surname>Clark</surname></string-name>,
          <string-name><given-names>M.-T.</given-names> <surname>Luong</surname></string-name>,
          <string-name><given-names>Q. V.</given-names> <surname>Le</surname></string-name>,
          <string-name><given-names>C. D.</given-names> <surname>Manning</surname></string-name>,
          <article-title>ELECTRA: Pre-training text encoders as discriminators rather than generators</article-title>,
          in:
          <source>International Conference on Learning Representations (ICLR)</source>,
          <year>2020</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>