<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>PERSEID - Perspectivist Irony Detection: A CALAMITA Challenge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silvia Casola</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simona Frenda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Soda Marem Lo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Perspectivism, Irony Detection, Evaluation</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Interaction Lab, Heriot-Watt University</institution>
          ,
          <addr-line>Edinburgh, Scotland</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MaiNLP &amp; MCML, LMU Munich</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>aequa-tech</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Works in perspectivism and human label variation have emphasized the need to collect and leverage various voices and points of view in the whole Natural Language Processing pipeline. PERSEID places itself in this line of work. We consider the task of irony detection from short social media conversations in Italian collected from Twitter (X) and Reddit. To do so, we leverage data from MultiPICo, a recent multilingual dataset with disaggregated annotations and annotators' metadata, containing 1000 Post, Reply pairs with five annotations each on average. We aim to evaluate whether prompting LLMs with additional annotators' demographic information (namely gender only, age only, and the combination of the two) results in improved performance compared to a baseline in which only the input text is provided. The evaluation is zero-shot, and we evaluate the results on the disaggregated annotations using F1.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-3">
      <title>1. Challenge: Introduction and Motivation</title>
      <p>Recently, researchers have shown a growing interest in human-centered technologies to make Artificial Intelligence (AI) models and products more attentive to the users’ sensitivity and needs.</p>
      <p>In Natural Language Processing (NLP), works on perspectivism [1] and human label variation [2] have emphasized the intrinsic variability in human annotation and thus the importance of incorporating a diverse set of voices; this aspect affects all phases of the NLP pipeline, including collecting disaggregated datasets [3, 4, 5], analyzing existing disagreement [6], learning from disaggregated data [7, 8], and evaluating considering several voices as valid [9, <xref ref-type="bibr" rid="ref1">1</xref>].</p>
      <p>During the data collection and annotation phase, works in this area have gone beyond considering disagreement as motivated by noise only and thus as an attribute to be minimized and resolved, e.g., through majority voting. In contrast, research has emphasized the necessity of collecting a variety of voices and considering all such voices as valid. The reason is twofold. On the one hand, researchers have argued that many tasks that are popular in the NLP community (including, for example, hate speech and humor detection) are intrinsically subjective [10], as points of view might differ depending on users’ social background, beliefs, and demographics. Using a single aggregated label has thus been increasingly questioned [11, 12, 13], and preserving disaggregated data is preferred. On the other hand, recent work has shown that design choices and biases affect datasets and models and often result in models unexpectedly aligned with a given population segment more than with another [14]; in fact, aggregated data tend to reflect a minority of perspectives, under-representing others [15, <xref ref-type="bibr" rid="ref4">4</xref>].</p>
      <p>As a result, disaggregated datasets have become more popular, as listed in the Perspectivist Data Manifesto<sup>1</sup> and by Plank [2]<sup>2</sup>. Researchers are increasingly reporting annotators’ demographics and other metadata when describing the dataset, which was first advised as a good practice to avoid excluding, minimizing, and misrepresenting certain groups of users [16]. Recent work has also explored whether annotators’ demographics and background, as described by available metadata, influence their annotation [5, 17, 18, 19, <xref ref-type="bibr" rid="ref4">4</xref>] and can help during the modeling of the phenomenon under study [20, 8, 21].</p>
      <p>Despite the increasing interest in disaggregated and metadata-rich datasets, few such datasets for irony detection exist. Simpson et al. [22] released a corpus for humor detection in English, used as a benchmark in the shared task [23]. No annotators’ metadata, however, are</p>
      <sec id="sec-3-1">
        <title><sup>1</sup> https://pdai.info/</title>
        <p>included. Frenda et al. [4] proposed a dataset for irony detection and investigated the influence of the annotators’ demographics on their perception [6]. The dataset contains English texts only.</p>
        <p>For this challenge at CALAMITA [24], we propose to use the Italian portion of MultiPICo (Multilingual Perspectivist Irony Corpus)<sup>3</sup> [25]. MultiPICo is a multilingual corpus of short Post-Reply conversational pairs extracted from Twitter and Reddit and annotated as ironic or not ironic by crowdsourcing workers with different demographics and backgrounds. MultiPICo covers 9 languages (Arabic, English, Dutch, French, German, Hindi, Italian, Portuguese, and Spanish) and 25 language varieties<sup>4</sup>, ranging from high- to low-resourced ones. Moreover, a rich set of annotators’ sociodemographic information (balanced gender, age, nationality, ethnicity, student, and employment status) is provided.</p>
        <p>While no perspectivist task leveraging the dataset has been proposed so far, PERSEID is related to the Learning With Disagreement task held at SemEval 2021 [12] and 2023 [13]. In LeWiDi, participant systems were challenged to learn the distribution of labels, tested by cross entropy-based metrics. In contrast, PERSEID aims at stimulating the development of models of human perspectives, in order to explain the label distributions rather than just quantifying them.</p>
        <p><sup>3</sup> MultiPICo is available at https://huggingface.co/datasets/Multilingual-Perspectivist-NLU/MultiPICo with a CC-BY 4.0 license.</p>
        <p><sup>4</sup> For example, texts in Austrian, German, and Swiss German are included in the dataset.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2. Challenge: Description</title>
        <p>The task of Perspectivist Irony Detection aims to measure models’ capability to detect irony in a short verbal exchange for each annotator, conditioned on the knowledge of demographic information about them. To this purpose, we want to look at how model performance differs when the model is informed by one demographic trait or a combination of two. In particular, we focus on the gender and age of the annotator, due to the balanced number of male and female annotators by design (Section 3.2), and due to the fact that age was shown to be one of the most polarized dimensions in [25].</p>
        <p>The input to the task does not consist only of a text, but rather of a tuple &lt;perspective, post, reply&gt;.</p>
        <p>In this iteration of PERSEID, we considered several variables for the perspective attribute:</p>
        <p>• None (Task 0): acting as a baseline, we want to investigate the models’ outputs when no information about the annotator is provided.</p>
        <p>• Age (Task 1): the perspective is one of four values encoding the age group of the annotator.</p>
        <p>• Gender (Task 2): the perspective is the binary self-identified gender of the annotator.</p>
        <p>• Age + Gender (Task 3): in this case, both attributes are provided as the perspective.</p>
        <p>The post is a textual post, to which the target reply is a reply. The output of the prediction is a binary label indicating whether the reply is ironic (or non-ironic) for a human bearing the characteristic of the perspective on the text. The performance of the model is evaluated through a global F1 metric on the disaggregated annotations.</p>
        <p>The challenge is zero-shot: no training, fine-tuning, or in-context learning is considered for this version of PERSEID, and the whole dataset can be used for inference. Note that since each annotator can be described by no traits (Task 0), one single trait (Task 1 and Task 2), and two traits (Task 3), we do not aim at optimal performance when considering personalized irony detection; instead, our goal is to understand whether models improve their performance when one or multiple traits are provided and to understand the impact of different configurations.</p>
        <p>3. Data description</p>
        <p>3.1. Origin of data</p>
        <p>The data for the challenge are part of MultiPICo [25], a corpus of 18,778 short conversations collected from Reddit (8,956) and Twitter (9,822) in 9 languages, and a total of 25 varieties.</p>
        <p>Data were collected to reproduce the structure of short conversations. For both Reddit and Twitter, the post is typically a message initiating a thread and the reply a direct reply to that message<sup>5</sup>.</p>
        <p>Reddit data were retrieved using the Pushshift repository<sup>6</sup> from January 2020 to June 2021. For Italian, data were downloaded from the subreddit /r/Italy. Pairs having at least one deleted or removed comment were filtered out, and the language of the messages was further validated using the Python library for language identification LangID<sup>7</sup>.</p>
        <p>Twitter data were collected via Twitter Stream API, using the geolocation service and excluding quotes and retweets. Then, the full conversation was retrieved, and tweets that directly replied to the starting ones were retained.</p>
        <p>The data collection resulted in 18,778 instances, together with their metadata, consisting of Post-Reply original IDs, subreddits, and geolocation information.</p>
        <p><sup>5</sup> For Reddit, second-level replies were collected in a minority of cases; for Twitter, the post is a reply to a thread-starting message in a minority of cases.</p>
        <p><sup>6</sup> https://redditsearch.io/</p>
        <p><sup>7</sup> https://github.com/saffsd/langid.py</p>
        <p>#Texts: 2,181; 1,000; 2,999; 1,760; 2,375; 786; 1,000; 1,994; 4,683; total 18,778</p>
        <p>For Italian, data account for 1000 post, reply pairs, equally sourced from Reddit and Twitter.</p>
        <p>3.2. Annotation details</p>
        <p>Annotators were asked to read a set of post and reply pairs and answer whether the text of the reply was ironic or not, given the context.</p>
        <p>The human annotation of the collected data was performed on the crowdsourcing platform Prolific<sup>8</sup>, through a custom-built annotation interface designed to collect a diverse and balanced set of annotators. The interface mimicked a message conversation, having the post as context and asking whether the reply was Ironic or Not ironic.</p>
        <p>For Italian, 24 native-speaker annotators were hired, who performed 4,790 annotations in total, resulting in a mean of 4.79 annotations per instance (see Table 1).</p>
        <p>Annotators were selected based on three criteria:</p>
        <p>• Their completion rate had to be greater or equal to 99%</p>
        <p>• They had to be native speakers of the considered language (i.e., Italian, for the portion of data used in the challenges)</p>
        <p>• The set of annotators needed to be balanced across genders.</p>
        <p>The quality of the annotation was further assured using attention check questions in the form “Please answer X to this question”. Annotators had a 1% probability of receiving these special questions. Annotators who failed to respond correctly to at least 50% of these questions were excluded from the final corpus.</p>
        <p>A rich set of metadata is also provided. These include the self-identified Gender (balanced by design), their nationality, their Age Group (1 GenX, 15 GenY, 8 GenZ, for Italian), Ethnicity (23 white people, 1 mixed person, for Italian), Student status (14 yes, 9 no, for Italian), Employment status (9 in full-time jobs, 7 unemployed, 5 working part-time, 1 not in paid work and 1 due to start, for Italian), as reported in Table 2.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Data format</title>
        <p>'reply_id': 2497527360959166890, 'source': 'twitter', 'timestamp': '2022-12-07 15:49:50'</p>
        <p>3.4. Example of prompts used for</p>
        <p><sup>9</sup> No workers whose age is &gt; 42, i.e., from the baby boomer generations, participated in the annotation of the Italian portion of the dataset</p>
        <p>• “una persona giovane della generazione Z” if Generation == GenZ (Age &lt; 26)</p>
        <p>• “una persona giovane della generazione Y” if Generation == GenY (26 ≤ Age &lt; 42)</p>
        <p>• “una persona adulta della generazione X” if Generation == GenX (42 ≤ Age &lt; 58)</p>
        <p>• “una persona adulta della generazione baby boomer” if Generation == Boomer (Age &gt; 58)</p>
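        <p>As a minimal sketch of the age verbalization listed above, the mapping from age group to Italian perspective string could be written as follows. The function and variable names are illustrative assumptions and not part of the challenge code. Treating an age of exactly 58 as Boomer is also an assumption, since the list only gives “Age &gt; 58” for that group.</p>
        <preformat><![CDATA[
```python
# Italian verbalizations of the age group, copied from the list above.
AGE_PHRASES = {
    "GenZ": "una persona giovane della generazione Z",
    "GenY": "una persona giovane della generazione Y",
    "GenX": "una persona adulta della generazione X",
    "Boomer": "una persona adulta della generazione baby boomer",
}


def generation_from_age(age: int) -> str:
    """Bucket an age into a generation label using the thresholds in the text.

    Assumption: age 58 itself falls into "Boomer"; the text only states
    "Age > 58" for that group, so the boundary is not specified.
    """
    if age < 26:
        return "GenZ"
    if age < 42:
        return "GenY"
    if age < 58:
        return "GenX"
    return "Boomer"


def verbalize_age(generation: str) -> str:
    """Return the Italian perspective string for a generation label."""
    return AGE_PHRASES[generation]
```
]]></preformat>
        <p>For instance, under these assumptions verbalize_age(generation_from_age(24)) would yield the GenZ phrase. Keeping the bucketing and the phrase lookup separate makes it possible to start from either a raw age or a precomputed generation label.</p>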
      </sec>
      <sec id="sec-3-4">
        <title>Task 2 The perspective variable is a verbalization of the Gender variable, which is expressed as a string in English. It can be instantiated with one of two values:</title>
        <p>In the vast majority (∼90%) of cases, the conversation-starting messages and their direct replies were downloaded to capture the full conversational context. In a few cases, the downloaded reply was not direct but rather a second-level reply (a reply to a direct reply); thus, some conversational context might be missing.</p>
        <p>Challenge design: We describe annotators by no sociodemographic traits (Task 0), one single demographic trait (Task 1 and Task 2), or two demographic traits (Task 3). We evaluate disaggregated annotations at inference time, having the annotators represented only by those traits. Annotators’ sociodemographic information does not always align with the most relevant grouping of annotators according to the language phenomenon under study [21, 28], and the limited amount of sociodemographic traits we provide is undoubtedly not enough to describe every single annotator. We are aware of this limitation. In fact, our main aim is to understand whether providing one or more annotator traits makes the model predictions more aligned with annotators having a given characteristic.</p>
        <p>Task 3: The perspective variable is a verbalization of both the Age and Gender variables, e.g., “una giovane donna della generazione Z.”</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Metrics</title>
      <p>• “una donna” if Gender == “Female”</p>
      <p>• “un uomo” if Gender == “Male”</p>
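      <p>The Gender verbalizations above and the input tuple &lt;perspective, post, reply&gt; can be illustrated with a short sketch. All function names are hypothetical. For the combined Age + Gender case, only the single example quoted in the text is recorded; other combinations are left out rather than guessed.</p>
      <preformat><![CDATA[
```python
from typing import Optional, Tuple

# Task 2: the two Gender verbalizations listed in the text.
GENDER_PHRASES = {"Female": "una donna", "Male": "un uomo"}

# Task 3: only the one combination quoted in the text is recorded here.
# Other (generation, gender) pairs are deliberately not guessed.
AGE_GENDER_PHRASES = {
    ("GenZ", "Female"): "una giovane donna della generazione Z",
}


def gender_perspective(gender: str) -> str:
    """Return the Italian perspective string for a Gender value (Task 2)."""
    return GENDER_PHRASES[gender]


def age_gender_perspective(generation: str, gender: str) -> str:
    """Return the combined perspective string (Task 3), if one is recorded."""
    key = (generation, gender)
    if key not in AGE_GENDER_PHRASES:
        raise KeyError(f"no verbalization recorded for {key}")
    return AGE_GENDER_PHRASES[key]


def build_input(
    perspective: Optional[str], post: str, reply: str
) -> Tuple[Optional[str], str, str]:
    """Assemble the (perspective, post, reply) tuple.

    For the baseline (Task 0) the perspective is None, because no
    information about the annotator is provided.
    """
    return (perspective, post, reply)
```
]]></preformat>
      <p>With these helpers, a Task 2 input for a male annotator would be build_input(gender_perspective("Male"), post, reply), while the baseline would pass None as the perspective. How the tuple is then rendered into a prompt is not specified here and is left open in this sketch.</p>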
      <sec id="sec-4-1">
        <p>Inspired by Mokhberian et al. [26], the Perspectivist Irony Detection task is evaluated by means of global F1, that is, the F1-score computed across all the individual annotations in the dataset against the predictions of the model.</p>
        <p>5. Limitations</p>
        <p>Data: The sociodemographic information about the annotators is partial, bound to what was available from the crowdsourcing platform, and following a discretization of human personal traits that could be perceived as forced (e.g., representing self-identified gender as a single binary label). Furthermore, as shown by Orlikowski et al. [21], annotators’ sociodemographics do not always align with the most relevant grouping of annotators according to the language phenomenon under study.</p>
        <p>Annotators of the Italian portion of MultiPICo tend to be young (with no annotators from the baby boomer generation and only one from GenX). This aspect might influence the results.</p>
        <p>Similarly to Sachdeva et al. [5], Sap et al. [19], Forbes et al. [27], we noticed the ethnicity of annotators is unbalanced, and all but one annotator are white for the considered data.</p>
        <p>6. Ethical issues</p>
        <p>This work places itself in an increasing amount of work that calls to consider and include the subjectivity of the annotators in NLP applications, encouraging reflection on the different perspectives encoded in annotated datasets to minimize the amplification of biases. We hope this challenge will be a starting point for investigating and evaluating LLMs in Italian to make them suitable for final users.</p>
        <p>The dataset used in the challenge was built by adopting measures to protect the privacy of annotators, and the data handling protocols were designed to safeguard personal information (like anonymization of users’ mentions). Although the attention during the collection of data was focused on ironic content spread online, we acknowledge that some of the material contains racist, sexist, stereotypical, violent, or generally disturbing content.</p>
        <p>Annotators are balanced through their self-identified gender. However, we are aware that considering gender in a binary form is limited; moreover, a substantial unbalance for some dimensions, like the self-identified ethnicities, is present in the dataset. This pattern suggests the need to interact differently with annotators or social communities if we want a diversity of annotators and perspectives in terms of social background.</p>
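        <p>The global F1 described in Section 4 (an F1-score over all individual annotations rather than over aggregated labels) can be sketched as follows. This is an illustration under stated assumptions: labels are encoded as 1 for ironic and 0 for not ironic, the score is the binary F1 of the ironic class (the text does not specify an averaging scheme), and the function name is hypothetical.</p>
        <preformat><![CDATA[
```python
from typing import Sequence


def global_f1(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 over a flat list of individual annotations.

    `gold` holds one label per (annotator, instance) pair and `pred` holds
    the model prediction for the matching (perspective, post, reply) input.
    Assumption: F1 is computed for the `positive` (ironic) class only.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p == positive and g == positive:
            tp += 1
        elif p == positive and g != positive:
            fp += 1
        elif p != positive and g == positive:
            fn += 1
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # No positive labels and no positive predictions: define F1 as 0.
        return 0.0
    return 2 * tp / denominator
```
]]></preformat>
        <p>Because every annotation is counted separately, an instance labeled by five annotators contributes five gold labels, and annotators who disagree on the same post and reply pair are each scored against the prediction made for their own perspective.</p>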
      </sec>
    </sec>
    <sec id="sec-5">
      <title>7. Data license and copyright issues</title>
      <p>leasing annotator-level labels and information in datasets, in: Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop, 2021, p. 133–138.</p>
      <p>[16] E. M. Bender, B. Friedman, Data statements for natural language processing: Toward mitigating system bias and enabling better science, Transactions of the Association for Computational Linguistics 6 (2018) 587–604.</p>
      <p>[17] D. Almanea, M. Poesio, ArMIS - the Arabic Misogyny and Sexism Corpus with Annotator Subjective Disagreements, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, S. Piperidis (Eds.), Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 2282–2291. URL: https://aclanthology.org/2022.lrec-1.244</p>
      <p>[18] S. Akhtar, V. Basile, V. Patti, Whose opinions matter? Perspective-aware models to identify opinions of hate speech victims in abusive language detection, arXiv preprint arXiv:2106.15896 (2021).</p>
      <p>[19] M. Sap, S. Swayamdipta, L. Vianna, X. Zhou, Y. Choi, N. A. Smith, Annotators with attitudes: How annotator beliefs and identities bias toxic language detection, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 5884–5906. URL: https://aclanthology.org/2022.naacl-main.431 doi:10.18653/v1/2022.naacl-main.431.</p>
      <p>[20] R. Wan, J. Kim, D. Kang, Everyone’s voice matters: Quantifying annotation disagreement using demographic information, in: Proceedings of the 37th AAAI Conference on Artificial Intelligence - AAAI Special Track on AI for Social Impact, 2023.</p>
      <p>[21] M. Orlikowski, P. Röttger, P. Cimiano, D. Hovy, The ecological fallacy in annotation: Modeling human label variation goes beyond sociodemographics, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 1017–1029. URL: https://aclanthology.org/2023.acl-short.88. doi:10.18653/v1/2023.acl-short.88.</p>
      <p>[22] E. Simpson, E.-L. Do Dinh, T. Miller, I. Gurevych, Predicting humorousness and metaphor novelty with Gaussian process preference learning, in: A. Korhonen, D. Traum, L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 5716–5728. URL: https://aclanthology.org/P19-1572 doi:10.18653/v1/P19-1572.</p>
      <p>[23] A. Uma, T. Fornaciari, A. Dumitrache, T. Miller, J. Chamberlain, B. Plank, E. Simpson, M. Poesio, SemEval-2021 task 12: Learning with disagreements, in: A. Palmer, N. Schneider, N. Schluter, G. Emerson, A. Herbelot, X. Zhu (Eds.), Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics, Online, 2021, pp. 338–347. URL: https://aclanthology.org/2021.semeval-1.41 doi:10.18653/v1/2021.semeval-1.41.</p>
      <p>[24] G. Attanasio, P. Basile, F. Borazio, D. Croce, M. Francis, J. Gili, E. Musacchio, M. Nissim, V. Patti, M. Rinaldi, D. Scalena, CALAMITA: Challenge the Abilities of LAnguage Models in ITAlian, in: Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), Pisa, Italy, December 4 - December 6, 2024, CEUR Workshop Proceedings, CEUR-WS.org, 2024.</p>
      <p>[25] S. Casola, S. Frenda, S. Lo, E. Sezerer, A. Uva, V. Basile, C. Bosco, A. Pedrani, C. Rubagotti, V. Patti, D. Bernardi, MultiPICo: Multilingual perspectivist irony corpus, in: L.-W. Ku, A. Martins, V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 16008–16021. URL: https://aclanthology.org/2024.acl-long.849</p>
      <p>[26] N. Mokhberian, M. Marmarelis, F. Hopp, V. Basile, F. Morstatter, K. Lerman, Capturing perspectives of crowdsourced annotators in subjective learning tasks, in: K. Duh, H. Gomez, S. Bethard (Eds.), Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Association for Computational Linguistics, Mexico City, Mexico, 2024, pp. 7337–7349. URL: https://aclanthology.org/2024.naacl-long.407</p>
      <p>[27] M. Forbes, J. D. Hwang, V. Shwartz, M. Sap, Y. Choi, Social chemistry 101: Learning to reason about social and moral norms, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 653–670. URL: https://aclanthology.org/2020.emnlp-main.48 doi:10.18653/v1/2020.emnlp-main.48.</p>
      <p>[28] S. M. Lo, V. Basile, Hierarchical clustering of label-based annotator representations for mining perspectives, in: G. Abercrombie, V. Basile, D. Bernardi, S. Dudy, S. Frenda, L. Havens, E. Leonardelli, S. Tonelli (Eds.), Proceedings of the 2nd Workshop on Perspectivist Approaches to NLP co-located with 26th European Conference on Artificial Intelligence (ECAI 2023), Kraków, Poland, September 30th, 2023, volume 3494 of CEUR Workshop Proceedings, CEUR-WS.org, 2023. URL: https://ceur-ws.org/Vol-3494/paper8.pdf.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] V. Basile, M. Fell, T. Fornaciari, D. Hovy, S. Paun, B. Plank, M. Poesio, A. Uma, et al., We need to consider disagreement in evaluation, in: Proceedings of the 1st workshop on benchmarking: past, present and future, Association for Computational Linguistics, 2021, pp. 15–21.</mixed-citation>
        <mixed-citation>[9] A. N. Uma, T. Fornaciari, D. Hovy, S. Paun, B. Plank, M. Poesio, Learning from disagreement: A survey, Journal of Artificial Intelligence Research 72 (2021) 1385–1470.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] B. Plank, The “problem” of human label variation: On ground truth in data, modeling and evaluation, in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022, pp. 10671–10682.</mixed-citation>
        <mixed-citation>[10] L. Aroyo, C. Welty, Truth is a lie: Crowd truth and the seven myths of human annotation, AI Magazine 36 (2015) 15–24. URL: https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/2564 doi:10.1609/aimag.v36i1.2564.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] F. Cabitza, A. Campagner, V. Basile, Toward a perspectivist turn in ground truthing for predictive computing, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 2023, pp. 6860–6868. URL: https://ojs.aaai.org/index.php/AAAI/article/view/25840</mixed-citation>
        <mixed-citation>[11] E. Leonardelli, S. Menini, A. P. Aprosio, M. Guerini, S. Tonelli, Agreeing to disagree: Annotating offensive language datasets with annotators' disagreement, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, p. 10528–10539.</mixed-citation>
        <mixed-citation>[12] A. Uma, T. Fornaciari, A. Dumitrache, T. Miller, J. Chamberlain, B. Plank, E. Simpson, M. Poesio, Semeval-2021 task 12: Learning with disagreements, in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), 2021, pp. 338–347.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] S. Frenda, A. Pedrani, V. Basile, S. M. Lo, A. T. Cignarella, R. Panizzon, C. Marco, B. Scarlini, V. Patti, C. Bosco, D. Bernardi, EPIC: Multi-perspective annotation of a corpus of irony, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp.
          <fpage>13844</fpage>
          -
          <lpage>13857</lpage>
          . URL:[13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Leonardelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Uma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Abercrombie</surname>
          </string-name>
          , D. Alhttps://aclanthology.org/
          <year>2023</year>
          .acl-lon.
          <source>gd.7o7i4:10</source>
          .
          <string-name>
            <surname>manea</surname>
            , V. Basile,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Fornaciari</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Plank</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Rieser</surname>
          </string-name>
          ,
          <volume>18653</volume>
          /v1/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .774.
          <string-name>
            <surname>M. Poesio</surname>
          </string-name>
          , Semeval-2023 task 11:
          <article-title>Learning with</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sachdeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Barreto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bacon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sahn</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>von disagreements (lewidi)</article-title>
          ,
          <source>in: Proceedings of the 17th Vacano</source>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          ,
          <article-title>The measuring hate speech International Workshop on Semantic Evaluation corpus: Leveraging rasch measurement theory for (SemEval-</article-title>
          <year>2023</year>
          ),
          <year>2023</year>
          , p.
          <fpage>2304</fpage>
          -
          <lpage>2318</lpage>
          . data perspectivism, in: G. Abercrombie, V.
          <source>Basil[e1</source>
          ,4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Santy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Le</given-names>
            <surname>Bras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Reinecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tonelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rieser</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Uma (Eds.), Proceedings of NLPositionality:
          <article-title>Characterizing design biases the 1st Workshop on Perspectivist Approaches to of datasets and models</article-title>
          ,
          <source>in: Proceedings of NLP @LREC2022</source>
          ,
          <article-title>European Language Resources the 61st Annual Meeting of the Association for Association</article-title>
          , Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>94</lpage>
          . URL:
          <string-name>
            <surname>Computational Linguistics</surname>
          </string-name>
          (Volume
          <volume>1</volume>
          : Long Pahttps://aclanthology.org/
          <year>2022</year>
          .nlperspectives.-
          <volume>1</volume>
          .11 pers), Association for Computational Linguistics,
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Frenda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Casola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Scarlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Marco</surname>
          </string-name>
          , Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>9080</fpage>
          -
          <lpage>9102</lpage>
          . URLh:ttps:// V. Basile,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bernardi</surname>
          </string-name>
          ,
          <article-title>Does anyone see the irony aclanthology</article-title>
          .org/
          <year>2023</year>
          .
          <article-title>acl-long.</article-title>
          .
          <source>d50o5i:10</source>
          .18653/ here?
          <article-title>Analysis of perspective-aware model predic- v1/2023.acl-long</article-title>
          .
          <volume>505</volume>
          . [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Prabhakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Davani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Diaz</surname>
          </string-name>
          , On re-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>