<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>When Figures Speak with Irony: Investigating the Role of Rhetorical Figures in Irony Generation with LLMs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pier Felice Balestrucci</string-name>
          <email>pierfelice.balestrucci@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Oliverio</string-name>
          <email>michael.oliverio@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Soda Marem Lo</string-name>
          <email>sodamarem.lo@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Anselma</string-name>
          <email>luca.anselma@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <email>valerio.basile@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Mazzei</string-name>
          <email>alessandro.mazzei@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viviana Patti</string-name>
          <email>viviana.patti@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Irony poses a persistent challenge for computational models because it depends on context, implicit meaning, and pragmatic cues. This study investigates the ability of Large Language Models (LLMs) to generate ironic content by focusing on rhetorical figures, pragmatic devices that may shape and signal ironic intent. Using two datasets, TWITTIRÒ-UD and the Italian subset of MultiPICo, we fine-tune multilingual LLMs for rhetorical figure classification and evaluate their capacity to generate ironic Italian texts. Our work addresses two main questions: (1) how accurately LLMs can classify rhetorical figures in ironic Italian texts, and (2) whether such training supports the generation of irony that reflects human-like rhetorical usage. Human evaluation shows that LLMs achieve fair agreement with annotators in rhetorical figure classification, indicating a partial but promising alignment with human judgment. By leveraging rhetorical figures as a bridge between irony detection and generation, our results suggest that such training improves the stylistic control and interpretability of LLM-generated ironic language.</p>
      </abstract>
      <kwd-group>
        <kwd>Rhetorical Figures</kwd>
        <kwd>Irony Generation</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>• RQ1: To what extent can LLMs accurately classify rhetorical figures in ironic Italian texts?</p>
      <p>• RQ2: Does fine-tuning LLMs on rhetorical figure classification lead to the generation of more human-like ironic replies, in terms of rhetorical devices?</p>
      <sec id="sec-1-1">
        <title>To address these questions, we fine-tune a set of mul</title>
        <p>tilingual open-weight LLMs on rhetorical figure
classification and assess their performance. We then enrich the
Italian subset of MultiPICo with automatic annotations
and conduct a human evaluation to validate a small
sample extracted from that corpus. Finally, we use the best- 3. Datasets
performing fine-tuned model to generate new replies to
ironic posts in MultiPICo and carry out a linguistic analy- TWITTIRÒ-UD A collection of ironic Italian tweets
sis of the model-generated replies, comparing them with annotated according to the Universal Dependencies
human-written ones. framework. TWITTIRÒ-UD was created by enriching</p>
        <p>
          This work contributes to (i) advancing the research a resource originally developed for the fine-grained
aninto rhetorical figure classification using LLMs, by prov- notation of irony [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The original corpus consists of
ing the efectiveness of Chain-of-Thought fine-tuning 1, 424 tweets, with a total of 28, 387 tokens [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Each
strategy; (ii) improving the interpretability of LLMs in tweet in the corpus has been annotated with the
correpragmatic text generation, showing that rhetorical figure- sponding rhetorical figure used to convey irony, such as
aware models tend to create sentences stylistically more OXYMORON PARADOX, HYPERBOLE, or EUPHEMISM. The
similar to human-written texts.1 treebank includes both the fine-grained annotation for
ironic tweets introduced in Karoui et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and the
morphological and syntactic information encoded in the UD
2. Related Works format.2 Figure 1 shows the distribution of rhetorical
ifgures in the corpus.
        </p>
        <p>
          MultiPICo The dataset consists of disaggregated
multilingual posts and replies from social media, each
annotated to indicate whether the reply is ironic given
the post. The corpus includes 18, 778 post–reply pairs,
collected from Reddit (8, 956) and Twitter (9, 822), and
covers 9 diferent languages. A total of 506 annotators,
with diferent sociodemographic information, carried out
the annotations, producing 94, 342 individual labels (an
average of 5.02 per conversation). Each annotation is
accompanied by sociodemographic metadata about the
annotator, including gender, age, ethnicity, student
status, and employment status. For the Italian subset of the
Rhetorical Figure Classification There are mainly
two approaches to the automatic detection and
classification of rhetorical figures in natural language:
ontologybased methods and machine learning techniques [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ].
        </p>
        <p>
          These approaches have shown efectiveness in
supporting tasks such as sentiment analysis and intent
classification [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]. Several studies focus on their relationship
with irony [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ], particularly in the context of irony
detection. In this vein, Karoui et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], drawing on
wellestablished linguistic theories that explore the interplay
between irony and rhetorical figures—such as oxymoron,
paradox, false assertion, and analogy—propose an
annotation schema for classifying these categories of irony in
social media texts. Their work focuses on French, English,
and Italian, highlighting the relevance of irony categories
and markers for a linguistically informed approach to
irony detection.
        </p>
        <p>
          Irony Generation Irony generation remains a
relatively underexplored area in Natural Language
Generation. especially when compared to the growing literature
on humor, puns, and sarcasm [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ]. Recent work has
begun to model sarcasm through linguistic features such
as valence reversal and contextual incongruity [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ],
yet irony is still rarely addressed directly.
        </p>
        <p>
          Among the more recent studies on irony generation,
Balestrucci et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] propose an approach that leverages
LLMs to generate ironic text. The authors demonstrate
1All code and experimental results are publicly available at: https: 2https://github.com/UniversalDependencies/UD_
//github.com/MichaelOliverio/IronyDetection. Italian-TWITTIRO
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>5. Rhetorical Figure Classification</title>
      <p>
        corpus, 24 annotators provided 4, 790 annotations on
1, 000 post–reply pairs [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].3
      </p>
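      <p>For illustration, the corpus can be pulled from the Hugging Face Hub and restricted to the Italian subset; a minimal sketch in which the split and column names (e.g., language) are assumptions rather than the dataset's documented schema:</p>
      <preformat># Minimal sketch: load MultiPICo from the Hugging Face Hub and keep the
# annotations on Italian post-reply pairs. The split name and the
# "language" column are assumptions for illustration.
from datasets import load_dataset

multipico = load_dataset("Multilingual-Perspectivist-NLU/MultiPICo", split="train")

italian = multipico.filter(lambda row: row["language"] == "it")
print(len(italian))</preformat>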
    </sec>
    <sec id="sec-3">
      <title>4. Methodology</title>
      <p>In this section, we evaluate a set of LLMs for rhetorical
ifgure classification. We fine-tune several open-weight,
mid-sized LLMs using two diferent approaches on the
original TWITTIRÒ-UD split (see Table 2). To highlight
the impact of fine-tuning on rhetorical figure
classification, we compare the performance of the fine-tuned
models against two baselines: a random classifier and a
zeroshot prompting approach. Our experiments involve five
multilingual LLMs: Qwen2.5-7B-Instruct4 (referred to
as Qwen2.5-7B), Llama-3.1-8B-Instruct5 (Llama-3.1-8B),
Ministral-8B-Instruct-24106 (Ministral-8B),
LLaMAntino3-ANITA-8B-Inst-DPO-ITA7 (LLaMAntino-3-8B), and
Minerva-7B-instruct-v1.0 (Minerva-7B).8</p>
      <sec id="sec-3-1">
        <title>To assess the ability of LLMs to analyze ironic Italian texts</title>
        <p>
          and classify rhetorical figures, we adopted the annotation
scheme proposed by Karoui et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], which defines a
set of rhetorical figures commonly used to convey irony
(summarized in Table 1).
        </p>
        <p>We selected several open-weight multilingual LLMs
trained on Italian data and fine-tuned them on the
TWITTIRÒ dataset for the task of rhetorical figure
classification. Models’ performances were evaluated against two
baselines: (i) a random classifier and (ii) a
promptingbased approach. The best-performing model was then
used to enrich the ironic Italian subset of the MultiPICo Table 2
dataset—aggregated by majority vote—with rhetorical Data split statistics for the TWITTIRÒ-UD dataset.
ifgure annotations. To validate the model’s predictions, Train Dev Test
we conducted a human evaluation on a small subset of #Tweets 1, 138 144 142
the annotated data. Avg. Tokens 20.77 20.80 20.96</p>
        <p>
          Finally, to address the second research question, we
focused on ironic post–reply pairs in Italian from
MultiPICo, again selected via majority vote, and compared Fine-tuning was performed using two diferent prompt
the distribution of rhetorical figures across three types of strategies, described below, both relying on Low-Rank
replies: (i) automatically generated by an LLM fine-tuned Adaptation (LoRA) [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
to recognize rhetorical figures, (ii) replies generated by
the same model out-of-the-box, and (iii) written by hu- Instruction Fine-Tuning In this approach, which we
mans. In addition to comparing the distributions, we refer to as FT, we trained all the models (training
deconducted a linguistic analysis of these replies. A repre- tails are available in Appendix A), using the following
sentative sample of the generated content was manually instruction:
annotated to support this evaluation.
        </p>
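      <p>A minimal sketch of the majority-vote aggregation over disaggregated irony labels; the column names (pair_id, is_ironic) are hypothetical:</p>
      <preformat># Minimal sketch of the majority-vote aggregation over disaggregated
# irony labels. Column names ("pair_id", "is_ironic") are hypothetical.
import pandas as pd

annotations = pd.DataFrame({
    "pair_id":   [1, 1, 1, 2, 2, 2],
    "is_ironic": [1, 1, 0, 0, 0, 1],
})

# A pair is kept as ironic when more than half of its annotators labeled it so.
majority = annotations.groupby("pair_id")["is_ironic"].mean()
ironic_pairs = majority[majority > 0.5].index.tolist()
print(ironic_pairs)  # [1]</preformat>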
      </sec>
      <sec id="sec-3-2">
        <title>3https://huggingface.co/datasets/Multilingual-Perspectivist-NLU/</title>
        <p>MultiPICo</p>
      </sec>
      <sec id="sec-3-3">
        <title>Given the ironic sentence (INPUT),</title>
        <p>identify and return the rhetorical figure</p>
      </sec>
      <sec id="sec-3-4">
        <title>4https://huggingface.co/Qwen/Qwen2.5-7B-Instruct</title>
        <p>5https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
6https://huggingface.co/mistralai/Ministral-8B-Instruct-2410
7https://huggingface.co/swap-uniba/
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
8https://huggingface.co/sapienzanlp/Minerva-7B-instruct-v1.0</p>
      <p>Table 3 reports the evaluation results. The baselines used are: (i) a random classifier (Random), which assigns one of the eight possible labels uniformly at random to each input, and (ii) a zero-shot prompting approach, for which we selected the best-performing model overall. Even the strongest model struggles with under-represented categories, especially EUPHEMISM, for which it made no correct predictions (0 out of 8). These results highlight a substantial margin for improvement in this task and suggest the need for further investigation into the model's behavior and the characteristics of under-represented or more challenging rhetorical categories.</p>
    </sec>
    <sec id="sec-3-2">
      <title>6. MultiPICo Enrichment</title>
      <p>This section focuses on enriching the Italian MultiPICo with annotations of rhetorical figures. To this end, we employ the best-performing rhetorical figure classification model (see Table 3), Ministral-8B with CoT-FT, to classify rhetorical figures in the Italian post-reply pairs. As mentioned in Section 3, MultiPICo consists of both ironic and non-ironic post-reply pairs. Therefore, we extract only the ironic pairs from the dataset, using a majority vote approach to determine whether a post-reply pair is ironic, given the disaggregated nature of MultiPICo, resulting in a subset of 278 ironic post-reply pairs.</p>
      <p>We then use our model to classify the rhetorical figures in this subset. As shown in Figure 3, the most frequently extracted rhetorical figures in the post–reply pairs are CONTEXT SHIFT (25.9%) and OXYMORON PARADOX (21.9%), while the least frequent are EUPHEMISM and HYPERBOLE (1.8% each). This distribution closely resembles that of TWITTIRÒ, and the high frequency of CONTEXT SHIFT may be attributed to the nature of post–reply interactions, where replies often reframe or shift the meaning of the corresponding posts. Given the difficulty in classifying some rhetorical figures, as highlighted in Table 3, we carry out a human evaluation in Section 6.1 to assess the quality of the model predictions.</p>
      <p>Figure 3: Distribution of rhetorical figures extracted from the Italian MultiPICo corpus.</p>
      <sec id="sec-3-2-1">
        <title>6.1. Human Evaluation</title>
        <p>Following the annotation guidelines in Karoui et al. [<xref ref-type="bibr" rid="ref4">4</xref>], two authors of this paper, both experts in computational linguistics, manually annotated a subset of 20 out of the 278 ironic post-reply pairs. The annotators were tasked to specify the rhetorical figures used to express irony in the reply given the corresponding post, selecting one or more labels from those reported in Table 1.</p>
        <p>The annotators achieved an average Cohen's κ score [<xref ref-type="bibr" rid="ref21">21</xref>] of 0.63 on this subset of 20 post–reply pairs, a value comparable to that reported by Karoui et al. [<xref ref-type="bibr" rid="ref4">4</xref>] for the same task (0.60), indicating substantial agreement. Krippendorff's α [<xref ref-type="bibr" rid="ref22">22</xref>] was also computed, yielding a score of 0.60, which confirms a similarly substantial level of inter-annotator reliability. We then compared the human annotations with the predictions produced by our automatic model. The resulting Krippendorff's α was 0.21, corresponding to a fair level of agreement (a computation sketch is given at the end of this section).</p>
        <p>To better understand this result, we examined the 14 out of 20 pairs where both annotators assigned the same label. In 3 of these cases, the model's prediction matched the human annotation exactly.</p>
        <p>For example, the post "Due si candidano in quanto 'ci vuole una donna' nel #Pd: #Schlein e #DeMicheli. Una sola domanda: perché?" ("Two women are running for office in the Democratic Party because 'we need a woman': Schlein and DeMicheli. One question: why?") received the reply "@USER Perché per un canguro è ancora presto." ("Because for a kangaroo it's still too early."), which was labeled as CONTEXT SHIFT by both annotators and the model. The label was assigned due to the sudden change in topic, introducing an unexpected element (the kangaroo) that breaks coherence and signals irony.</p>
        <p>In the remaining 11 cases, where the model's prediction did not match the human annotations, the model frequently labeled replies as OXYMORON PARADOX when annotators had chosen OTHER; this occurred in 6 out of the 11 pairs.</p>
        <p>Consider the following example: "Salvini ripropone il ponte sullo stretto di Messina, opera imprescindibile per lo sviluppo economico. Condivido e rilancio: contestualmente realizzerei anche il tunnel sottomarino Civitavecchia-Cagliari. Dai non facciamo come al solito la figura dei barboni, pensiamo in grande" ("Salvini reintroduces the Strait of Messina bridge proposal, a crucial infrastructure for economic development. I agree and raise: let's also build the Civitavecchia-Cagliari submarine tunnel. Let's not be our usual broke selves, let's think big!") with the reply: "Si può proporre il ponte Palermo-Cagliari già che ci siamo... una spesa unica... compri uno, paghi tre... no com'è la storia?" ("We might as well propose a Palermo-Cagliari bridge while we're at it... a single expense... buy one, pay three... or how does it go again?"). Here, the model likely interpreted the absurdity of the reply as a rhetorical figure of type OXYMORON PARADOX, whereas human annotators labeled it as a case of sarcasm, and thus as OTHER.</p>
        <p>An illustrative example of the remaining cases is the following: "Lo scrivo per tanti idioti che rispondono ai Twitter come le pecore. Sono un Sovranista, non sono vaccinato, non pagherò la multa e la mia Libertà non è in svendita." ("I write this for all the idiots who respond to tweets like sheep. I'm a sovereignist, I'm unvaccinated, I won't pay the fine, and my freedom is not for sale.") with the reply: "Lo scrivo per te... non bere più" ("I write this for you... stop drinking.").</p>
        <p>In this case, the model assigned the label ANALOGY, possibly misled by the introductory phrase in the post, failing to capture the sarcastic tone of the reply. This example suggests that prompt design could be improved to better guide the model's focus toward the reply and its pragmatic intent.</p>
        <p>This evaluation highlights the LLM's ability to produce overall reasonable outputs. Although its performance is not particularly high, it can still serve as a useful tool for silver annotation, thanks to the reasoning and explanations it provides.</p>
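        <p>The agreement coefficients reported above can be computed with standard implementations; a minimal sketch, assuming the sklearn and krippendorff packages, with placeholder label vectors:</p>
        <preformat># Minimal sketch: inter-annotator agreement on nominal rhetorical-figure
# labels. The label vectors are placeholders, not the study's data.
from sklearn.metrics import cohen_kappa_score
import krippendorff  # pip install krippendorff

ann1 = ["CONTEXT SHIFT", "OTHER", "HYPERBOLE", "ANALOGY"]
ann2 = ["CONTEXT SHIFT", "OXYMORON PARADOX", "HYPERBOLE", "ANALOGY"]

print(cohen_kappa_score(ann1, ann2))

# Krippendorff's alpha expects one row of codes per annotator.
labels = sorted(set(ann1) | set(ann2))
coded = [[labels.index(x) for x in ann1],
         [labels.index(x) for x in ann2]]
print(krippendorff.alpha(reliability_data=coded, level_of_measurement="nominal"))</preformat>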
      </sec>
    </sec>
    <sec id="sec-4">
      <title>7. Irony Generation</title>
      <p>
        Inspired by previous work on irony generation [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], we
investigated whether a model trained to classify
rhetorical figures would also be capable of employing them
during generation—producing ironic outputs comparable
to those written by humans in terms of rhetorical
figures. To explore this hypothesis, we considered the 278
post–reply pairs selected in Section 6, using the posts as
input to the best-performing model for rhetorical figure
classification. The model was prompted to generate an
ironic reply for each post, which was then compared to
the original human-written reply. As a baseline, we used
the same model in its non–fine-tuned version, applying
the same prompting strategy. To illustrate this process,
we provide the following example:
      </p>
      <p>Instruction: Ti viene fornito in input
(INPUT) un post estratto da
conversazioni sui social media. Fornisci in
output (OUTPUT) una risposta ironica
in italiano. (You are given as input
(INPUT) a post extracted from social
media conversations. Provide as output
(OUTPUT) an ironic reply in Italian.)</p>
      <sec id="sec-4-1">
        <title>Input: Consigli su workout in casa in</title>
        <p>questo periodo di palestre chiuse? (Any
tips for home workouts during this period
of gym closures?)</p>
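      <p>A minimal sketch of this generation setup, assuming a transformers chat pipeline over the Ministral-8B checkpoint; the decoding parameters are illustrative, not the experimental configuration:</p>
      <preformat># Minimal sketch: prompt the model for an ironic Italian reply using the
# instruction and input above. Decoding settings are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation",
                     model="mistralai/Ministral-8B-Instruct-2410")

messages = [{
    "role": "user",
    "content": (
        "Ti viene fornito in input (INPUT) un post estratto da conversazioni "
        "sui social media. Fornisci in output (OUTPUT) una risposta ironica "
        "in italiano.\n"
        "INPUT: Consigli su workout in casa in questo periodo di palestre chiuse?"
    ),
}]
reply = generator(messages, max_new_tokens=80, do_sample=True, temperature=0.8)
print(reply[0]["generated_text"][-1]["content"])</preformat>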
      <p>Table 4 presents the distribution of rhetorical figures in the ironic replies generated by humans, the fine-tuned model, and the baseline model, all classified by Ministral-8B with CoT-FT. Overall, the differences across distributions are not substantial, but some trends are worth noting.</p>
      <preformat>Table 4
Distribution of rhetorical figures in human and model-generated ironic
replies (rep.) from MultiPICo. CoT-FT refers to the fine-tuned model;
Baseline to the non-fine-tuned version.
                      Human rep.      Model rep.
                                   CoT-FT   Baseline
ANALOGY                   45         53        42
HYPERBOLE                  0          8         1
EUPHEMISM                  5          9         6
RHETORICAL QUESTION       45         34        64
OXYMORON PARADOX          61         67        51
CONTEXT SHIFT             72         62        52
FALSE ASSERTION           32         35        34
OTHER                     18         10        28</preformat>
      <p>The fine-tuned model produces slightly more ANALOGY and EUPHEMISM compared to humans, which may reflect the influence of the TWITTIRÒ training data, where these categories are relatively well represented. Conversely, CONTEXT SHIFT appears underrepresented in the model outputs compared to human replies, which could be due to the complexity of capturing discourse-level phenomena.</p>
        <p>
          Interestingly, the baseline model shows a notable
increase in the use of RHETORICAL QUESTION and OTHER,
suggesting a more generic or less targeted use of
rhetorical strategies when the model is not fine-tuned. This may
indicate that zero-shot generation leads to a reliance on
broadly applicable or ambiguous rhetorical patterns, as
already seen in Balestrucci et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>To better understand these patterns and assess the
reliability of the automatic classification, we conducted
a human evaluation on a subset of 20 model-generated
replies from both systems.</p>
      <p>Specifically, the same two annotators from Section 6.1 independently labeled the rhetorical figures predicted by the models. Inter-annotator agreement was substantial, with a Cohen's κ of 0.68 and a Krippendorff's α of 0.65. In contrast, the Krippendorff's α between the annotators and the classifier was 0.26, confirming the previous results.</p>
      <sec id="sec-4-1">
        <title>7.1. Linguistic Analysis</title>
        <p>Following the approach proposed by Balestrucci et al. [<xref ref-type="bibr" rid="ref16">16</xref>], we also conducted a linguistic analysis focusing on specific stylistic markers, namely average token length, type-token ratio (TTR), and the use of interjections and negations, across human-written replies and model-generated outputs. Table 5 reports the results; a sketch of how these markers can be computed follows the table.</p>
        <preformat>Table 5
Linguistic analysis for human-written posts, human-written replies,
fine-tuned model generations (CoT-FT), and baseline generations (Baseline):
average number of tokens (Tokens), type/token ratio (TTR), and average
occurrences of interjections (Interjections) and negations (Negations).
                 Human              Model Replies
                 Post      Reply    CoT-FT    Baseline
Tokens           30.586    12.471   20.173    22.399
TTR              0.924     0.956    0.938     0.935
Interjections    0.594     0.273    0.381     0.507
Negations        0.050     0.072    0.410     0.982</preformat>
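        <p>A minimal sketch of how such markers can be computed, assuming regex word tokenization and small illustrative Italian word lists (the study's exact tokenization and lexicons are not specified here):</p>
        <preformat># Minimal sketch of the Table 5 markers: average tokens, per-text
# type/token ratio (TTR), and average interjection/negation counts.
# The tokenization and the two word lists are simplifying assumptions.
import re

NEGATIONS = {"non", "no", "mai", "niente", "nessuno", "nulla"}
INTERJECTIONS = {"ah", "eh", "oh", "boh", "mah", "dai", "beh"}

def markers(texts):
    tokenized = [re.findall(r"\w+", t.lower()) for t in texts]
    n = len(texts)
    return {
        "avg_tokens": sum(len(toks) for toks in tokenized) / n,
        "ttr": sum(len(set(toks)) / len(toks) for toks in tokenized) / n,
        "avg_interjections": sum(sum(w in INTERJECTIONS for w in toks)
                                 for toks in tokenized) / n,
        "avg_negations": sum(sum(w in NEGATIONS for w in toks)
                             for toks in tokenized) / n,
    }

print(markers(["Non ci credo, che sorpresa...", "Ma dai, davvero?"]))</preformat>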
      </sec>
    </sec>
    <sec id="sec-5">
      <title>8. Conclusions</title>
      <p>Our study explored the extent to which rhetorical figures can serve as a bridge between the detection and generation of ironic content in Italian. We showed that fine-tuning LLMs on rhetorical figure classification enables models to identify key linguistic devices involved in irony with reasonable accuracy. The best results were obtained using a CoT strategy, which guided models to provide explanations before predicting the rhetorical category. While the models performed well on frequently represented figures such as ANALOGY and RHETORICAL QUESTION, they struggled with more subtle or under-represented categories like EUPHEMISM, suggesting that further refinement and data augmentation may be needed.</p>
      <p>For the irony generation task, we observed that models fine-tuned on rhetorical figure classification produced ironic replies that more closely resembled human outputs in terms of rhetorical devices and stylistic markers. Although the overall distribution of rhetorical figures remained similar across models, the fine-tuned version demonstrated a more balanced use of devices, reducing the over-reliance on rhetorical questions and interjections observed in the baseline. This suggests that rhetorical figure awareness acquired through classification can positively influence generation, even in the absence of explicit training on ironic text generation.</p>
      <p>Manual evaluation confirmed the model’s ability to
generate plausible annotations and replies, albeit with
fair agreement compared to human annotators.
Nonetheless, the consistency and interpretability of its
outputs highlight its potential as a tool for silver
annotation—particularly valuable in low-resource settings.
Finally, our linguistic analysis showed that the fine-tuned
model better preserved lexical diversity and pragmatic
subtlety than its non-fine-tuned counterpart, indicating
that rhetorical figure classification fine-tuning may also
serve as a form of stylistic control. Taken together, these
findings point to the value of leveraging rhetorical figures
to enhance both the interpretability and expressiveness
of LLMs in pragmatic language generation.</p>
      <p>As future work, we plan to extend this study to other
languages, such as French and English, with the goal
of comparing the capacity of LLMs to classify
rhetorical figures and generate ironic content across different
linguistic contexts.</p>
      <p>Moreover, a key research direction we intend to
pursue concerns the perspectivist nature of the MultiPICo
dataset. In particular, we aim to explore whether
rhetorical figures function as shared cues in the perception of
irony across different sociodemographic groups, thereby
pointing to the existence of rhetorical devices that act as
universal markers of ironic intent.</p>
    </sec>
    <sec id="sec-6">
      <title>9. Limitations</title>
      <sec id="sec-6-1">
        <title>Despite the promising results, this work presents several</title>
        <p>limitations that call for further investigation.</p>
        <p>First, the rhetorical figure classification task was
trained and evaluated on a relatively small dataset
(TWITTIRÒ-UD), which may hinder the generalizability
of the models—particularly for under-represented
categories such as EUPHEMISM and HYPERBOLE. While
fine-tuning contributes to improved performance, the models
still struggle with these categories, likely due to data
sparsity and the intrinsic ambiguity of certain rhetorical
devices.</p>
        <p>Second, the human evaluation was conducted on a
relatively limited subset, which reduces the statistical
robustness of the agreement scores. Although the results align
with previous studies and provide qualitative insights
into model behavior, a larger annotation effort would
be needed to draw more conclusive findings—especially
when distinguishing between closely related rhetorical
categories. However, large-scale human annotation
remains time-consuming and costly.</p>
        <p>Finally, this study did not include a direct comparison
with models explicitly fine-tuned for irony generation.
Such a comparison would be necessary to better assess
the specific contribution of rhetorical figure classification
to the generation of ironic content, and to determine
whether the observed improvements are attributable to
rhetorical awareness or other factors.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Acknowledgments Michael Oliverio was partially</title>
        <p>funded by the ‘Multilingual Perspective-Aware NLU’
project in partnership with Amazon Alexa.</p>
        <p>Declaration on Generative AI
During the preparation of this work, the author(s) used ChatGPT (OpenAI) in order to: Grammar
and spelling check. After using these tool(s)/service(s), the author(s) reviewed and edited the content
as needed and take(s) full responsibility for the publication’s content.</p>
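      <p>For illustration, the configuration in Table 6 maps directly onto peft and transformers objects; a minimal sketch in which the base model identifier and output directory are assumptions:</p>
      <preformat># Minimal sketch: the Table 6 hyperparameters expressed as peft/transformers
# configuration. The base model id and output_dir are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Ministral-8B-Instruct-2410")
model = get_peft_model(model, LoraConfig(
    task_type="CAUSAL_LM", r=64, lora_alpha=16, lora_dropout=0.1))

args = TrainingArguments(
    output_dir="ft-rhetorical-figures",
    num_train_epochs=5,
    fp16=False,
    bf16=True,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=1,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
)</preformat>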
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Muecke</surname>
          </string-name>
          , Irony and the Ironic, Methuen, London,
          <year>1970</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sravanthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Doshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tankala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Murthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dabre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <article-title>Pub: A pragmatics understanding benchmark for assessing llms' pragmatics capabilities</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics ACL</source>
          <year>2024</year>
          ,
          <year>2024</year>
          , pp.
          <fpage>12075</fpage>
          -
          <lpage>12097</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , et al.,
          <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] J. Karoui, F. Benamara, V. Moriceau, V. Patti, C. Bosco, N. Aussenac-Gilles, Exploring the impact of pragmatic phenomena on irony detection in tweets: A multilingual corpus study, in: M. Lapata, P. Blunsom, A. Koller (Eds.), Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Association for Computational Linguistics, Valencia, Spain, 2017, pp. 262-272. URL: https://aclanthology.org/E17-1025/.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] A. Athanasiadou, H. L. Colston, The Diversity of Irony, volume 65, Walter de Gruyter GmbH &amp; Co KG, 2020.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mladenovic</surname>
          </string-name>
          ,
          <article-title>Ontology-based recognition of rhetorical figures, Infotheca</article-title>
          ,
          <source>Journal for Digital Humanities</source>
          <volume>16</volume>
          (
          <year>2016</year>
          )
          <fpage>24</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z. L.</given-names>
            <surname>Chia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ptaszynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Masui</surname>
          </string-name>
          , G. Leliwa,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wroczynski</surname>
          </string-name>
          ,
          <article-title>Machine learning and feature engineering-based study into sarcasm and irony classification with application to cyberbullying detection</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>58</volume>
          (
          <year>2021</year>
          )
          <fpage>102600</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] C. W. Strommer, Using rhetorical figures and shallow attributes as a metric of intent in text, 2011.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. Dubremetz, J. Nivre, Rhetorical figure detection: Chiasmus, epanaphora, epiphora, Frontiers in Digital Humanities 5 (2018). URL: https://www.frontiersin.org/journals/digital-humanities/articles/10.3389/fdigh.2018.00010. doi:10.3389/fdigh.2018.00010.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Neuhaus</surname>
          </string-name>
          ,
          <article-title>On the relation of irony, understatement, and litotes</article-title>
          ,
          <source>Pragmatics &amp; Cognition</source>
          <volume>23</volume>
          (
          <year>2016</year>
          )
          <fpage>117</fpage>
          -
          <lpage>149</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Burgers</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. van Mulken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Schellens</surname>
          </string-name>
          ,
          <article-title>Type of evaluation and marking of irony: The role of perceived complexity and comprehension</article-title>
          ,
          <source>Journal of Pragmatics</source>
          <volume>44</volume>
          (
          <year>2012</year>
          )
          <fpage>231</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] M. Zhu, Z. Yu, X. Wan, A neural approach to irony generation, ArXiv abs/1909.06200 (2019). URL: https://api.semanticscholar.org/CorpusID:202572954.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Y. Tian, D. Sheth, N. Peng, A unified framework for pun generation with humor principles, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 3253-3261. URL: https://aclanthology.org/2022.findings-emnlp.237. doi:10.18653/v1/2022.findings-emnlp.237.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Q. Zeng, A.-R. Li, A survey in automatic irony processing: Linguistic, cognitive, and multi-X perspectives, in: N. Calzolari, C.-R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K.-S. Choi, P.-M. Ryu, H.-H. Chen, L. Donatelli, H. Ji, S. Kurohashi, P. Paggio, N. Xue, S. Kim, Y. Hahm, Z. He, T. K. Lee, E. Santus, F. Bond, S.-H. Na (Eds.), Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics, Gyeongju, Republic of Korea, 2022, pp. 824-836. URL: https://aclanthology.org/2022.coling-1.69.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] A. Mishra, T. Tater, K. Sankaranarayanan, A modular architecture for unsupervised sarcasm generation, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 6144-6154. URL: https://aclanthology.org/D19-1636. doi:10.18653/v1/D19-1636.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] P. F. Balestrucci, S. Casola, S. M. Lo, V. Basile, A. Mazzei, I'm sure you're a real scholar yourself: Exploring ironic content generation by large language models, in: Y. Al-Onaizan, M. Bansal, Y.-N. Chen (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2024, Association for Computational Linguistics, Miami, Florida, USA, 2024, pp. 14480-14494. URL: https://aclanthology.org/2024.findings-emnlp.847/. doi:10.18653/v1/2024.findings-emnlp.847.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] A. T. Cignarella, C. Bosco, V. Patti, et al., TWITTIRÒ: a social media corpus with a multi-layered annotation for irony, in: CEUR Workshop Proceedings, volume 2006, CEUR, 2017, pp. 1-6.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] A. Cignarella, C. Bosco, V. Patti, TWITTIRÒ: a Social Media Corpus with a Multi-layered Annotation for Irony, 2017, pp. 101-106. doi:10.4000/books.aaccademia.2382.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] S. Casola, S. Frenda, S. M. Lo, E. Sezerer, A. Uva, V. Basile, C. Bosco, A. Pedrani, C. Rubagotti, V. Patti, D. Bernardi, MultiPICo: Multilingual perspectivist irony corpus, in: L.-W. Ku, A. Martins, V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 16008-16021. URL: https://aclanthology.org/2024.acl-long.849/. doi:10.18653/v1/2024.acl-long.849.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-rank adaptation of large language models, 2021. URL: https://arxiv.org/abs/2106.09685. arXiv:2106.09685.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1960) 37-46. URL: https://api.semanticscholar.org/CorpusID:15926286.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] K. Krippendorff, Computing Krippendorff's alpha-reliability, 2011.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>