<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Building a Pragmatically Annotated Diachronic Corpus: the DIADIta Project</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Irene De Felice</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Strik-Lievers</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università del Piemonte Orientale</institution>
          ,
          <addr-line>Via Galileo Ferraris 116, Vercelli</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università di Genova</institution>
          ,
          <addr-line>Piazza Santa Sabina 1, Genova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present here the first stages of the construction of the DIADIta corpus, a diachronic corpus of Italian annotated for interactional pragmatic phenomena. This corpus aims to fill a gap in the resources available for the historical pragmatics of Italian. First, we describe the annotation scheme, which is structured into four levels covering a wide range of pragmatic (or pragmatically relevant) categories: speech acts (e.g., apology; threat), forms (e.g., discourse marker; expressive), pragmatic functions (which are speaker-oriented, e.g., mitigation; turn-taking), and pragmatic aims (which are interlocutor-oriented, e.g., attention-getting; request for agreement). We then discuss how the results of an initial annotation exercise provide insights for refining the annotation procedure.</p>
      </abstract>
      <kwd-group>
        <kwd>diachronic corpus pragmatics</kwd>
        <kwd>historical pragmatics</kwd>
        <kwd>interaction</kwd>
        <kwd>Italian</kwd>
        <kwd>pragmatic annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The DIADIta project<sup>1</sup>, situated within the framework of
historical pragmatics [1], aims to investigate the specific
pragmatic features and strategies of dialogic interaction
in different phases of the Italian language, and to
understand how these features and strategies interrelate
and change over time. Although the
last fifteen years have witnessed a growing interest in
the historical pragmatics of Italian [2], an in-depth study
of this topic is still lacking, one that can fully
account for how different communicative strategies and
different linguistic categories (primarily, but not
exclusively, pragmatic) interact with each other, from both a
synchronic and a diachronic perspective. The DIADIta
project aims to address this gap.</p>
      <p>
        A key goal of the project is to build a diachronic
corpus annotated for a wide range of pragmatically
relevant linguistic phenomena. The DIADIta corpus,
which will contribute to the recently established field of
diachronic corpus pragmatics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], will consist of at least
24 Italian literary texts of different genres dating from
the 13th to the 20th century: in most cases, plays, novels
and short stories where dialogic interactions between
characters are particularly frequent. Once completed,
the corpus will be freely accessible and searchable from
the project website (www.diadita.it) and may be
further expanded and enriched with other texts of
different literary genres.
      </p>
      <p>In this paper, we present the first steps we have
taken to lay the foundation for the DIADIta corpus.
After a brief review of related literature and resources
(Section 2), we describe the structure of the annotation
scheme, outlining the theoretical and methodological
assumptions that underlie it and highlighting its most
innovative aspects (Section 3). Then, we present the
results of an annotation exercise on a play by Luigi
Pirandello, with which we tested the reliability of the
scheme. In the light of these results, we also briefly
discuss some improvements that we plan to apply in the
next stages of the corpus annotation process (Section 4).
The last section draws the conclusions of the study
(Section 5).</p>
      <p>1 PRIN 2022 project Dialogic interaction in diachrony: a pragmatic
history of the Italian language - DIADIta (2023-2025), national P.I.
Maria Napoli (Università del Piemonte Orientale), P.I. for the
University of Genova Chiara Fedriani. The paper was conceived by
the two authors together. For academic reasons only, the scientific
responsibility is attributed as follows: Sections 2, 3.2, 3.3, 4 to Irene
De Felice; Sections 1, 3, 3.1, 5 to Francesca Strik-Lievers.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Pragmatically annotated (diachronic) corpora: challenges and resources</title>
      <p>
        Most existing corpora are not well suited for research
focused on pragmatics, unless one adopts a
form-to-function approach, which implies searching for specific
keywords or linguistic structures that are known or
supposed to express pragmatic functions (e.g. discourse
markers, specific verb forms and syntactic structures,
etc.; see [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]). Such an approach is not viable in the field
of diachronic pragmatics: in this case, a
function-to-form approach must usually be adopted, since certain
pragmatic functions remain stable over time, while the
linguistic means by which speakers express them may
vary [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. The problem is, of course, that “functions
cannot be searched for automatically” [8, p. 5].
      </p>
      <p>
        Corpora annotated with pragmatic information that
allow for searches based on a function-to-form approach
are rare, partly due to the difficulties arising in their
construction [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. First of all, the annotation of pragmatic
categories requires a great deal of interpretation on the
part of the annotator. Moreover, this type of annotation,
“unlike, for example, POS (part-of-speech) or semantic
tagging/annotation, almost always needs to take into
account levels above the individual word and may even
need to refer to contextual information beyond those
textual units that are commonly referred to as a
‘sentence’ or ‘utterance’” [10, p. 84]. Therefore, due to
its inherent difficulties, the annotation of pragmatic
categories is still mostly a manual, time-consuming task
and “it is doubtful whether the process of manual
classification will ever be fully replaced” [8, p. 15].
Nevertheless, some attempts have been made to design
annotation schemes that allow for (semi-)automatic
annotation of specific pragmatic categories. In
particular, most efforts have focused on speech acts.
Consider, for instance, the Speech Act Annotated Corpus
project (SPAAC; [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) and the Dialogue Annotation and
Research Tool (DART; [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]; for a discussion of widely
known models and tools for speech act or dialogue act
annotation, including the DAMSL and the
SWBD-DAMSL models, see [
        <xref ref-type="bibr" rid="ref10 ref14">10, 14</xref>
        ], and more recently [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]).
The international standard DiAML (Dialogue Act
Markup Language, ISO 24617-2; see [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]) also concerns
speech acts found in dialogue. In this annotation
scheme, a given dialogue segment may express multiple
acts, and a given act may be assigned multiple
communicative functions: a feature that is also crucial in
our annotation scheme (see Section 3.1).
      </p>
      <p>
        Corpora annotated with pragmatic categories for
English include, among others, the SPICE-Ireland Corpus,
which is derived from the spoken data of the
International Corpus of English: Ireland Component
(ICE-Ireland) and provides information on the speech act
function of utterances, discourse markers, and
quotatives. The Sociopragmatic Corpus (SPC) is a
subsection of the Corpus of English Dialogues (CED) and
comprises drama and trial proceedings dating from 1640
to 1760. This historical corpus can be used to investigate
the extent to which the role of the participants affects
the realization of pragmatic functions [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], since gender,
status/social rank, role, and age are annotated for each
participant.
      </p>
      <p>
        For the Italian language, there are numerous corpora
that collect texts from historical varieties of Italian (e.g.
DiaCORIS – Corpus of Diachronic Written Italian; CEOD
– Digital Nineteenth-Century Epistolary Corpus), some of
which also provide morphological information (e.g.
MIDIA – Morphology of Italian in Diachrony). There are
also corpora designed to enable or facilitate pragmatic
analysis. For example, the LABLITA corpus [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ],
developed within the pragmatic framework of the
Language into Act Theory (L-AcT), brings together in a
single resource a collection of three spoken Italian
corpora recorded in Tuscany since 1965. One of the most
innovative aspects of the corpus is that the transcripts
are aligned with the acoustic source via utterance, i.e.,
“the linguistic counterpart of a speech act” [17, p. 93].
Implicit contents (presuppositions, implicatures,
topicalizations, and vagueness) are annotated in the
IMPAQTS corpus, which collects Italian political
discourses since 1946 [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>Although this is a brief and non-exhaustive
overview of the resources in this field, the few examples
provided are sufficient to show that what Archer and
colleagues wrote in 2008 still holds overall:
“[w]ork in the area of pragmatics and corpus
annotation is much less advanced than other annotation
work (grammatical annotation schemes, for example)”
[19, p. 613]. Furthermore, to the best of our knowledge,
a diachronic corpus annotated with a rich set of
pragmatic features is currently lacking among the
corpora developed for Italian, and we find no
equivalents among the corpora developed for other
languages either. Most notably, there is no resource
capable of accounting for both the linguistic means that
express different pragmatic functions in various
historical varieties of a language, and the ways in which
these linguistic categories interact with one another in
both a synchronic and a diachronic dimension. This gap motivated
the design and construction of the DIADIta corpus.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Annotation scheme</title>
      <p>The annotation scheme created within the DIADIta
project is designed to cover a wide range of
pragmatically relevant phenomena, especially those
with a clear interactional value. Given that no existing
tagset fully met the project’s need to encompass a broad
spectrum of linguistic, and particularly pragmatic,
phenomena, the annotation scheme has been developed
by drawing on a number of categories whose
relevance is well established in pragmatic studies, such
as POLITENESS and DISCOURSE MARKERS, further enriched with
other linguistic categories that proved to have
significant pragmatic implications, such as
EPISTEMICITY and EVIDENTIALITY.</p>
      <p>So far, the scheme is organized into four levels of
annotation (for a detailed description of the individual
tags, please refer to the DIADIta annotation guidelines
available on the project’s website):</p>
      <list list-type="bullet">
        <list-item>
          <p>Forms: This level includes linguistic
expressions (belonging to different parts of
speech, and with variable extension) that have
an interactional pragmatic value, and in
particular: DISCOURSE MARKERS (e.g., Senti, io me
ne vado, ‘Listen, I’m leaving’), EXPRESSIVES (e.g.,
Smettila, idiota!, ‘Stop it, you idiot!’) and
REPETITION, when it has a pragmatic value (e.g.,
Lo giuro, lo giuro!, ‘I swear it, I swear it!’, where
the repetition intensifies the oath).</p>
        </list-item>
        <list-item>
          <p>Pragmatic functions: This level includes a set
of categories that have (also, or exclusively) a
pragmatic value, such as: POLITENESS,
VAGUENESS, DISAGREEMENT, IMPOLITENESS,
INTENSIFICATION, EPISTEMICITY, TURN-TAKING.</p>
        </list-item>
        <list-item>
          <p>Pragmatic aims: This level focuses on the
reaction that the speaker intends to provoke in
the interlocutors, for example attracting their
attention (ATTENTION GETTING) or requesting
their confirmation or manifestation of
agreement (REQUEST FOR
CONFIRMATION/AGREEMENT)<sup>2</sup>.</p>
        </list-item>
        <list-item>
          <p>Speech acts: This level includes the main
types of expressive (e.g., DERISION, PROTEST),
directive (e.g., ORDER, REQUEST), commissive
(e.g., COMMITMENT/PROMISE, THREAT), and
assertive (e.g., ASSERTION, CORRECTION) speech
acts.</p>
        </list-item>
      </list>
      <p>Each of the four levels includes several tags (57 in total), as
summarized in Appendix A.</p>
      <p>2 To avoid overburdening the tagset, we have chosen to merge
certain categories that, despite being well-defined on a theoretical
level, are often difficult to distinguish in practice from other closely
related functional categories, such as REQUEST FOR CONFIRMATION and
REQUEST FOR AGREEMENT.</p>
      <sec id="sec-3-1">
        <title>3.1. Interaction between categories</title>
        <p>As illustrated by examples from Luigi Pirandello’s play
Enrico IV (1921), the same string of text can be annotated
with multiple tags, either from the same level (ex. 1) or
from different levels (ex. 2). Furthermore, a string of text
tagged with a certain tag can contain a smaller string
tagged with a different tag, either from the same level
(ex. 3) or from a different level (ex. 4):</p>
        <sec id="sec-3-1-1">
          <p>1. Di Nolli: Lasciamo andare, lasciamo andare, vi prego.
Di Nolli: ‘Let it go, let it go, I beg you.’</p>
          <p>2. D. Matilde: […] Non ti vedi in me, tu, là?
Frida: Mah! Io, veramente...
D. Matilde: ‘[...] Don’t you see yourself in me, there?’
Frida: ‘Well! I, actually...’</p>
          <p>3. Bertoldo: […] Ho detto bene: non era vestiario,
questo, del mille e cinquecento!
Arialdo: Ma che mille e cinquecento!
Bertoldo: ‘[...] I said it right: this wasn’t
clothing from the fifteen hundreds!’
Arialdo: ‘What fifteen hundreds!’</p>
          <p>4. Bertoldo (arrabbiandosi): Ma me lo potevano
dire, per Dio santo, che si trattava di quello di
Germania e non d'Enrico IV di Francia!
Bertoldo (getting angry): ‘But they could have
told me, for God’s sake, that it was about the
one from Germany and not Henry IV of
France!’</p>
          <p>In ex. 1, vi prego ‘I beg you’ is labeled with two tags from
the pragmatic functions level: it has both a POLITENESS
function and an INTENSIFICATION function (it intensifies
the force of the directive act expressed by the whole
utterance).</p>
          <p>In ex. 2, Mah! ‘Well!’ is tagged as a DISCOURSE MARKER
(forms level) but is also considered an expression of
EPISTEMICITY and DISAGREEMENT (functions level). By
using this interjection, the character Frida expresses a
low degree of certainty regarding the truth of Donna
Matilde’s statement, thus also demonstrating that she
does not fully agree with her.</p>
          <p>In ex. 3, the entire utterance by Arialdo, who mocks
Bertoldo in front of his friends (speech act of DERISION),
is labeled at the level of pragmatic functions as a
manifestation of DISAGREEMENT and IMPOLITENESS.
It also contains the DISCOURSE MARKER ma che
‘what’, which is itself labeled, again at the pragmatic
functions level, as a TURN-TAKING marker.</p>
          <p>In ex. 4, the whole utterance by Bertoldo is labeled
as a PROTEST (speech acts level). Within this utterance,
ma ‘but’ is labeled as a DISCOURSE MARKER (forms level)
and as a TURN-TAKING marker (pragmatic functions
level), and per Dio santo ‘for God’s sake’ is labeled with
the tags EXPRESSIVE (forms level) and INTENSIFICATION
(pragmatic functions level), since it is used to strengthen
the illocutionary force of the act itself.</p>
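          <p>The layering described for ex. 4 can be pictured as a set of spans over the same text, some identical and some contained in others. The following is a minimal sketch under stated assumptions, not the project's data model: the names Annotation, span_of, annotate, nested_in and tags_by_span, and the level labels "forms", "functions" and "speech_acts", are hypothetical and introduced only for illustration.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from collections import namedtuple

# Hypothetical record: character offsets (end exclusive), annotation level, tag.
Annotation = namedtuple("Annotation", "start end level tag")

# Bertoldo's utterance from ex. 4.
TEXT = ("Ma me lo potevano dire, per Dio santo, che si trattava di quello "
        "di Germania e non d'Enrico IV di Francia!")


def span_of(text, phrase):
    """Return (start, end) offsets of the first occurrence of phrase in text."""
    start = text.index(phrase)
    return start, start + len(phrase)


def annotate(text, phrase, level, tag):
    """Build an Annotation covering the first occurrence of phrase."""
    start, end = span_of(text, phrase)
    return Annotation(start, end, level, tag)


# The five tags discussed for ex. 4.
ANNOTATIONS = [
    annotate(TEXT, TEXT, "speech_acts", "PROTEST"),
    annotate(TEXT, "Ma", "forms", "DISCOURSE MARKER"),
    annotate(TEXT, "Ma", "functions", "TURN-TAKING"),
    annotate(TEXT, "per Dio santo", "forms", "EXPRESSIVE"),
    annotate(TEXT, "per Dio santo", "functions", "INTENSIFICATION"),
]


def nested_in(outer, annotations):
    """Annotations contained in the span of outer, but not identical to it."""
    return [a for a in annotations
            if outer.start <= a.start and a.end <= outer.end
            and (a.start, a.end) != (outer.start, outer.end)]


def tags_by_span(annotations):
    """Group (level, tag) pairs that share exactly the same span."""
    groups = {}
    for a in annotations:
        groups.setdefault((a.start, a.end), []).append((a.level, a.tag))
    return groups


if __name__ == "__main__":
    for a in nested_in(ANNOTATIONS[0], ANNOTATIONS):
        print(TEXT[a.start:a.end], a.level, a.tag)
```
]]></preformat>
          <p>In this sketch, nested_in retrieves the four annotations contained in the PROTEST span, while tags_by_span shows that per Dio santo carries one tag from the forms level and one from the pragmatic functions level.</p>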
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Annotation tool</title>
        <p>As shown in Section 3.1, allowing overlapping
annotations from the same and different levels is
essential to capture the multifunctionality of
pragmatically relevant expressions and the interaction
between linguistic and pragmatic categories. For
instance, in ex. 2 above, Mah! ‘Well!’ serves as a DISCOURSE MARKER that
expresses DISAGREEMENT while also conveying
EPISTEMICITY. Moreover, having
multiple annotators work on the same text is necessary
for identifying and discussing cases of disagreement,
especially in the early stages of the project.</p>
        <p>
          For collaborative projects of this type, a web-based
tool is the most suitable instrument [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. For this first
annotation exercise, we chose INCEpTION [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], which
allows the creation and easy modification of a tagset (in
our case multiple tagsets, one for each annotation level)
and the overlapping and nesting of different tags. The
annotation performed on INCEpTION is of the standoff
type: the texts are therefore not modified, and the
annotations are stored in a separate document (see
Finlayson &amp; Erjavec [22, p. 178], who consider standoff
annotation a best practice, compared to inline
annotation).
        </p>
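        <p>The idea of standoff annotation can be illustrated with a short sketch: the source text is never edited, and the annotations live in a separate record that points into it by character offsets. The JSON layout and the names make_standoff and resolve are assumptions made for this illustration; this is not INCEpTION's export format.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import json

# Source text (the discourse-marker example from Section 3); it is never modified.
SOURCE_TEXT = "Senti, io me ne vado"


def make_standoff(text, spans):
    """Build standoff records (offsets, level, tag) for (phrase, level, tag) triples."""
    records = []
    for phrase, level, tag in spans:
        start = text.index(phrase)  # first occurrence only, for simplicity
        records.append({"start": start, "end": start + len(phrase),
                        "level": level, "tag": tag})
    return records


def resolve(text, record):
    """Recover the annotated string from the untouched source text."""
    return text[record["start"]:record["end"]]


standoff = make_standoff(SOURCE_TEXT, [("Senti", "forms", "DISCOURSE MARKER")])

# The annotations can be stored in their own document, apart from the text.
serialized = json.dumps(standoff, ensure_ascii=False)

if __name__ == "__main__":
    print(serialized)
    print(resolve(SOURCE_TEXT, json.loads(serialized)[0]))
```
]]></preformat>
        <p>Because the records only reference offsets, several annotators can attach different or overlapping annotations to the same text without producing diverging copies of it.</p>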
        <p>As an example, Figure 1 presents a screenshot of an
annotation, again on the play Enrico IV. A stratification
of annotations can be observed, with the entire
utterance Senti: io non ho mai capito perché si laureino in
medicina! (‘Listen: I have never understood why they
graduate in medicine!’) labeled as an EXCLAMATION
speech act, and senti ‘listen’ labeled as a DISCOURSE MARKER with the
pragmatic function of TURN-TAKING and the pragmatic
aim of ATTENTION-GETTING.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Annotation guidelines</title>
        <p>As the annotation scheme and the few examples
provided in Section 3.1 clearly demonstrate, the
annotation of the DIADIta corpus is extremely complex.
Indeed, as Weisser [10, p. 84] observes, “[a]ny type of
linguistic annotation is a highly complex and
interpretive process, but none more so than pragmatic
annotation”. Therefore, it is essential to have a
meticulously detailed annotation manual to guide
annotators.</p>
        <p>The first text tested for the pragmatic annotation of
the categories initially selected for our project is the first
act of Pirandello's Enrico IV (9,216 words). We began by
independently annotating the text and subsequently
discussed our work until a consensus was reached on
each annotation.</p>
        <p>The total number of annotations for the first act is
958. This very first phase of the annotation process has
been crucial for refining the tagset, which is now in the
form shown in Appendix A, and for developing
guidelines with practical instructions for annotation.
The current version of the DIADIta annotation
guidelines is available on the project’s website. The
guidelines provide a brief definition for each annotation
level and tag, along with basic references and examples
from the annotated texts in the corpus. They also specify
constraints for applying certain tags. For example, the
tag EXPRESSIVE (forms level) is used to annotate lexical
elements such as exclamations, vulgarisms, insults, or
curses that express “subjective sensations, emotions,
affections, evaluations or attitudes” [23, p. 33]. However,
it is also specified that this tag should only be applied
when it co-occurs with one or more tags from the
pragmatic functions or pragmatic aims levels, i.e., only in
contexts where expressive forms are relevant at a
pragmatic, interactional level. Consider examples 5 and
6:</p>
        <p>5. Secondo valletto: Eh, santo Dio, potevate
dircelo!
Second valet: ‘Oh, holy God, you could have
told us!’</p>
        <p>6. Frida: Fa di professione lo scemo, non lo sa?
Frida: ‘He acts the fool professionally, don’t
you know?’</p>
        <p>In ex. 5, santo Dio ‘holy God’ is tagged as EXPRESSIVE
because it also has an INTENSIFICATION function, as it
intensifies the expressive force of a PROTEST speech act.
In contrast, in ex. 6, scemo ‘fool’, despite being an
expressive used in a DERISION speech act, is not tagged
because it does not seem to serve primarily a specific
pragmatic function or aim in the interaction.</p>
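        <p>A constraint of this kind lends itself to an automatic consistency check. The sketch below assumes that "co-occurs" means "is attached to exactly the same span", which is a simplification of ours and may not match the guidelines; the function name expressive_violations, the helper span, and the level labels are likewise hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def expressive_violations(annotations):
    """Return spans tagged EXPRESSIVE with no functions/aims tag on the same span.

    annotations: iterable of (start, end, level, tag) tuples.
    Assumption: co-occurrence is approximated as identity of spans.
    """
    levels_by_span = {}
    for start, end, level, _tag in annotations:
        levels_by_span.setdefault((start, end), set()).add(level)
    return sorted(
        (start, end)
        for start, end, level, tag in annotations
        if level == "forms" and tag == "EXPRESSIVE"
        and not levels_by_span[(start, end)] & {"functions", "aims"}
    )


def span(text, phrase):
    """Return (start, end) offsets of the first occurrence of phrase."""
    start = text.index(phrase)
    return start, start + len(phrase)


EX5 = "Eh, santo Dio, potevate dircelo!"
EX6 = "Fa di professione lo scemo, non lo sa?"

# Ex. 5 as described above: EXPRESSIVE together with INTENSIFICATION.
ex5 = [(*span(EX5, "santo Dio"), "forms", "EXPRESSIVE"),
       (*span(EX5, "santo Dio"), "functions", "INTENSIFICATION")]

# A hypothetical mistaken annotation of ex. 6: EXPRESSIVE on its own.
ex6_wrong = [(*span(EX6, "scemo"), "forms", "EXPRESSIVE")]

if __name__ == "__main__":
    print(expressive_violations(ex5))        # no violation expected
    print(expressive_violations(ex6_wrong))  # the span of "scemo" is flagged
```
]]></preformat>
        <p>A check like this could only flag candidates for review; deciding whether an expressive form is pragmatically relevant remains an interpretive task for the annotator.</p>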
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and discussion</title>
      <p>To test the reliability of the adopted scheme, we
annotated the second act of Pirandello’s Enrico IV (6,968
tokens) in INCEpTION. This annotation process
benefited from our previous joint annotation experience
on the first act of the same play and, most importantly,
relied on the established annotation guidelines. The
annotation performed separately by the two authors
resulted in 818 and 906 annotations, respectively, for a
total of 1,724 annotations.</p>
      <p>
        To test the inter-annotator agreement we adopted
Krippendorff’s α metric [
        <xref ref-type="bibr" rid="ref24 ref25 ref26 ref27">24, 25, 26, 27</xref>
        ], a unitizing
measure that is particularly suitable for assessing the
level of agreement in our case, because it can produce
partial agreement scores from all annotations by also
taking into account their partial overlaps. For instance,
for eh sì (‘oh, yes’), one annotator assigned the tag
AGREEMENT (pragmatic functions) to the entire
expression, while the other annotator assigned the same
tag only to sì. This kind of match is considered
incomplete, but is still used to compute the agreement.
The agreement score is, of course, lower in such cases
compared to complete annotations, where the same tag
is assigned to the same length of spans by both
annotators. Table 1 presents the agreement scores and
the number of annotations for each of the four layers of
our annotation scheme<sup>3</sup>.
According to Landis and Koch’s [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] scale, our levels of
agreement should be considered as slight for the
pragmatic aims level, fair for the functions level,
moderate for the speech acts level, and substantial for the
forms level.
      </p>
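      <p>Two ideas from this paragraph can be made concrete: partial span overlap and the Landis and Koch interpretation bands. The sketch below is only an illustration. The function span_jaccard is a simple overlap ratio chosen by us and is not Krippendorff's unitizing α, which also accounts for expected disagreement; the function names are hypothetical, and the band boundaries follow the commonly cited Landis and Koch scale.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def span_jaccard(a, b):
    """Overlap ratio of two (start, end) character spans, end exclusive.

    Illustrative stand-in for partial agreement; NOT Krippendorff's alpha.
    """
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union else 0.0


def landis_koch(coefficient):
    """Map an agreement coefficient to the Landis and Koch verbal label."""
    if coefficient < 0.0:
        return "poor"
    bands = ((0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial"))
    for upper, label in bands:
        if coefficient <= upper:
            return label
    return "almost perfect"


# The 'eh sì' case: one annotator tags the whole expression, the other only 'sì'.
TEXT = "eh sì"
annotator_a = (0, len(TEXT))              # whole expression
annotator_b = (TEXT.index("sì"), len(TEXT))  # only 'sì'

if __name__ == "__main__":
    print(span_jaccard(annotator_a, annotator_b))
    print(landis_koch(0.35))
```
]]></preformat>
      <p>In this toy measure the two AGREEMENT spans on eh sì share two of five characters, so they count as a partial rather than a complete match.</p>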
      <p>
        These results clearly demonstrate that, even though
the annotation was performed by expert annotators
following detailed guidelines, pragmatic annotation
remains a highly complex and fine-grained task,
especially when annotators have to assign many labels,
and often multiple labels to the same token(s). In many
cases, to understand the pragmatic function of a
linguistic unit, the annotator must go well beyond the
level of the single word, phrase or sentence, and
necessarily consider the linguistic co-text, or even the
extralinguistic context, as far as it can be reconstructed
from a written text. Therefore, in this specific field of
annotation, reaching an α value higher than 0.67, which
is sometimes considered essential to draw at least
“tentative conclusions” [24, p. 241] in other
computational linguistic tasks, may be exceptionally
challenging, even for expert annotators. Other complex
pragmatic annotation models created for discourse
annotation tasks have also failed to achieve high levels
of agreement. For instance, slight to moderate values of
agreement produced by the α metric are also reported by
Duran et al. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] for the Conversation Analysis Modeling
Schema (CAMS); see also Castagneto [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], who reports
moderate agreement values for the Chiba and DAMSL
annotation models.
      </p>
      <p>3 The inter-annotator agreement is calculated with INCEpTION
33.3-SNAPSHOT (b5644aca).</p>
      <p>
        Therefore, a low level of agreement was to be
expected and, from our point of view, this should not
necessarily be understood as an indication of low
annotation quality, inadequate training, or poorly
defined guidelines [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], since when there are two
partially or completely disagreeing annotations, it is not
always the case that one is correct and the other wrong.
In many cases both can be acceptable, as in example 7,
in which Matilde’s reaction to the doctor’s question was
considered by one annotator as an EXCLAMATION, and by
the other as a RESPONSE to his request for information:
      </p>
      <p>7. Dottore (stordito): Come dice?
D. Matilde: Quest’automobile, dottore! Sono
più di tre ore e mezzo!
Doctor (stunned): ‘What did you say?’
D. Matilde: ‘This car, doctor! It’s been over
three and a half hours!’</p>
      <p>Discrepancies may also stem from differences in
annotated span lengths, even when the same tag is
chosen. For instance, in example 8, one annotator
marked AGREEMENT for the entire statement by Belcredi
(Sì, forse, quando disse…), while the other one marked
AGREEMENT only for sì ‘yes’.
      </p>
      <sec id="sec-4-1">
        <p>8. D. Matilde: Non è vero! – Di me! Parlava di me!
Belcredi: Sì, forse, quando disse…
D. Matilde: Dei miei capelli tinti!
D. Matilde: ‘That’s not true! Me! He was talking
about me!’
Belcredi: ‘Yes, maybe, when he said…’
D. Matilde: ‘About my dyed hair!’</p>
        <p>The analysis of cases of disagreement has also been
useful for revising certain aspects of the tagset. For
instance, after this exercise we have decided to merge
the COMMITMENT/PROMISE speech act with OATH in future
annotations, given that in many cases it is very difficult
to distinguish between them. It has also been useful to
identify unclear points in the guidelines, and to better
plan the next phases of the project. In particular, we
intend to: (i) release an updated version of the guidelines
with clearer descriptions of some aspects of the
annotation process; (ii) ensure that each text in the
corpus is annotated or revised by at least two expert
annotators; and (iii) include validation tasks at regular
intervals in the project workflow to revise annotations for
small groups of texts in order to reach better intra- and
inter-text consistency.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This paper has outlined the initial steps in creating the
DIADIta corpus, a pragmatically annotated diachronic
corpus for Italian. This corpus is characterized by its
rich, multi-layered annotation scheme organized into
four dimensions: forms, pragmatic functions, pragmatic
aims, and speech acts. This structure allows for nuanced
analysis of pragmatic strategies in literary texts from the
13th to the 20th century. The innovative approach of
annotating complex interactional features highlights the
value of this corpus as an unparalleled tool for
examining the evolution of pragmatic functions and
forms over time, enabling detailed and
multidimensional analysis of text data.</p>
      <p>We have also detailed an annotation exercise on a
play by Pirandello that illustrates the task’s complexity
(reflected in the low level of agreement in some layers),
but also the richness of the annotations. This first
exercise is crucial for refining the annotation process
and improving clarity and reliability in applying a
pragmatic annotation model to historical texts.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>Funded by the European Union - Next Generation EU
(Mission 4, Component 1, CUP D53D23009600006)
within the PRIN 2022 project Dialogic interaction in
diachrony: a pragmatic history of the Italian language –
DIADIta (2023-2025; P.I. Maria Napoli, Università del
Piemonte Orientale). We thank Maria Napoli and Chiara
Fedriani for their useful suggestions, together with the
other members of our research group, Luisa Brucale,
Ludovica Maconi and Giada Parodi, for the valuable
moments of constructive discussion we have had. We
also owe a special thanks to Richard Eckart de Castilho
for his generous assistance with INCEpTION.</p>
    </sec>
    <sec id="sec-7">
      <title>Appendix A</title>
      <p>The DIADIta annotation scheme.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <source>Pragmatic Developments in the History of English</source>
          , John Benjamins, Amsterdam/Philadelphia,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Alfieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Alfonzetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Motta</surname>
          </string-name>
          , R. Sardo (Eds.),
          <article-title>Pragmatica storica dell'italiano. Modelli e usi comunicativi del passato</article-title>
          , Cesati, Firenze,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Taavitsainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Jucker</surname>
          </string-name>
          , J. Tuominen (Eds.),
          <source>Diachronic corpus pragmatics</source>
          , John Benjamins, Amsterdam,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Culpeper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kytö</surname>
          </string-name>
          ,
          <article-title>Early Modern English dialogues: Spoken interaction as writing</article-title>
          .
          <source>Studies in English Language</source>
          , Cambridge University Press, Cambridge,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>U.</given-names>
            <surname>Lutzky</surname>
          </string-name>
          , Discourse markers in Early Modern English, John Benjamins, Amsterdam,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Jucker</surname>
          </string-name>
          , History of English and English Historical Linguistics, Ernst Klett, Stuttgart,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Jucker</surname>
          </string-name>
          ,
          <article-title>Corpus pragmatics</article-title>
          , in:
          <string-name>
            <given-names>J.-O.</given-names>
            <surname>Östman</surname>
          </string-name>
          , J. Verschueren (Eds.), Handbook of Pragmatics, Benjamins, Amsterdam/Philadelphia,
          <year>2013</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Landert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dayter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Messerli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Locher</surname>
          </string-name>
          , Corpus Pragmatics, Cambridge University Press, Cambridge,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rühlemann</surname>
          </string-name>
          ,
          <article-title>What can a corpus tell us about pragmatics</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>O'Keeffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>McCarthy</surname>
          </string-name>
          (Eds.),
          <source>The Routledge Handbook of Corpus Linguistics</source>
          , Routledge, New York,
          <year>2022</year>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Weisser</surname>
          </string-name>
          ,
          <article-title>Speech act annotation</article-title>
          , in: K. Aijmer,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rühlemann</surname>
          </string-name>
          (Eds.),
          <source>Corpus Pragmatics. A Handbook</source>
          , Cambridge University Press, Cambridge,
          <year>2015</year>
          , pp.
          <fpage>84</fpage>
          -
          <lpage>114</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1017/cbo9781139057493.005</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Leech</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weisser</surname>
          </string-name>
          ,
          <article-title>Generic speech act annotation for task-oriented dialogues</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Archer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rayson</surname>
          </string-name>
          , A. Wilson, T. McEnery (Eds.),
          <source>Proceedings of the Corpus Linguistics 2003 Conference</source>
          , Lancaster University: UCREL Technical Papers vol.
          <volume>16</volume>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Weisser</surname>
          </string-name>
          ,
          <source>How to Do Corpus Pragmatics on Pragmatically Annotated Data: Speech Acts and Beyond</source>
          , John Benjamins, Amsterdam/Philadelphia,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Weisser</surname>
          </string-name>
          ,
          <article-title>Speech acts in corpus pragmatics: Making the case for an extended taxonomy</article-title>
          ,
          <source>International Journal of Corpus Linguistics</source>
          <volume>25</volume>
          (
          <issue>4</issue>
          ) (
          <year>2020</year>
          )
          <fpage>400</fpage>
          -
          <lpage>425</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Castagneto</surname>
          </string-name>
          ,
          <article-title>Il sistema di annotazione Pra.Ti.D tra gli altri sistemi di annotazione pragmatica. Le ragioni di un nuovo schema</article-title>
          ,
          <source>AIΩN. Annali del Dipartimento di Studi Letterari, Linguistici e Comparati. Sezione Linguistica</source>
          <volume>1</volume>
          (
          <year>2012</year>
          )
          <fpage>105</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mezza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cervone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stepanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tortoreto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Riccardi</surname>
          </string-name>
          ,
          <article-title>ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents</article-title>
          , in:
          <source>Proceedings of the 27th International Conference on Computational Linguistics</source>
          , Association for Computational Linguistics, Santa Fe,
          <year>2018</year>
          , pp.
          <fpage>3539</fpage>
          -
          <lpage>3551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bunt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Petukhova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Traum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alexandersson</surname>
          </string-name>
          ,
          <article-title>Dialogue act annotation with the ISO 24617-2 standard</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Dahl</surname>
          </string-name>
          (Ed.),
          <source>Multimodal interaction with W3C standards</source>
          , Springer, Cham,
          <year>2017</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cresti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gregori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Moneglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nicolás</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panunzi</surname>
          </string-name>
          ,
          <article-title>The LABLITA Speech Resources</article-title>
          , in: E. Cresti, M. Moneglia (Eds.),
          <source>Corpora e Studi Linguistici. Atti del LIV Congresso Internazionale di Studi della Società di Linguistica Italiana</source>
          , Officinaventuno, Milano,
          <year>2022</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cominetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gregori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lombardi Vallauri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panunzi</surname>
          </string-name>
          ,
          <article-title>IMPAQTS: a multimodal corpus of parliamentary and other political speeches in Italy (1946-2023), annotated with implicit strategies</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Fišer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          , D. Bordon (Eds.),
          <source>Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN)</source>
          , ELRA and ICCL, Torino,
          <year>2024</year>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Archer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Culpeper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Davies</surname>
          </string-name>
          ,
          <article-title>Pragmatic annotation</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Lüdeling</surname>
          </string-name>
          , M. Kytö (Eds.),
          <source>Corpus Linguistics: An International Handbook</source>
          , de Gruyter, Berlin,
          <year>2008</year>
          , pp.
          <fpage>613</fpage>
          -
          <lpage>642</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          , R. Eckart de Castilho, I. Gurevych,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          ,
          <article-title>Collaborative Web-Based Tools for Multi-layer Text Annotation</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Ide</surname>
          </string-name>
          , J. Pustejovsky (Eds.),
          <source>Handbook of Linguistic Annotation</source>
          , Springer, Dordrecht,
          <year>2017</year>
          , pp.
          <fpage>229</fpage>
          -
          <lpage>256</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-94-024-0881-2_8</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Klie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bugert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boullosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eckart de Castilho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation</article-title>
          , in:
          <source>Proceedings of System Demonstrations of the 27th International Conference on Computational Linguistics (COLING)</source>
          , Santa Fe, New Mexico, USA,
          <year>2018</year>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Finlayson</surname>
          </string-name>
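          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Erjavec</surname>
          </string-name>
          ,
          <article-title>Overview of Annotation Creation: Processes and Tools</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Ide</surname>
          </string-name>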
          , J. Pustejovsky (Eds.),
          <source>Handbook of Linguistic Annotation</source>
          , Springer, Dordrecht,
          <year>2017</year>
          , pp.
          <fpage>167</fpage>
          -
          <lpage>191</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-94-024-0881-2_5</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Löbner</surname>
          </string-name>
          , Understanding Semantics, Routledge, New York,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krippendorff</surname>
          </string-name>
          ,
          <article-title>Content analysis: An introduction to its methodology</article-title>
          , Sage, Thousand Oaks,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krippendorff</surname>
          </string-name>
          ,
          <article-title>Agreement and information in the reliability of coding</article-title>
          ,
          <source>Communication Methods and Measures</source>
          <volume>5</volume>
          (
          <year>2011</year>
          )
          <fpage>93</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>Mistakes and how to avoid mistakes in using intercoder reliability indices</article-title>
          ,
          <source>Methodology: European Journal of Research Methods for the Behavioral and Social Sciences</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ) (
          <year>2015</year>
          )
          <fpage>13</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>N.</given-names>
            <surname>Duran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Battle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Inter-annotator Agreement Using the Conversation Analysis Modelling Schema, for Dialogue</article-title>
          ,
          <source>Communication Methods and Measures</source>
          <volume>16</volume>
          (
          <issue>3</issue>
          ) (
          <year>2022</year>
          )
          <fpage>182</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Landis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <article-title>The measurement of observer agreement for categorical data</article-title>
          ,
          <source>Biometrics</source>
          <volume>33</volume>
          (
          <issue>1</issue>
          ) (
          <year>1977</year>
          )
          <fpage>159</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Aroyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Welty</surname>
          </string-name>
          ,
          <article-title>Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation</article-title>
          ,
          <source>AI Magazine</source>
          <volume>36</volume>
          (
          <year>2015</year>
          )
          <fpage>15</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>