=Paper=
{{Paper
|id=Vol-3878/32_main_long
|storemode=property
|title=Building CorefLat. a Linguistic Resource for Coreference and Anaphora Resolution in Latin
|pdfUrl=https://ceur-ws.org/Vol-3878/32_main_long.pdf
|volume=Vol-3878
|authors=Eleonora Delfino,Roberta Leotta,Marco Passarotti,Giovanni Moretti
|dblpUrl=https://dblp.org/rec/conf/clic-it/DelfinoLPM24
}}
==Building CorefLat. a Linguistic Resource for Coreference and Anaphora Resolution in Latin==
Building CorefLat: A linguistic resource for coreference and anaphora resolution in Latin

Eleonora Delfino (1, *, †), Roberta G. Leotta (2, †), Marco Passarotti (2, †) and Giovanni Moretti (2, †)

(1) Università di Udine, Via Palladio 8, 33100 Udine, Italy
(2) CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Largo Gemelli 1, 20123 Milano, Italy
===Abstract===
This paper presents the initial stages of a project focused on coreference and anaphora resolution in Latin texts. By building a corpus enhanced with coreference/anaphora annotation, the project aims to explore empirically a layer of metalinguistic analysis that has not yet been extensively investigated in linguistic resources and natural language processing for Latin. After reviewing the related work on this NLP task, the paper discusses annotation criteria and data analysis, providing examples of a few issues that emerged during the annotation process.

===Keywords===
Latin, Coreference, Anaphora, Annotation, Corpora

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy

(*) Corresponding author. (†) These authors contributed equally.

Email: eleonora.delfino@uniud.it (E. Delfino); robertagrazia.leotta@unicatt.it (R. G. Leotta); marco.passarotti@unicatt.it (M. Passarotti); giovanni.moretti@unicatt.it (G. Moretti)

Homepage: https://docenti.unicatt.it/ppd2/en/docenti/102059/roberta-grazia-leotta/profilo (R. G. Leotta); https://docenti.unicatt.it/ppd2/it/docenti/14144/marco-carlo-passarotti/profilo (M. Passarotti)

ORCID: 0009-0002-5947-5011 (E. Delfino); 0009-0004-5631-1032 (R. G. Leotta); 0000-0002-9806-7187 (M. Passarotti); 0000-0001-7188-8172 (G. Moretti)

© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

===1. Introduction===
Over the past decade, research on linguistic resources and natural language processing (NLP) for Latin has seen remarkable growth¹. However, an important layer of metalinguistic annotation, namely coreference and anaphora resolution, still remains largely neglected. Indeed, except for the (meta)data produced by the FIR-2013 project Development and Integration of Advanced Linguistic Resources for Latin [2], there are neither corpora enhanced with coreferential/anaphoric annotations nor NLP tools for automatic coreference/anaphora resolution for Latin. This absence limits the granularity of information extraction from Latin corpora. Such a limitation is particularly serious, as Latin texts are mainly used for research in the Humanities, such as literary, stylistic and philosophical analysis. For example, investigating a philosophical concept conveyed by a word such as voluntas ‘will’ in Latin texts, or studying the turns of a certain character in a drama, would greatly benefit from a textual resource where, for instance, the ana-/cataphoric references of pronouns are resolved.

¹ For an overview of the available linguistic resources for Latin, see [1]. As for NLP tools, see the three editions of the evaluation campaign EvaLatin (last edition: https://circse.github.io/LT4HALA/2024/EvaLatin).

The PRIN 2022 project Textual Data and Tools for Coreference Resolution of Latin was granted funding to address this gap. Run jointly by the Università Cattolica of Milan and the University of Udine, the project stems from the FIR-2013 pilot experience and has the short-term objective of developing a large-scale and balanced dataset of Latin texts enhanced with coreference/anaphora resolution (called CorefLat). Based upon this annotated dataset, the project has two long-term objectives.

The first is to develop and evaluate a set of trained models for automatic coreference/anaphora resolution of Latin.

The second is to publish the metadata pertaining to coreference/anaphora resolution as Linked Data, to make them interoperable with other (meta)data on the Web. To this end, the texts of the annotated dataset are selected among those published in the LiLa Knowledge Base, a collection of multiple linguistic resources for Latin modelled using the same vocabularies for knowledge description and interconnected according to the principles of the Linked Data paradigm [3]².

² https://lila-erc.eu

This paper details the initial stages of the creation of the CorefLat annotated dataset.

===2. Related Work===
Coreference (henceforth CR) and anaphora (henceforth AR) resolution are often treated as a single, yet diverse, task in NLP. To understand the difference between CR and AR, it is necessary to distinguish between the concept of “mention” and that of “entity”. A mention is defined as an instance of reference to an object, while an entity is the object to which a mention refers in a text. CR consists in finding in a text all mentions of (strictly speaking, real-world) entities such as persons or organisations, regardless of their textual representation. In AR, instead, the interpretation of a mention (known as “anaphora” or “cataphora”, e.g., a pronoun) depends on another mention present in the text, whether it precedes or follows in the word order. If both mentions refer to the same entity, they are considered to be coreferential, which makes AR and CR closely bound to each other. Since ana-/cataphoric relations are present in the text, the need for world knowledge in AR is minimal. In contrast, CR has a much broader scope: coreferential terms can have completely different grammatical properties and/or functions (e.g., different gender and part of speech) and yet, by definition, they can refer to the same entity.

In NLP, the CR task is usually not meant in a strict sense, as it consists in finding all mentions of each entity in a text regardless of their relation to the real world. Accordingly, our project adopts this same interpretation of the CR task [4].

Since the 1960s, coreference and anaphora resolution has been a central topic in NLP studies, but it was considered a difficult task, typically requiring the use of sophisticated knowledge sources and inference procedures. In 1983, Roberto Busa pointed out the absence of resources and tools for pronoun coreference resolution: “[...] avete mai incontrato tavole e concordanze computerizzate nelle quali il programma automaticamente abbia [...] collegato i pronomi alle forme di cui sono vicari?” [5, 7.2]³.

³ “[...] have you ever come across computerized tables and concordances in which the programme automatically [...] connects pronouns with the nouns that they represent?”. Translation taken from [6, 137-138].

As for other NLP tasks, during the 1990s research on CR/AR gradually shifted from heuristic approaches to machine learning approaches, thanks to the public availability of annotated corpora produced for shared tasks dedicated to coreference resolution, such as the Message Understanding Conferences (MUC) [7] and the Automatic Content Extraction (ACE) program [8]. These corpora mainly include news articles and newswire texts in English. The ACE corpus also features Arabic and Chinese texts from web-blogs and telephone conversations. The tendency to focus coreference and anaphora annotation on newspaper texts is also confirmed by those selected for the CoNLL shared task on modeling unrestricted coreference in OntoNotes [9, 10], as well as by the NXT-format Switchboard Corpus [11]. In addition, some treebanks feature CR/AR, encompassing a wide range of languages, including English and Czech [12], German [13], Japanese [14], Italian [15], Spanish and Catalan [16]. To the best of our knowledge, there is no specific Latin corpus enriched with CR/AR. The only currently available texts that include this layer of annotation come from Latin treebanks. The FIR-2013 project mentioned above built a CR-annotated dataset including works by Sallust, Caesar and Cicero (taken from the Latin Dependency Treebank [17]), and by Thomas Aquinas (from the Index Thomisticus Treebank [18]). However, the selection of texts in this dataset is quite unbalanced with respect to both literary genres and authors. Out of the more than 45,000 total annotated tokens, about 27,000 are taken from Thomas Aquinas’ Summa contra Gentiles, and more than 10,000 are from Sallust’s In Catilinam. Given this, our project aims to create a more balanced dataset by increasing and differentiating the quantity of annotated texts for both Classical and Late Latin.

===3. Building CorefLat===
====3.1. Annotation Criteria and Data Selection====
To create a resource that adheres to the most unified and widely shared annotation criteria for CR/AR, the annotation style of CorefLat resembles the one developed for the GUM corpus and follows the recommendations proposed by the (ongoing) Universal Anaphora (UA) project⁴, which aims to create, gather, and distribute harmonized resources for CR/AR.

⁴ https://universalanaphora.github.io/UniversalAnaphora/

While building CorefLat, we decided to focus on a subset of the different types of coreference and ana-/cataphora prescribed by the GUM and UA recommendations. The types that we selected are listed below:

• anaphoric pronouns referring back to something: domine qui et semper vivis (Aug. Conf. 1.6.8) ‘Lord (you) who live for ever’;
• cataphoric pronouns referring forward to something: invocat te, domine (Aug. Conf. 1.1.1) ‘invokes you, Lord’;
• content-rich lexical item - coreferring the same lexical mention: laudes tuae, domine, laudes tuae per scripturas tuas suspenderent palmitem cordis mei (Aug. Conf. 1.17.27) ‘Your praises, Lord, your praises throughout your Scriptures would have supported the vine shoot of my heart’;
• split antecedents - the referred items are more than one: an vero caelum et terra, quae fecisti et in quibus me fecisti, capiunt te? (Aug. Conf. 1.2.2) ‘heaven and earth, which you made, and in which you made me, encompass you?’.

Such a limited set of types of coreference was selected to address the fundamental aim of the two-year funded project, namely building and distributing a Latin corpus enhanced with coreferential annotation, which is not yet available for this language.

Texts are annotated manually by two independent annotators, using the Content Annotation Tool (CAT) [19], formerly known as the CELCT Annotation Tool, which was created specifically for textual coreference annotation. The tool is highly customizable, making it possible, for instance, to distinguish between annotations of mentions and those of entities. (Meta)data are saved in XML and are then converted to CoNLL-U Plus following the recommendations of the UA initiative⁵.

⁵ https://github.com/UniversalAnaphora/UniversalAnaphora/blob/main/documents/UA_CONLL_U_Plus_proposal_v1.0.md

In CorefLat, coreferences are not annotated as chains, but rather as relations. A coreference relation involves two elements: the one referring (mention) and the one referred to (entity). In our annotation, each mention points directly to the one entity it refers to, rather than to any previous mention of the same entity. Consider the example in (1).

(1) Magnus es, Domine, et laudabilis valde. Magna virtus tua et sapientiae tuae non est numerus. (Aug. Conf. 1.1.1)
‘Great are you, O Lord, and surpassingly worthy of praise. Great is your goodness, and your wisdom is incalculable’⁶.

⁶ English translations of Latin examples are taken, with minor changes, from [20] (Augustine) and [21] (Plautus).

In sentence (1), we identify two coreference relations: the first one involves the mention tua and the entity Domine, and the second one involves the mention tuae and the same entity Domine. Typically, the referred element is a noun; nevertheless, there are cases where the referred entity is represented by a function word, such as a pronoun, as in example (2):

(2) nec valerem quae volebam omnia nec quibus volebam omnibus. (Aug. Conf. 1.8.13)
‘I was incapable of achieving all that I wanted, and by all that I wanted.’

In (2), the relative pronoun quae refers to the quantifying pronoun omnia, just as quibus refers to omnibus in the remainder of the sentence. Since omnis ‘all’ (lemma of both omnia and omnibus) is a function word, no content-rich entity is concerned in this coreference relation. Moreover, it should be noted that sometimes the entity is not explicitly expressed in the text. To address this issue, we create external entities to which the respective mentions are linked by tagging. For instance, in example (3), the pronoun nos ‘we’ refers to the two lovers in Plautus’ comedy Curculio, namely the girl Planesium and the boy Phaedromus, whose names are not explicitly mentioned in the sentence for economy’s sake, as the two characters are present on stage and pronounce these lines themselves.

(3) quo usque, quaeso, ad hunc modum / inter nos amore utemur semper surrupticio? (Pl. Curc. 1, 204-205)
‘How much longer, please, will we always conduct our love affair in secret?’

In such a case, we tag the mention nos as linked to the entities “Planesium” and “Phaedromus”, which are created external to the text.

The annotation task is performed on a collection of Latin texts already enriched with lemmatization and Part-of-Speech (PoS) tagging and linked to the LiLa Knowledge Base. The following texts were chosen according to selection criteria aimed at ensuring a sufficiently representative and balanced corpus with respect to both literary genre and era.

• Classical Latin: data are excerpted from the Opera Latina corpus by LASLA⁷, an extensive collection of approximately 1.7 million words from over 130 lemmatized and morphologically tagged Classical and Late Latin texts⁸.
• Late Latin: data are taken from the text of Augustine’s Confessiones provided by The Latin Library⁹.

⁷ https://lasladb.uliege.be/OperaLatina/
⁸ The Opera Latina corpus in the LiLa Knowledge Base is available at https://lila-erc.eu/data/corpora/Lasla/id/corpus.
⁹ http://www.thelatinlibrary.com. The text is available in LiLa at https://lila-erc.eu/lodview/data/corpora/CIRCSELatinLibrary/id/corpus/Confessiones

At present, no annotation of Medieval Latin texts has been performed, as data from this era are largely provided, albeit in unbalanced fashion, by the results of the FIR project.

====3.2. Results====
So far, we have annotated the following excerpts: the first book of Augustine’s Confessiones, a philosophical prose text, and a comedy by Plautus, Curculio. The workload was split equally between the two annotators; however, the last 50 sentences of the first book of Augustine’s Confessiones were annotated by both annotators to measure
their agreement. Inter-annotator agreement was calculated through the Dice coefficient similarity metric, which is widely adopted in NLP [22, 23]. Its value ranges from 0 to 1, with 1 indicating that two sets are identical and 0 meaning that they have no overlap. After verifying that the annotated markables span the same tokens for the two annotators in all cases, we calculated the similarity values for entities (0.817) and mentions (0.824), which are comparatively high and acceptable for this task [24, 25, 26]. Additionally, Cohen’s Kappa coefficient was measured, yielding the following agreement values for each markable class: 0.8139902 for the markable class ‘mention’ and 0.8118851 for the markable class ‘entity’.

Table 1 presents the data derived from the analysis of the two texts. To highlight the quantitative significance of the coreference phenomenon, it shows the total number of tokens in the texts analyzed, along with the number of tokens involved in coreference relations. Additionally, the table shows the total number of coreference relations, and their respective entities and mentions.

{| class="wikitable"
|+ Table 1: Data obtained from the analysis of the corpus
! Category !! Confessiones !! Curculio
|-
| Total tokens || 6,133 || 5,853
|-
| Tokens in coreference || 746 || 976
|-
| Coreference relations || 521 || 796
|-
| Entities || 202 || 577
|-
| Mentions || 542 || 569
|}

The tokens involved in a coreference relation account for 12.16 percent of the total in Confessiones, while in Curculio they represent 16.7 percent of the total. In both cases the percentages exceed the data produced by the FIR project, where the phenomenon concerns approximately 8 percent of the tokens of the Latin texts annotated therein. The table clearly indicates that Curculio exhibits a greater number of coreferences despite having a lower total number of tokens. This difference is statistically significant: the chi-squared test performed on these data yielded a chi-squared statistic of 49.18 and a p-value lower than 0.00001. Given that the p-value is lower than the conventional alpha level of 0.05, coreference relations vary significantly from a statistical point of view between Confessiones and Curculio. The coreference phenomenon is indeed widespread in the language of Plautus’s theatre. This may be due to the fact that Plautus’s language mimics, to some extent, everyday spoken language. Furthermore, the presence of numerous dialogues, where speakers often interrupt each other’s turns, implies frequent references to the recipients with whom the characters interact. The text structure, characterized by numerous allocutions, also contributes to the high number of coreferences.

====3.3. Annotation Issues====
In this section, we present and discuss three examples of annotation issues. On the one hand, we address a problematic case regarding the application of our annotation scheme to the data, which was the primary reason for disagreement between the two annotators (example 4). On the other hand, we present two cases that highlight the fundamental role of context (example 5) and of the literary genre (example 6) for the coreference resolution task. The limited number of cases presented below is consistent with our prior decision to restrict the scope of annotation to only a subset of coreferential phenomena. We hypothesize that expanding the range of annotated coreference types or enlarging the corpus of annotated texts (in terms of quantity and literary genre) would lead to greater annotation challenges.

Starting from the first annotation issue, the most relevant disagreement between the two annotators concerns how to link mentions that are distant in the text from the entity they refer to. Example (4) shows a representative case of this type of disagreement.

(4) Bonus ergo est qui fecit me, et ipse est bonum meum, et illi exulto bonis omnibus quibus etiam puer eram. Hoc enim peccabam, quod non in ipso sed in creaturis eius me atque ceteris voluptates, sublimitates, veritates quaerebam, atque ita inruebam in dolores, confusiones, errores. (Aug. Conf. 1.20.31)
‘Therefore the one who made me is good, and he himself is my good, and I rejoice in him for all the good things of which I consisted even in childhood. This was my sin: I sought pleasures, exaltations, truths not in he himself but in his creations, which is to say, in myself and other things’.

The pronouns in (4) are references to the entity God, which is explicitly expressed six sentences above in the text. The reader has no difficulty decoding these pronouns because the first-person narrator is discussing his relationship with God, to whom he is constantly referring. Therefore, it is not necessary to explicitly state the entity in every sentence.

The sentence in (4) can be annotated in two distinct ways: each pronoun can either be directly linked to the entity ‘God’ within the text, or be linked to the first pronoun concerned in (4) (qui), which is then linked to the external entity ‘God’. During the annotation process, the two annotators diverged: one selected the former method, while the other opted for the latter. There is no upper limit to the number of sentences after which a mention cannot be associated with the entity to which it refers [27]. When CR and AR first emerged as NLP tasks, there were concerns that machines could not yield acceptable results if the mention and the entity were too distant
from each other [28]. However, contemporary methods achieve satisfactory results even with long-distance coreference, exceeding 200 sentences [29]. Additionally, given that we focus on literary texts, which feature long-distance coreferences more frequently than other textual types [30], it is imperative that we devote particular attention to this specific type of coreference. The two options chosen by the annotators are both equally valid. To harmonize the annotation process, we decided to link the mention to the external entity beyond a certain threshold, which was set at five sentences¹⁰.

¹⁰ The threshold is sentence-based rather than token-based, as the sentence is the usual relevant unit adopted in CR/AR, where it is indeed customary to distinguish between, for instance, intra- and inter-sentential anaphora.

Sentence (5) from Plautus’ Curculio exemplifies another challenging case of ambiguity, which further complicates the annotation process:

(5) Pal.: Quid? tu te pones Veneri ieientaculo? Phaed.: Me, te atque hosce omnis. (Pl. Curc. 1, 73-74)
Pal.: ‘What? You’ll offer yourself a breakfast to Venus?’ Phaed.: ‘Yes, myself, yourself, and all these here.’

As is typical in theatrical texts, much is left to the audience’s inference. In this instance, the actor’s gestures serve to disambiguate the phrase hosce omnis, which could refer either to the group of slaves accompanying the character Phaedromus or to the audience itself [31, 32, 33]. The annotators decided to follow the interpretation provided by Paratore [34], according to whom hosce omnis refers to the audience. In this example, an agreement in gender and number between the mentions and the potential antecedents inferred from the context can be observed. Disambiguating the antecedent not only requires understanding the text but also knowing the specific characteristics of the literary genre concerned.

Another case in which the importance of literary genre and knowledge of context becomes evident is as follows.

(6) Cvrc.: [...] Lyconem quaero tarpezitam. Lyc.: Dic mihi, quid eum nunc quaeris? (Pl. Curc. 3, 406-407)
Cvrc.: ‘I’m looking for the banker Lyco.’ Lyc.: ‘Tell me, why are you looking for him now?’

The dialogue cited here between the two characters, Curculio and Lyco, plays on a comedic ambiguity: Curculio knows he is speaking to Lyco, while Lyco believes that Curculio is unaware of his identity. When Curculio asks to speak with Lyco, Lyco responds by speaking about himself in the third person, thereby concealing his identity. For this reason, both the first-person pronoun ‘mihi’ and the third-person pronoun ‘eum’ refer to the same entity. This case clearly demonstrates the importance of understanding both the context and the specific narrative techniques of the textual genre in order to effectively resolve coreferences.

===4. Conclusion and Future Work===
In this paper, we provided an overview of the current state of a project aimed at building a Latin corpus enhanced with coreference and anaphora resolution. We detailed the annotation criteria and discussed a few annotation challenges, highlighting how this annotation layer necessitates a profound interaction among various fields of expertise, including linguistics, textual criticism, and literature.

In the near future, our aim is to expand the annotated corpus and to further extend the evaluation of inter-annotator agreement by incorporating metrics such as those proposed by Kopeć and Ogrodniczuk [35], for example the MUC score [36]. Once a sufficiently large dataset is available, we plan to use it to train and evaluate a stochastic model in supervised fashion to perform automatic CR/AR of Latin, usable also in NLP pipelines such as UDPipe [37] and Stanza [38]. We expect such a model to help provide the Latin treebanks currently available in the Universal Dependencies (UD) initiative [39] with a layer of so-called enhanced dependencies, which also includes coreference and anaphora resolution. This would position Latin on an equal footing with other contemporary languages for which CR/AR annotations are also publicly accessible in treebanks [40]¹¹. Given that one of the UD Latin treebanks, the Index Thomisticus Treebank, is already published as Linked Data in the LiLa Knowledge Base [41], enriching the treebank with enhanced dependencies will require modelling and publishing therein the metadata about CR/AR.

¹¹ https://universaldependencies.org/u/overview/enhanced-syntax.html

The contribution of our project can also be considered within the broader context of NLP tasks for Latin. For instance, the corpus enriched with coreference annotations could enhance a task such as Emotion Polarity Detection, which was one of the shared tasks at the last edition of the evaluation campaign EvaLatin 2024. In the long term, a follow-up of the project will consist in building further textual datasets that feature other layers of coreferential annotation recognized by the GUM framework, such as appositive, attributive, and predicative coreferences, along with discourse deixis and non-proper coreferences. Finally, given the current spread of Large Language Models and their highly promising accuracy rates on a wide range of NLP tasks, our data could be used to fine-tune
existing models for Latin, such as Latin BERT [42].

===5. Acknowledgements===
This contribution is funded by the PRIN 2022 project "Textual Data and Tools for Coreference Resolution in Latin" (CUP J53D23013680008), a project carried out jointly by the Università Cattolica del Sacro Cuore in Milan and by the University of Udine.

===References===
[1] M. Passarotti, F. Mambrini, G. Franzini, F. M. Cecchini, E. Litta, G. Moretti, P. Ruffolo, R. Sprugnoli, Interlinking through lemmas. The lexical collection of the LiLa Knowledge Base of linguistic resources for Latin, Studi e Saggi Linguistici 58 (2020) 177–212.
[2] M. Passarotti, From syntax to semantics. First steps towards tectogrammatical annotation of Latin, in: Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 2014, pp. 100–109.
[3] T. Berners-Lee, J. Hendler, O. Lassila, The semantic web, Scientific American 284 (2001) 34–43.
[4] R. Sukthanker, S. Poria, E. Cambria, R. Thirunavukarasu, Anaphora and coreference resolution: A review, Information Fusion 59 (2020) 139–162.
[5] R. Busa, Trent’anni d’informatica su testi: a che punto siamo? quali spazi aperti alla ricerca?, in: Convegno su L’Università e l’evoluzione delle Tecnologie Informatiche, volume 1, CILEA, Milano, Italy, 1983, pp. 7.1–7.4.
[6] J. Nyhan, M. Passarotti, One Origin of Digital Humanities: Fr Roberto Busa in His Own Words, Springer, 2019.
[7] N. A. Chinchor, Overview of MUC-7, in: Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998, 1998. URL: https://aclanthology.org/M98-1001.
[8] G. R. Doddington, A. Mitchell, M. A. Przybocki, L. A. Ramshaw, S. M. Strassel, R. M. Weischedel, The automatic content extraction (ACE) program - tasks, data, and evaluation, in: LREC, volume 2, Lisbon, 2004, pp. 837–840.
[9] S. Pradhan, L. Ramshaw, M. Marcus, M. Palmer, R. Weischedel, N. Xue, CoNLL-2011 shared task: Modeling unrestricted coreference in OntoNotes, in: Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, 2011, pp. 1–27.
[10] S. Pradhan, A. Moschitti, N. Xue, O. Uryupina, Y. Zhang, CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes, in: Joint Conference on EMNLP and CoNLL - Shared Task, 2012, pp. 1–40.
[11] S. Calhoun, J. Carletta, J. M. Brenier, N. Mayo, D. Jurafsky, M. Steedman, D. Beaver, The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue, Language Resources and Evaluation 44 (2010) 387–419.
[12] A. Nedoluzhko, M. Novák, S. Cinková, M. Mikulová, J. Mírovský, Coreference in Prague Czech-English Dependency Treebank, in: N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), European Language Resources Association (ELRA), Portorož, Slovenia, 2016, pp. 169–176. URL: https://aclanthology.org/L16-1026.
[13] E. Hinrichs, S. Kübler, K. Naumann, H. Telljohann, J. Trushkina, et al., Recent developments in linguistic annotations of the TüBa-D/Z treebank, Universitätsbibliothek Johann Christian Senckenberg, 2004.
[14] R. Iida, M. Komachi, K. Inui, Y. Matsumoto, Annotating a Japanese text corpus with predicate-argument and coreference relations, in: Proceedings of the Linguistic Annotation Workshop, 2007, pp. 132–139.
[15] A. Minutolo, R. Guarasci, E. Damiano, G. De Pietro, H. Fujita, M. Esposito, A multi-level methodology for the automated translation of a coreference resolution dataset: an application to the Italian language, Neural Computing and Applications 34 (2022) 22493–22518.
[16] M. Recasens, M. A. Martí, AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan, Language Resources and Evaluation 44 (2010) 315–345.
[17] D. Bamman, G. Crane, The design and use of a Latin dependency treebank, in: Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories (TLT2006), Citeseer, 2006, pp. 67–78.
[18] M. Passarotti, The project of the Index Thomisticus Treebank, in: Digital Classical Philology, De Gruyter Saur, 2019, pp. 299–320.
[19] V. B. Lenzi, G. Moretti, R. Sprugnoli, CAT: the CELCT annotation tool, in: LREC, 2012, pp. 333–338.
[20] W. W. Augustine, Confessions, Vol. 2: Books 9-13 (Loeb Classical Library, No. 27), 1912.
[21] P. Nixon, et al., Plautus, Vol. II: Casina. The Casket Comedy. Curculio. Epidicus. The Two Menaechmuses (Loeb Classical Library), William Heinemann; GP Putnam’s Sons, 1917.
[22] L. R. Dice, Measures of the amount of ecologic association between species, Ecology 26 (1945) 297–302.
[23] T. Sorensen, A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons, Biologiske skrifter 5 (1948) 1–34.
[24] K. B. Cohen, A. Lanfranchi, M. J.-y. Choi, M. Bada, W. A. Baumgartner, N. Panteleyeva, K. Verspoor, M. Palmer, L. E. Hunter, Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles, BMC Bioinformatics 18 (2017) 1–14.
[25] I. Hendrickx, G. Bouma, F. Coppens, W. Daelemans, V. Hoste, G. Kloosterman, A.-M. Mineur, J. Van Der Vloet, J.-L. Verschelde, A coreference corpus and resolution system for Dutch, in: LREC, 2008.
[26] A. Nedoluzhko, J. Mírovský, P. Pajas, The coding scheme for annotating extended nominal coreference and bridging anaphora in the Prague Dependency Treebank, in: Proceedings of the Third Linguistic Annotation Workshop (LAW III), 2009, pp. 108–111.
[27] R. Simone, Fondamenti di linguistica, volume 9, Laterza Bari, 1990.
[28] T. McEnery, I. Tanaka, S. Botley, Corpus annotation and reference resolution, Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts (1997).
[29] H.-L. Trieu, A.-K. D. Nguyen, N. Nguyen, M. Miwa, H. Takamura, S. Ananiadou, Coreference resolution in full text articles with BERT and syntax-based mention filtering, in: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks, 2019, pp. 196–205.
[30] R. Thirukovalluru, N. Monath, K. Shridhar, M. Zaheer, M. Sachan, A. McCallum, Scaling within document coreference to long texts, Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (2021) 3921–3931.
[31] L. Cappiello, Un commento al Curculio di Plauto (vv. 1-370) (2015).
[32] G. Monaco, T. M. Plauto, Teatro di Plauto: Il Curculio. I, Istituto editoriale cultura europea, 1963.
[33] T. H. Gellar-Goad, Plautus’ Curculio and the case of the pious pimp, Roman Drama and its Contexts 34 (2016) 231.
[34] T. M. Plautus, E. Paratore, Il gorgoglione: (Il Gorgoglione). Testo latino con traduzione a fronte, Sansoni, 1958.
[35] M. Kopeć, M. Ogrodniczuk, Inter-annotator agreement in coreference annotation of Polish, Advanced Approaches to Intelligent Information and Database Systems (2014) 149–158.
[36] B. Zheng, P. Xia, M. Yarmohammadi, B. V. Durme, Multilingual Coreference Resolution in Multiparty Dialogue, Transactions of the Association for Computational Linguistics 11 (2023) 922–940. URL: https://doi.org/10.1162/tacl_a_00581. doi:10.1162/tacl_a_00581. arXiv:https://direct.mit.edu/tacl/article-pdf/doi/10.116
[37] M. Straka, J. Hajič, J. Straková, UDPipe: trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing, in: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 2016, pp. 4290–4297.
[38] S. N. Group, et al., Stanza - a Python NLP package for many human languages, 2018.
[39] M.-C. De Marneffe, C. D. Manning, J. Nivre, D. Zeman, Universal Dependencies, Computational Linguistics 47 (2021) 255–308.
[40] V. Ng, Supervised noun phrase coreference research: The first fifteen years, in: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010, pp. 1396–1411.
[41] F. Mambrini, M. Passarotti, G. Moretti, M. Pellegrini, The Index Thomisticus Treebank as Linked Data in the LiLa Knowledge Base, in: Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022, pp. 4022–4029.
[42] D. Bamman, P. J. Burns, Latin BERT: A contextual language model for classical philology, arXiv preprint arXiv:2009.10053 (2020).
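
The figures reported in Section 3.2 can be checked with a few lines of Python. The following is a minimal sketch, not the authors' procedure: the token counts are copied from Table 1, but the paper does not say how the chi-squared test was set up, so the sketch assumes a 2×2 table (tokens inside vs. outside coreference relations, per text) with Yates' continuity correction. The function names and the two toy annotator sets used for the Dice coefficient are hypothetical and serve only as illustration.

```python
import math

# Token counts copied from Table 1 (Section 3.2).
TOKENS = {"Confessiones": 6133, "Curculio": 5853}
IN_COREF = {"Confessiones": 746, "Curculio": 976}


def coref_share(text: str) -> float:
    """Percentage of tokens involved in a coreference relation."""
    return 100.0 * IN_COREF[text] / TOKENS[text]


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Chi-squared test with Yates' correction for the table [[a, b], [c, d]].

    Returns the statistic and the p-value for one degree of freedom.
    Using Yates' correction is an assumption; the paper does not state it.
    """
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2.0, 0.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # With one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def dice(a: set, b: set) -> float:
    """Dice coefficient of two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


# Rows: texts; columns: tokens inside vs. outside a coreference relation.
conf_in, curc_in = IN_COREF["Confessiones"], IN_COREF["Curculio"]
chi2, p_value = chi2_yates_2x2(
    conf_in, TOKENS["Confessiones"] - conf_in,
    curc_in, TOKENS["Curculio"] - curc_in,
)

# Hypothetical markables (sentence id, token id) chosen by two annotators.
annotator_1 = {(1, 2), (1, 7), (2, 4), (3, 1)}
annotator_2 = {(1, 2), (2, 4), (3, 5)}
toy_dice = dice(annotator_1, annotator_2)

if __name__ == "__main__":
    for text in TOKENS:
        print(f"{text}: {coref_share(text):.2f}% of tokens in coreference")
    print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
    print(f"toy Dice = {toy_dice:.3f}")
```

Under these assumptions the statistic comes out at about 49.18, in line with the value reported in Section 3.2, and the token shares match the 12.16 and 16.7 percent given there. The Dice function only illustrates the set-overlap definition on invented data; it does not reproduce the 0.817 and 0.824 agreement scores, which depend on the annotators' actual markables.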