=Paper=
{{Paper
|id=Vol-3033/paper24
|storemode=property
|title=The Annotation of Liber Abbaci, a Domain-Specific Latin Resource
|pdfUrl=https://ceur-ws.org/Vol-3033/paper24.pdf
|volume=Vol-3033
|authors=Francesco Grotto,Rachele Sprugnoli,Margherita Fantoli,Maria Simi,Flavio Massimiliano Cecchini,Marco Passarotti
|dblpUrl=https://dblp.org/rec/conf/clic-it/GrottoSFSCP21
}}
==The Annotation of Liber Abbaci, a Domain-Specific Latin Resource==
Francesco Grotto¹, Rachele Sprugnoli², Margherita Fantoli³, Maria Simi⁴, Flavio Massimiliano Cecchini², Marco Passarotti²
1. Scuola Normale Superiore, Italy
2. Università Cattolica del Sacro Cuore, Italy
3. KU Leuven, Belgium
4. Università degli Studi di Pisa, Italy
francesco.grotto1@sns.it,
{rachele.sprugnoli,flavio.cecchini,marco.passarotti}@unicatt.it
margherita.fantoli@kuleuven.be, maria.simi@unipi.it
Abstract

The Liber Abbaci (13th century) is a milestone in the history of mathematics and accounting. Due to the late stage of Latin, its features and its very specialized content, it also represents a unique resource for scholars working on Latin corpora. In this paper we present the annotation and linking work carried out in the frame of the project Fibonacci 1202-2021. A gold-standard lemmatization and part-of-speech tagging allow us to elaborate some first observations on the linguistic and historical features of the text, and to link the text to the LiLa Knowledge Base, whose goal is to make distributed linguistic resources for Latin interoperable by following the principles of the Linked Data paradigm. Starting from this specific case, we discuss the importance of annotating and linking scientific and technical texts, in order to (a) compare and search them together with other (non-technical) Latin texts and (b) train, apply and evaluate NLP resources on a non-standard variety of Latin. The paper also describes the fruitful interaction and coordination between NLP experts and traditional Latin scholars on a project requiring a large range of expertise.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Latin texts have a wide diachronic and diatopic extension that corresponds to a similarly large diversity of the textual genres they represent. Besides literary ones, a huge amount of Latin texts of several different genres can be found spread all over Europe and beyond. An important textual genre is represented by scientific treatises, which in many cases are interesting not only for their contents, but also because of the technical terminology they feature.

This is precisely the case for the Liber Abbaci ‘the book of the abacus’ by Leonardo of Pisa (also known as Fibonacci). Written in the very first years of the 1200s, it is a book on arithmetic promoting a style of calculation based on Arabic numerals without the aid of an abacus. Fibonacci 1202-2021 is a project financed by the Tuscany Region and involving the University of Pisa and the Galilei Museum in Florence, following the publication of a critical edition of the Liber Abbaci by Enrico Giusti (Fibonacci, 2020). The goal of the project is to produce an enhanced digital edition of this work by leveraging advanced publishing tools and investigating the use of computational linguistics techniques in order to uncover the wealth of linguistic, scientific and historical information contained in the book.

Besides its scientific interest, the Liber Abbaci features a very peculiar lexicon, not often represented in the currently available (linguistically annotated) corpora for Latin. In order to fill this gap, in the context of the project Fibonacci 1202-2021 we have started performing the linguistic annotation of the Liber Abbaci, beginning from part-of-speech (PoS) tagging and lemmatization of a specific chapter of the book, chosen for its linguistic and historical interest. The dataset is freely available online.¹

¹ http://dialogo.di.unipi.it/LiberAbbaci

This paper describes the process of annotation of the Liber Abbaci and two applications of its
results, namely (a) the evaluation of a number of trained models for PoS tagging and lemmatization for Latin in out-of-domain fashion and (b) the interlinking of the annotated chapter with other linguistic resources for Latin through the LiLa Knowledge Base (KB).²

² https://lila-erc.eu

2 Related Work

The research area dealing with the creation of linguistic resources and Natural Language Processing (NLP) tools for ancient languages has seen a remarkable growth during the last decade (Sprugnoli and Passarotti, 2020). This has primarily concerned Latin and Ancient Greek as essential media to access and understand the so-called Classical heritage. In particular, several annotated corpora of Latin texts are currently available in digital format: they follow different guidelines and tagsets and feature different layers of linguistic annotation. This section provides a (far from exhaustive) overview of such resources to show how the dataset presented in this paper stands with respect to the state of the art.

The LASLA corpus contains 2,500,000 semi-manually annotated tokens. It covers a large portion of the extant Classical Latin literature. It was started in 1961 by the LASLA research center at the Université de Liège³ and is still being expanded.⁴ The corpus is considered to be a gold standard, since the annotation of every token has been manually verified by a philologist. The linguistic information consists of lemmatization, morphological tagging, and an additional syntactic layer for verbs (Verkerk et al., 2020). Texts cover various literary genres (theater, poetry, prose) and have a chronological extension ranging from the comedies of Plautus to the texts of Suetonius and Pliny the Younger. Recent additions reach later stages of Latin literature,⁵ but include neither Medieval nor Neo-Latin works. Natural sciences and technical works are weakly represented in the corpus, the treatise De Agri Cultura ‘on agriculture’ by Cato and the recently added Naturales Quaestiones ‘investigations about nature’ by Seneca being the only examples.

³ http://web.philo.ulg.ac.be/lasla/presentation-du-laboratoire/

⁴ See http://web.philo.ulg.ac.be/lasla/textes-latins-traites/.

⁵ Of which some are already available: see http://web.philo.ulg.ac.be/lasla/textes-latins-en-cours-de-traitement/.

The corpus of Latin Lemmatized Texts released by Thibault Clérice (Clérice, 2021a) consists of 21,222,911 tokens (17,804,769 without punctuation marks) and includes a large set of Classical and Late Latin texts available in a number of open access corpora.⁶ Clérice’s corpus covers a very ample chronological span (up until the 9th century) as well as different genres: from Classical literature (Horace, Ovid, etc.), to Christian religious texts and legal texts. The linguistic annotation consists of lemmatization and full morphological description of the tokens, produced automatically by applying the Pie Latin LASLA+ model 0.0.6 (Manjavacas et al., 2019), fine-tuned on ca. 1,500,000 tokens taken from the LASLA corpus (Clérice, 2021b), with very good results concerning lemmatization and PoS tagging.⁷ However, results are noticeably lower on unknown tokens.⁸ This difference underlines the difficulty of using automatic annotation tools on texts with a very specialized language, surely not found in LASLA, as is the case for Fibonacci’s Liber Abbaci.

⁶ For the full list, see https://github.com/lascivaroma/latin-lemmatized-texts/tree/0.1.2.

⁷ For lemmatization, accuracy: 0.9734. For PoS tagging, accuracy: 0.9651.

⁸ For lemmatization, accuracy: 0.8716. For PoS tagging, accuracy: 0.9232.

As for syntactically annotated corpora, five treebanks are currently available for Latin. They are the Index Thomisticus Treebank (IT-TB) (Passarotti, 2019), the PROIEL treebank (Haug and Jøhndal, 2008; Eckhoff et al., 2018), the Latin Dependency Treebank by the Perseus Digital Library (part of the Ancient Greek and Latin Treebank) (Bamman and Crane, 2007), the Late Latin Charter Treebank (LLCT) (Cecchini et al., 2020b) and the UDante treebank (Cecchini et al., 2020a). The treebanks include texts of different genres (literary, historical, philosophical and documentary) and periods (from Classical to Medieval), but technical works are not represented.

3 Dataset Creation and Analysis

The Liber Abbaci is made up of more than 270,000 tokens and is divided into 15 chapters of varying length. The choice of starting our manual annotation from chapter VIII de reperiendis pretiis mercium per maiorem guisam ‘on finding out the price of goods through the “greater means”’ is due to the
peculiarity of its content. Here, Fibonacci treats many simple business negotiations using proportions and referring to many examples taken from the entire Mediterranean world. The examples concern weight and monetary systems as well as the main products bought and sold in the 13th century. This means that the text is rich in terminology specific to the mathematical domain but also to trade and commerce. Chapter VIII is made up of 29,858 tokens (including punctuation marks), thus covering about 10% of the total length of the Liber Abbaci.

3.1 Data Annotation

The manual annotation of chapter VIII is carried out by a master’s degree student in Classical languages, with excellent knowledge of Latin but without any previous expertise in either linguistic annotation or computational linguistics. The overall effort of the work amounts to a total of 227 hours, including: training sessions, study of the guidelines and of terminology related to measures, coins and trade in the Middle Ages (Marcinkowski, 2003; Martinori, 1915), the actual annotation, the reconciliation after evaluation of inter-annotator agreement (IAA, see Section 3.2), periodic checks with supervisors, and the linking of the annotated text to the LiLa KB (see Section 5). We make use of a large number of dictionaries as references: the Oxford Latin Dictionary (OLD) (Souter, 1968), the Lexicon Totius Latinitatis (Forcellini, 1965), the Dictionnaire illustré latin-français (hereafter: Gaffiot) (Gaffiot, 2016) and the Thesaurus Linguae Latinae⁹ for Classical Latin, but also the Dictionary of Medieval Latin from British Sources (Latham and Howlett, 1975) and the Glossarium mediae et infimae latinitatis (du Cange et al., from 1883 to 1887) for Medieval Latin. Tokenization and sentence splitting are performed manually in a text editor, then lemmatization and PoS tagging are carried out on a shared spreadsheet following the Universal Dependencies (UD) formalism (de Marneffe et al., 2021), in particular both the universal and the language-specific guidelines relative to the latest release of the UD treebanks (v 2.9).¹⁰

⁹ https://thesaurus.badw.de/das-projekt.html

¹⁰ https://universaldependencies.org/guidelines.html

Applying the UD guidelines to the linguistic peculiarities of the text is not always straightforward. Chapter VIII of the Liber Abbaci, as well as the work in its entirety, presents several typical features of Medieval Latin, both graphically (e.g. the monophthongization ae → e and the spelling nichil instead of the Classical nihil ‘nothing’), morphologically (e.g. the presence of analytical verb forms such as the “perfect”, i.e. present perfective, subjunctive habeat ... honeratum, instead of the Classical onerauerit, from onero ‘to load’) and syntactically (e.g. the nearly exclusive use of quod ‘that’ to introduce declarative clauses, instead of accusative and infinitive¹¹). It is also worth noting the very limited use of enclitic particles (in the whole chapter VIII, Fibonacci uses the enclitic conjunction que ‘and’ only 3 times, appended to the auxiliary verb form erunt ‘they will be’) and the presence of syntactic calques of vernacular constructions (e.g. secundum quod uadis multiplicando ‘according to what you are multiplying’, where uado is preferred to the more Classical eo ‘to go’ and further assumes an auxiliary function, and the use of the gerundive form multiplicando is an innovation).

¹¹ See for example (Traina and Bertotti, 2015, C. XVI).

But the main peculiarities of the text concern the lexicon. Chapter VIII indeed presents a rich set of toponyms, units of measurement, names of coins and Arabisms often not even reported by Medieval Latin dictionaries. This is the case, for example, of some names of places, such as Bugea, today’s Biğāya/Bgayet in Algeria (a city where Fibonacci spent a period of his childhood, learning the art of calculation), and Septis, today’s Ceuta/Sabta on the Strait of Gibraltar; or, among the numismatic terms, of bolsonalia, a word designating a certain amount of broken silver or mixture coins which were sold to goldsmiths because they were adulterated or out of date.

3.2 Inter-Annotator Agreement

The IAA is calculated on 30 sentences (1,010 tokens), with the participation of a second scholar with a background in Classical languages. We register an almost perfect agreement with a Cohen’s κ (Artstein and Poesio, 2008) of 0.97 for lemmatization and 0.94 for PoS tagging.

The comparison between the two annotations highlights two main issues. The first concerns the choice of the UPOS (Universal Part Of Speech) tag (de Marneffe et al., 2021, §2.2.2) for terms such as
nam ‘certainly’ and enim ‘namely’, because different corpora and dictionaries adopt different conventions: e.g. nam is labeled as adverb in the LiLa KB and Df in the Latin PROIEL treebank, both possibly equivalent to UPOS ADV;¹² as S,¹³ standing for conjonction de coordination (UPOS: CCONJ) in the LASLA corpus, and more generically conjonction (servant à confirmer/causale) (UPOS either CCONJ or SCONJ) in the Gaffiot; finally particle (not necessarily corresponding to UPOS PART) in the OLD, and similarly particule in one sense in the Gaffiot. The treatment of the etymologically related and functionally similar enim is mostly identical for all sources, only with the Gaffiot reporting a sense as adverbe instead of particule, followed by the LASLA corpus in using both labels S and M (generic for adverbe), the latter though very marginally. These terms have been discussed and finally assigned the UPOS PART, used in the latest Latin UD treebanks to label discourse particles like these. Such difficulties derive on the one hand from the “volatile” and diachronically variable nature of similar elements, and on the other hand, relatedly, from traditional grammars overlooking them and more generally skipping over pragmatic phenomena, in favour of “more Classical” parts of speech (hence the frequent inclusion of nam, enim, etc. in the catchall category of “adverbs”).

¹² Cf. (Eckhoff et al., 2018, §5).

¹³ With only very few exceptions when it is seen as part of a compound expression with tmesis, thus not receiving an autonomous PoS; cf. Pl. Am. 2.1, 49-50: Quo id, malum, pacto potest nam (mecum argumentis puta) fieri, nunc uti tu et hic sis et domi?, interpreted as an instance of quonam ‘whither pray?’, itself receiving K meaning pronom interrogatif.

The second issue is the UPOS to be used for unus ‘one’. Fibonacci often uses unus to indicate a generic entity, as is clearly visible when paralleled by alter ‘other’. In this case, unus is tagged as DET (determiner), like alter.¹⁴ In a number of other contexts, however, unus specifies the quantity of a certain object. In such cases it is considered a NUM (numeral).¹⁵ The difficulty here originates from a well known and complex linguistic change that will eventually produce a clear indefinite article from the numeral in Romance languages, but for which, being so gradual, we cannot pinpoint an exact historical moment; cf. (Ledgeway, 2012, §4.2.1).

¹⁴ For instance, in the clause ita est pretium unius ad pretium alterius (VIII, 8) ‘so the price of the one [merchandise] is to the price of the other’.

¹⁵ For instance, in the clause ... que multiplica per summam denariorum unius libre (VIII, 20) ‘which you have to multiply by the amount of denarii of which one pound consists’.

4 Comparing NLP Models

Table 1 reports accuracy scores computed on our gold standard processed with UDPipe using the UD v2.6 models for Latin (Straka and Straková, 2017). The scores clearly show that current models are not good enough to process the Latin of Fibonacci. The best accuracy for lemmatization is achieved by the model trained on the LLCT treebank, which contains a set of Early Medieval charters written in Tuscany. However, these scores are lower than state-of-the-art ones: the best participating system at the EvaLatin 2020 evaluation campaign achieves an accuracy of 96.19% for lemmatization and 96.74% for PoS tagging on the corresponding test set (Sprugnoli et al., 2020), i.e. about 33 and 15 points more than the results obtained on Fibonacci with the EvaLatin2020 model (63.60% and 81.90% in Table 1).

{| class="wikitable"
|+ Table 1: Accuracy (%) of UDPipe v2.6 Latin models tested on chapter VIII of the Liber Abbaci.
! Model !! Lemma !! UPOS
|-
| EvaLatin2020 || 63.60 || 81.90
|-
| IT-TB || 65.58 || 77.14
|-
| LLCT || 68.81 || 82.79
|-
| Perseus || 67.54 || 78.37
|-
| PROIEL || 60.25 || 51.64
|}

Taking into consideration lemmatization, the percentage of out-of-vocabulary lemmas, that is, lemmas present in the text by Fibonacci but not in the training texts of the models, is very high (> 50% of lemma types). The majority of errors are registered for numbers and common nouns. The first problem is due to the fact that some models do not recognize Arabic numbers, because they have not seen them in their training data, while others lemmatize them with a special “metalemma” of the kind of num. arab., eschewing lexical forms. As for common nouns, most errors related to lemmatization concern the lexical classes discussed in Section 3. For example, the tokens libris and libre are often lemmatized as liber ‘free’ (ADJ) instead of libra ‘pound’ (NOUN).

Table 2 shows the F1 score per UPOS tag. We observe that an F1 above 70% is achieved by all models only on 5 tags: ADP, NOUN, NUM, SCONJ and VERB. No model recognizes the SYM tag (used for mathematical operators such as parentheses), because it is not present in their respective training data. The same is true for the tag PART in IT-TB (up until UD v2.8),¹⁶ Perseus and PROIEL, and for the tag DET in Perseus. In old versions of the IT-TB, DET is limited to the proto-article ly (8 occurrences), while in Perseus the tag PROPN appears only for the lemma Aefulanus (1 occurrence). The IT-TB-based model, too, registers a near-zero F1 score for PROPN: in the corresponding training data, this tag is used for a restricted set of terms (116 lemma types) mostly specific to the domains of philosophy and religion (e.g. Aristoteles, Maria), not present in our dataset. Low performance is registered also for the AUX tag, the annotation of which is not consistent in training data: in Perseus, this tag is not used at all; in EvaLatin 2020 it marks only the auxiliaries in periphrastic passive (including deponent) constructions; in the other treebanks it is applied also to verbal copulas, as per UD guidelines. Further, the Liber Abbaci sees the rise (1 occurrence) of habeo ‘to have’ as a possible auxiliary (cf. Section 3.1), unheard of in Classical Latin and only attested (albeit marginally) in LLCT.

¹⁶ Annotation discrepancies with respect to other Latin UD treebanks for INTJ, NUM, PART, PRON and DET have been resolved in IT-TB in its last version (2.9), released in November 2021; however, the model adopted in this paper and currently available in UDPipe is based on an older version of the data.

{| class="wikitable"
|+ Table 2: F1 on UPOS tags of UDPipe v2.6 Latin models on chapter VIII of the Liber Abbaci.
! UPOS !! EvaLatin2020 !! IT-TB !! Perseus !! PROIEL !! LLCT
|-
| SYM || 0.00 || 0.00 || 0.00 || 0.00 || 0.00
|-
| AUX || 0.24 || 0.45 || 0.03 || 0.27 || 0.32
|-
| ADJ || 0.48 || 0.37 || 0.40 || 0.28 || 0.55
|-
| PRON || 0.57 || 0.38 || 0.40 || 0.52 || 0.93
|-
| PART || 0.65 || 0.00 || 0.00 || 0.00 || 0.65
|-
| ADV || 0.70 || 0.71 || 0.78 || 0.21 || 0.84
|-
| CCONJ || 0.75 || 0.67 || 0.68 || 0.44 || 0.86
|-
| SCONJ || 0.89 || 0.95 || 0.96 || 0.86 || 0.95
|-
| VERB || 0.91 || 0.87 || 0.92 || 0.78 || 0.84
|-
| NOUN || 0.92 || 0.83 || 0.86 || 0.75 || 0.88
|-
| DET || 0.93 || 0.00 || 0.00 || 0.53 || 0.91
|-
| PROPN || 0.94 || 0.09 || 0.00 || 0.32 || 0.57
|-
| NUM || 0.95 || 0.96 || 0.96 || 0.75 || 0.99
|-
| ADP || 0.99 || 0.98 || 0.93 || 0.91 || 0.88
|-
| Global || 0.71 || 0.52 || 0.49 || 0.46 || 0.73
|}

5 Linking and Querying in LiLa

The LiLa KB makes linguistic resources for Latin interoperable by linking tokens in corpora and entries in dictionaries/lexica to a collection of canonical forms for Latin called Lemma Bank (Passarotti et al., 2020). In order to connect the lemmas of chapter VIII to LiLa’s KB, a string match is first performed between the lemmas in the texts and those in the KB, also taking into account their parts of speech. Using this strategy, 88.8% of the lemmas are directly connected to a single entry in the KB. The remaining unconnected lemmas fall into two possible categories: ambiguous lemmas, that is, with possible connection to more than one entry in the KB; and lemmas absent from the KB. More specifically, we find 44 ambiguous lemmas (corresponding to 631 tokens): for example, colligo can be connected to two entries: either a first-conjugation verb colligare¹⁷ ‘to bind’, or a third-conjugation verb colligĕre¹⁸ ‘to gather’. These cases are manually disambiguated, checking each context of use. The remaining, not directly connected lemmas are not present in the KB and need to be manually added: these are mainly words denoting weight and monetary units (e.g. karatus ‘carat’), or different written representations of lemmas already in LiLa (e.g. torscellus is a graphic variant of torcellus,¹⁹ a unit of length). Thanks to the linking, each lemma of our dataset becomes part of an interoperable ecosystem made of resources of different kinds. We can thus query different interlinked resources using SPARQL and the LiLa endpoints.²⁰ For example, we can find the lemmas appearing only in chapter VIII²¹ and not in the other texts that are currently linked to the KB: the Summa Contra Gentiles by Thomas Aquinas (from the Index Thomisticus), those found in UDante (a corpus of 5 works mostly by Dante Alighieri, or attributed to him, manually annotated following the UD formalism), and the Querolus siue Aulularia (an anonymous comedy dating back to the 5th c. AD).

¹⁷ https://lila-erc.eu/data/id/lemma/94854

¹⁸ https://lila-erc.eu/data/id/lemma/94855

¹⁹ https://lila-erc.eu/data/id/lemma/133810

²⁰ https://lila-erc.eu/sparql/

²¹ https://lila-erc.eu/data/corpora/CorpusFibonacci/id/corpus/Liber Abbaci

{| class="wikitable"
|+ Table 3: The 5 most frequent distinctive lemmas in chapter VIII of the Liber Abbaci.
! Lemma !! Gloss !! Freq.
|-
| rotulus || unit of weight || 296
|-
| soldus || monetary unit || 212
|-
| virgula || bar of a fraction || 202
|-
| byzantius || monetary unit || 73
|-
| cantare || unit of weight || 67
|}

Table 3 shows the 5 most frequent distinctive lemmas, i.e. those exclusively found in the Liber Abbaci, retrieved using a SPARQL query.²² They are all related to mathematics, coins and units of measurement, confirming the specificity of the domain of our dataset. In particular, rotulus and cantāre are two units of weight, both deriving from Arabic, respectively from raṭl (in turn, a metathetical adaptation of Greek λίτρα litra ‘pound’) and qinṭār, which designates a weight of 100 rotuli.²³ The term soldus, instead, indicates a unit of measurement used for monetary quantities. Among the many currencies mentioned in chapter VIII, Fibonacci often cites the byzantius, a gold coin minted in Constantinople.²⁴ Finally, virgula (diminutive of virga, properly a ‘rod’, used by Fibonacci in the same sense as virgula) primarily denotes the bar between the numerator and denominator of a fraction, but it can also designate the fraction itself (Bocchi, 2004).

²² https://github.com/CIRCSE/SPARQL-queries/blob/main/distinctivelemmas-Fibonacci.rq

²³ It should be noted that Fibonacci alternates a third-declension cantāre (gen. sing. cantāris) with a second-declension cantarium (gen. sing. cantarii). During lemmatization of the text, the various attested singular forms have been linked to their respective lemmas; the nom./acc. plur. cantaria, which theoretically could derive both from cantāre and cantarium, has been linked to the lemma cantāre for simple reasons of probability, as it is the most frequently used by Fibonacci among these two forms.

²⁴ Also mentioned is the byzantius saracenatus, equivalent to the hyperperus, that is, a byzantius with inscriptions in Kufic characters (Martinori, 1915).

6 Conclusions and Future Work

This paper describes the annotation of one chapter of the Liber Abbaci by Fibonacci, and reports on the linguistic peculiarities of this text and the ensuing challenges.

The results of existing UDPipe models in lemmatization and tagging show low accuracy and F1 scores when compared to the state of the art for these tasks in the recent EvaLatin 2020 evaluation campaign. This, on the one hand, can be attributed to the characteristics of the genre of Fibonacci’s texts, which are representative of scientific Medieval Latin texts, and on the other hand can be explained by the different choices in annotation style of Latin treebanks released under the UD project. Substantial improvements can be expected with models trained on new releases of Latin treebanks which have already undertaken the effort of resolving annotation discrepancies and of making the annotation style across treebanks more homogeneous. Further improvements will however require new annotated chapters and experiments in domain adaptation, which are scheduled as future work.

Acknowledgments

This work is a contribution to the Fibonacci 1202-2021 project, financed by the Tuscany Region. Part of the work has been funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, Grant Agreement No. 769994. The authors want to thank: prof. Andrea Bocchi, dott. Alessandro Gelsumini, prof. Pier Daniele Napolitani and prof. Enrica Salvatori for their linguistic and historical advice.

References

Ron Artstein and Massimo Poesio. 2008. Inter-coder Agreement for Computational Linguistics. Computational Linguistics, 34(4):555–596.

David Bamman and Gregory Crane. 2007. The Latin Dependency Treebank in a Cultural Heritage Digital Library. In Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007), pages 33–40, Prague, Czech Republic, June. Association for Computational Linguistics.

Andrea Bocchi. 2004. In Michelangelo Zaccarello and Lorenzo Tomasin, editors, Storia della lingua e filologia. Per Alfredo Stussi nel suo sessantacinquesimo compleanno, chapter Sì nel Livero de l’abbecho, pages 121–158. SISMEL – Edizioni del Galluzzo, Florence, Italy.

Flavio M. Cecchini, Rachele Sprugnoli, Giovanni Moretti, and Marco Passarotti. 2020a. UDante: First Steps Towards the Universal Dependencies Treebank of Dante’s Latin Works. In Seventh Italian Conference on Computational Linguistics, pages 1–7, Bologna. CEUR-WS.org.

Flavio Massimiliano Cecchini, Timo Korkiakangas, and Marco Passarotti. 2020b. A New Latin Treebank for Universal Dependencies: Charters between Ancient Latin and Romance Languages. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 933–942, Marseille, France, May. European Language Resources Association.

Thibault Clérice. 2021a. lascivaroma/latin-lemmatized-texts: 0.1.2 - HN PSL, May. DOI: 10.5281/zenodo.4661034; project online at https://github.com/lascivaroma/latin-lemmatized-texts.

Thibault Clérice. 2021b. Latin Lasla Model, Apr. DOI: 10.5281/zenodo.4661034.

Marie-Catherine de Marneffe, Christopher D. Manning, Joakim Nivre, and Daniel Zeman. 2021. Universal Dependencies. Computational Linguistics, 47(2):255–308, 07.

Charles du Fresne sieur du Cange, bénédictins de la congrégation de Saint-Maur, d. Pierre Carpentier, Johann Christoph Adelung, G. A. Louis Henschel, Lorenz Diefenbach, and Léopold Favre. from 1883 to 1887. Glossarium mediae et infimae latinitatis. Favre, Niort, France.

Hanne Martine Eckhoff, Kristin Bech, Gerlof Bouma, Kristine Eide, Dag Haug, Odd Einar Haugen, and Marius Jøhndal. 2018. The PROIEL treebank family: a standard for early attestations of Indo-European languages. Language Resources and Evaluation, 52(1):29–65.

Leonardus Bigollus Pisanus vulgo Fibonacci. 2020. Liber Abbaci, volume 79 of Biblioteca di «Nuncius». Leo S. Olschki, Florence, Italy.

Egidio Forcellini. 1965. Lexicon totius latinitatis. Arnaldo Forni, Bologna, Italy.

Félix Gaffiot. 2016. Dictionnaire Latin-Français. Accessible at gaffiot.fr.

Dag Trygve Truslew Haug and Marius Jøhndal. 2008. Creating a Parallel Treebank of the Old Indo-European Bible Translations. In Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008), pages 27–34.

Ronald Edward Latham and David R Howlett. 1975. Dictionary of Medieval Latin from British Sources: Fascicule V: IJKL. OUP Oxford.

Adam Ledgeway. 2012. From Latin to Romance, volume 1 of Oxford studies in historical and diachronic linguistics. Oxford University Press, Oxford, UK.

Enrique Manjavacas, Ákos Kádár, and Mike Kestemont. 2019. Improving lemmatization of non-standard languages with joint learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1493–1503, Minneapolis, Minnesota, June. Association for Computational Linguistics.

Hinz Marcinkowski. 2003. Measures and Weights in the Islamic World, an English Translation of Walther Hinz’s Handbook Islamische Maße und Gewichte. International Islamic University Malaysia (IIUM).

Edoardo Martinori. 1915. La Moneta: vocabolario generale. Instituto italiano di numismatica.

Marco Passarotti, Francesco Mambrini, Greta Franzini, Flavio Massimiliano Cecchini, Eleonora Litta, Giovanni Moretti, Paolo Ruffolo, and Rachele Sprugnoli. 2020. Interlinking through lemmas. the lexical collection of the lila knowledge base of linguistic resources for latin. Studi e Saggi Linguistici, 58(1):177–212.

Marco Passarotti, 2019. volume 10 of Age of Access? Grundfragen der Informationsgesellschaft, chapter The Project of the Index Thomisticus Treebank, pages 299–320. De Gruyter Saur, Berlin, Germany; Boston, MA, USA.

Alexander Souter. 1968. Oxford Latin dictionary: OLD. Clarendon Press.

Rachele Sprugnoli and Marco Passarotti. 2020. Proceedings of LT4HALA 2020-1st Workshop on Language Technologies for Historical and Ancient Languages. In Proceedings of LT4HALA 2020-1st Workshop on Language Technologies for Historical and Ancient Languages.

Rachele Sprugnoli, Marco Passarotti, Flavio Massimiliano Cecchini, and Matteo Pellegrini. 2020. Overview of the EvaLatin 2020 evaluation campaign. In Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages, pages 105–110, Marseille, France, May. European Language Resources Association (ELRA).

Milan Straka and Jana Straková. 2017. Tokenizing,
POS tagging, lemmatizing and parsing UD 2.0 with
UDPipe. In Proceedings of the CoNLL 2017 Shared
Task: Multilingual Parsing from Raw Text to Univer-
sal Dependencies, pages 88–99, Vancouver, Canada,
August. Association for Computational Linguistics.
Alfonso Traina and Tullio Bertotti. 2015. Sintassi nor-
mativa della lingua latina. Pàtron, Bologna, Italy.
Philippe Verkerk, Yves Ouvrard, Margherita Fan-
toli, and Dominique Longrée. 2020. L.A.S.L.A.
and Collatinus: a convergence in lexica. SSL,
1(LVIII):95–120.
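
As an illustration of the agreement measure reported in Section 3.2, the following minimal Python sketch computes Cohen's κ for two annotators labelling the same tokens. The function name `cohen_kappa` and the toy UPOS sequences are assumptions made for this example; they are not taken from the project's data or scripts.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of tokens given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations (not from the Liber Abbaci dataset).
annotator_1 = ["NOUN", "VERB", "PART", "NUM", "DET", "NOUN"]
annotator_2 = ["NOUN", "VERB", "ADV", "NUM", "NUM", "NOUN"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

The same function applies unchanged to lemma strings instead of UPOS tags, which is how one would obtain a separate κ for lemmatization.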
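
The scores in Tables 1 and 2 of Section 4 are token-level accuracy and per-tag F1. The sketch below shows one common way to compute them in Python from aligned gold and predicted tags. The helper names and the toy sequences are hypothetical. The "Global" row of Table 2 appears numerically consistent with an unweighted (macro) average of the per-tag rows, but the paper does not state how it was computed, so that reading is an assumption.

```python
def accuracy(gold, pred):
    """Share of tokens whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_per_tag(gold, pred):
    """F1 for every tag that occurs in the gold annotation."""
    scores = {}
    for tag in sorted(set(gold)):
        tp = sum(g == tag and p == tag for g, p in zip(gold, pred))
        fp = sum(g != tag and p == tag for g, p in zip(gold, pred))
        fn = sum(g == tag and p != tag for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[tag] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-tag F1 values."""
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: the model never predicts SYM, as observed in Table 2.
gold = ["NUM", "NOUN", "SYM", "NUM", "VERB", "NOUN"]
pred = ["NUM", "NOUN", "PUNCT", "NUM", "VERB", "ADJ"]
acc = accuracy(gold, pred)
per_tag = f1_per_tag(gold, pred)
macro = macro_f1(per_tag)

# Per-tag values of the EvaLatin2020 column of Table 2 (SYM ... ADP).
evalatin_column = [0.00, 0.24, 0.48, 0.57, 0.65, 0.70, 0.75,
                   0.89, 0.91, 0.92, 0.93, 0.94, 0.95, 0.99]
table2_evalatin_mean = round(sum(evalatin_column) / len(evalatin_column), 2)
print(acc, per_tag, macro, table2_evalatin_mean)
```

Restricting the per-tag loop to gold tags mirrors the layout of Table 2, where a tag that a model never predicts (such as SYM) still contributes a row with F1 = 0.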
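
The linking step of Section 5 (string match on the lemma plus its part of speech, followed by a split into directly linked, ambiguous and absent lemmas) can be sketched in Python as follows. This is only an in-memory illustration: the real Lemma Bank is an RDF resource, and the dictionary `LEMMA_BANK`, the function `link_lemmas`, the UPOS values and the `example.org` IRI are hypothetical stand-ins. Only the three `lila-erc.eu` IRIs are taken from the paper's footnotes.

```python
# Toy stand-in for the Lemma Bank: (lemma string, UPOS) -> candidate entry IRIs.
LEMMA_BANK = {
    ("colligo", "VERB"): [
        "https://lila-erc.eu/data/id/lemma/94854",
        "https://lila-erc.eu/data/id/lemma/94855",
    ],
    ("torcellus", "NOUN"): ["https://lila-erc.eu/data/id/lemma/133810"],
    ("libra", "NOUN"): ["https://example.org/lemma/libra"],  # placeholder IRI
}


def link_lemmas(annotated_lemmas, lemma_bank):
    """Classify (lemma, UPOS) pairs by the number of matching entries."""
    result = {"direct": {}, "ambiguous": {}, "absent": []}
    for key in annotated_lemmas:
        entries = lemma_bank.get(key, [])
        if len(entries) == 1:
            result["direct"][key] = entries[0]   # linked automatically
        elif len(entries) > 1:
            result["ambiguous"][key] = entries   # needs manual disambiguation
        else:
            result["absent"].append(key)         # must be added to the KB by hand
    return result


chapter_lemmas = [
    ("libra", "NOUN"),
    ("colligo", "VERB"),
    ("karatus", "NOUN"),
    ("torscellus", "NOUN"),
]
linked = link_lemmas(chapter_lemmas, LEMMA_BANK)
print(linked)
```

In this toy run torscellus ends up among the absent lemmas even though torcellus is present, which reflects the paper's point that graphic variants defeat exact string matching and have to be handled manually.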
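
Table 3 lists lemmas found only in chapter VIII and in none of the other texts linked to LiLa. The paper retrieves them with a SPARQL query against the LiLa endpoint; the Python sketch below does not reproduce that query and only illustrates the underlying set difference on in-memory toy data. The function name, the lemma lists and the counts are hypothetical.

```python
from collections import Counter


def distinctive_lemmas(target_lemmas, other_corpora, top_n=5):
    """Most frequent lemmas of the target text that occur in no other corpus."""
    seen_elsewhere = set().union(*other_corpora)
    counts = Counter(l for l in target_lemmas if l not in seen_elsewhere)
    return counts.most_common(top_n)


# Hypothetical lemmatized tokens of the target text and lemma sets of other corpora.
chapter = ["rotulus"] * 3 + ["soldus"] * 2 + ["et"] * 4 + ["virgula"]
others = [{"et", "sum"}, {"et", "liber"}]
top = distinctive_lemmas(chapter, others, top_n=2)
print(top)
```

Very frequent function words such as et drop out because they occur elsewhere, so the ranking is driven by domain-specific vocabulary, as in Table 3.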