=Paper=
{{Paper
|id=Vol-3033/paper24
|storemode=property
|title=The Annotation of Liber Abbaci, a Domain-Specific Latin Resource
|pdfUrl=https://ceur-ws.org/Vol-3033/paper24.pdf
|volume=Vol-3033
|authors=Francesco Grotto,Rachele Sprugnoli,Margherita Fantoli,Maria Simi,Flavio Massimiliano Cecchini,Marco Passarotti
|dblpUrl=https://dblp.org/rec/conf/clic-it/GrottoSFSCP21
}}
==The Annotation of Liber Abbaci, a Domain-Specific Latin Resource==
<pdf width="1500px">https://ceur-ws.org/Vol-3033/paper24.pdf</pdf>
<pre>
     The Annotation of Liber Abbaci, a Domain-Specific Latin Resource
          Francesco Grotto1 , Rachele Sprugnoli2 , Margherita Fantoli3 ,
         Maria Simi4 , Flavio Massimiliano Cecchini2 , Marco Passarotti2
                         1. Scuola Normale Superiore, Italy
                  2. Università Cattolica del Sacro Cuore, Italy
                              3. KU Leuven, Belgium
                       4. Università degli Studi di Pisa, Italy
                        francesco.grotto1@sns.it,
{rachele.sprugnoli,flavio.cecchini,marco.passarotti}@unicatt.it
      margherita.fantoli@kuleuven.be, maria.simi@unipi.it

                        Abstract                                literary ones, a huge amount of Latin texts of sev-
                                                                eral different genres can be found spread all over
    The Liber Abbaci (13th century) is a mile-                  Europe and beyond. An important textual genre
    stone in the history of mathematics and ac-                 is represented by scientific treatises, which in many
    counting. Due to the late stage of Latin,                   cases are interesting not only for their contents, but
    its features and its very specialized con-                  also because of the technical terminology they fea-
    tent, it also represents a unique resource                  ture.
    for scholars working on Latin corpora.                         This is precisely the case for the Liber Abbaci
    In this paper we present the annotation                     ‘the book of the abacus’ by Leonardo of Pisa (also
    and linking work carried out in the frame                   known as Fibonacci). Written in the very first
    of the project Fibonacci 1202-2021. A                       years of the 1200s, it is a book on arithmetic pro-
    gold-standard lemmatization and part-of-                    moting a style of calculation based on Arabic nu-
    speech tagging allow us to elaborate some                   merals without aid of an abacus. Fibonacci 1202-
    first observations on the linguistic and his-               2021 is a project financed by the Tuscany Re-
    torical features of the text, and to link the               gion and involving the University of Pisa and the
    text to the LiLa Knowledge Base, which has                  Galilei Museum in Florence, following the pub-
    as its goal to make distributed linguistic                  lication of a critical edition of the Liber Abbaci
    resources for Latin interoperable by fol-                   by Enrico Giusti (Fibonacci, 2020). The goal of
    lowing the principles of the Linked Data                    the project is to produce an enhanced digital edi-
    paradigm. Starting from this specific case,                 tion of this work by leveraging advanced publish-
    we discuss the importance of annotating                     ing tools and investigating the use of computa-
    and linking scientific and technical texts,                 tional linguistics techniques in order to uncover
    in order to (a) compare and search them                     the wealth of linguistic, scientific and historical in-
    together with other (non-technical) Latin                   formation contained in the book.
    texts, and (b) train, apply and evaluate NLP re-               Besides its scientific interest, the Liber Abbaci
    sources on a non-standard variety of Latin.                 features a very peculiar lexicon, not often repre-
    The paper also describes the fruitful inter-                sented in the currently available (linguistically an-
    action and coordination between NLP ex-                     notated) corpora for Latin. In order to fill this gap,
    perts and traditional Latin scholars on a                   in the context of the project Fibonacci 1202-2021
    project requiring a large range of exper-                   we have started performing the linguistic annota-
    tise.                                                       tion of the Liber Abbaci, beginning from part-of-
                                                                speech (PoS) tagging and lemmatization of a spe-
1    Introduction                                               cific chapter of the book, chosen for its linguistic
Latin texts have a wide diachronic and diatopic ex-             and historical interest. The dataset is freely avail-
tension that corresponds to a similarly large diver-            able online1 .
sity of the textual genres they represent. Besides                 This paper describes the process of annotation
                                                                of the Liber Abbaci and two applications of its
     Copyright © 2021 for this paper by its authors. Use per-
                                                                  1
mitted under Creative Commons License Attribution 4.0 In-           http://dialogo.di.unipi.it/
ternational (CC BY 4.0).                                        LiberAbbaci
results, namely (a) the evaluation of a number              The corpus of Latin Lemmatized Texts released
of trained models for PoS tagging and lemma-             by Thibault Clérice (Clérice, 2021a) is formed by
tization for Latin in out-of-domain fashion and          21,222,911 tokens (17,804,769 without punctua-
(b) the interlinking of the annotated chapter with       tion marks) and includes a large set of Classi-
other linguistic resources for Latin through the         cal and Late Latin texts available in a number
LiLa Knowledge Base (KB)2 .                              of open access corpora6 . Clérice’s corpus cov-
                                                         ers a very ample chronological span (up until the
2       Related Work                                     9th century) as well as different genres: from Clas-
The research area dealing with the creation of lin-      sical literature (Horace, Ovid, etc.), to Christian
guistic resources and Natural Language Process-          religious texts and legal texts. The linguistic an-
ing (NLP) tools for ancient languages has seen a         notation consists of lemmatization and full mor-
remarkable growth during the last decade (Sprug-         phological description of the tokens, produced
noli and Passarotti, 2020). This has primarily con-      automatically by applying the Pie Latin LASLA+
cerned Latin and Ancient Greek as essential media        model 0.0.6 (Manjavacas et al., 2019), fine-tuned
to access and understand the so-called Classical         on ca. 1,500,000 tokens taken from the LASLA cor-
heritage. In particular, several annotated corpora       pus (Clérice, 2021b), with very good results con-
of Latin texts are currently available in digital for-   cerning lemmatization and PoS tagging7 . How-
mat: they follow different guidelines and tagsets        ever, results appear to be weaker on unknown
and feature different layers of linguistic annota-       tokens8 . This difference underlines the difficulty
tion. This section aims to provide a (far from ex-       of using automatic annotation tools on texts with
haustive) overview of such resources to show how         a very specialized language, surely not found in
the dataset presented in this paper stands with re-      LASLA, as is the case for Fibonacci’s Liber Ab-
spect to the state of the art.                           baci.
   The LASLA corpus contains 2,500,000 semi-                As for syntactically annotated corpora, five tree-
manually annotated tokens. It covers a large por-        banks are currently available for Latin. They
tion of the extant Classical Latin literature. It was    are the Index Thomisticus Treebank (IT-TB) (Pas-
started in 1961 by the LASLA research center at the      sarotti, 2019), the PROIEL treebank (Haug and
Université de Liège3 and is still being expanded4 .      Jøhndal, 2008; Eckhoff et al., 2018), the Latin
The corpus is considered to be a gold standard,          Dependency Treebank by the Perseus Digital Li-
since the annotation of every token has been man-        brary (part of the Ancient Greek and Latin Tree-
ually verified by a philologist. The linguistic in-      bank) (Bamman and Crane, 2007), the Late Latin
formation consists of lemmatization, morpholog-          Charter Treebank (LLCT) (Cecchini et al., 2020b)
ical tagging, and an additional syntactic layer for      and the UDante treebank (Cecchini et al., 2020a).
verbs (Verkerk et al., 2020). Texts cover various        The treebanks include texts of different genres (lit-
literary genres (theater, poetry, prose) and have a      erary, historical, philosophical and documentary)
chronological extension ranging from the come-           and periods (from Classical to Medieval), but tech-
dies of Plautus to the texts of Suetonius and Pliny      nical works are not represented.
the Younger. Recent additions reach later stages
of Latin literature 5 , but include neither Medieval     3   Dataset Creation and Analysis
nor Neo-Latin works. Natural sciences and tech-
                                                         The Liber Abbaci is made up of more than 270,000
nical works are weakly represented in the cor-
                                                         tokens and is divided into 15 chapters of varying
pus, the treatise De Agri Cultura ‘on agriculture’
                                                         length. The choice of starting our manual annota-
by Cato and the recently added Naturales Quaes-
                                                         tion from chapter VIII de reperiendis pretiis mer-
tiones ‘investigations about nature’ by Seneca be-
                                                         cium per maiorem guisam ‘on finding out the price
ing the only examples.
                                                         of goods through the “greater means”’ is due to the
    2
    https://lila-erc.eu
    3                                                       6
    http://web.philo.ulg.ac.be/lasla/                         For the full list, see https://github.com/
presentation-du-laboratoire/                             lascivaroma/latin-lemmatized-texts/tree/
  4
    See http://web.philo.ulg.ac.be/lasla/                0.1.2.
                                                            7
textes-latins-traites/.                                       For lemmatization, accuracy: 0.9734 . For PoS tagging,
  5                                                      accuracy: 0.9651 .
    Of which some are already available: see
                                                            8
http://web.philo.ulg.ac.be/lasla/                             For lemmatization, accuracy: 0.8716 . For PoS tagging,
textes-latins-en-cours-de-traitement/.                   accuracy: 0.9232 .
peculiarity of its content. Here, Fibonacci treats     always happen straightforwardly. Chapter VIII
many simple business negotiations using propor-        of the Liber Abbaci, as well as the work in its
tions and referring to many examples taken from        entirety, presents several typical features of Me-
the entire Mediterranean world. The examples           dieval Latin, both graphically (e. g. the monoph-
concern weight and monetary systems as well as         thongization ae → e and the spelling nichil instead
the main products bought and sold in the 13th cen-     of the Classical nihil ‘nothing’), morphologically
tury. This means that the text is rich in terminol-    (e. g. the presence of analytical verb forms such as
ogy specific to the mathematical domain but also       the “perfect”, i. e. present perfective, subjunctive
to trade and commerce. Chapter VIII is made up of      habeat . . . honeratum, instead of the Classical on-
29,858 tokens (including punctuation marks), thus      erauerit, from onero ‘to load’) and syntactically
covering about 10% of the total length of the Liber    (e. g. the nearly exclusive use of quod ‘that’ to in-
Abbaci.                                                troduce declarative clauses, instead of accusative
                                                       and infinitive11 ). It is also worth noting the very
3.1   Data Annotation                                  limited use of enclitic particles (in the whole chap-
The manual annotation of chapter VIII is carried       ter VIII, Fibonacci uses the enclitic conjunction
out by a master’s degree student in Classical lan-     que ‘and’ only 3 times, appended to the auxiliary
guages, with excellent knowledge of Latin but          verb form erunt ‘they will be’) and the presence
without any previous expertise in either linguis-      of syntactic calques of vernacular constructions
tic annotation or computational linguistics. The       (e. g. secundum quod uadis multiplicando ‘accord-
overall effort of the work amounts to a total of       ing to what you are multiplying’, where uado is
227 hours, including: training sessions, study         preferred to the more Classical eo ‘to go’ and fur-
of the guidelines and of terminology related to        ther assumes an auxiliary function, and the use
measures, coins and trade in the Middle Ages           of the gerundive form multiplicando is an innova-
(Marcinkowski, 2003; Martinori, 1915), the actual      tion).
annotation, the reconciliation after evaluation of        But the main peculiarities of the text concern the
inter-annotator agreement (IAA, see Section 3.2),      lexicon. Chapter VIII presents indeed a rich set of
periodic checks with supervisors, the linking of       toponyms, units of measurement, names of coins
the annotated text to the LiLa KB (see Section         and Arabisms often not even reported by Medieval
5). We make use of a large number of dictio-           Latin dictionaries. This is the case, for example,
naries as references: the Oxford Latin Dictionary      of some names of places, such as Bugea, today’s
(OLD) (Souter, 1968), the Lexicon Totius Latini-       Biğāya/Bgayet in Algeria (a city where Fibonacci
tatis (Forcellini, 1965), the Dictionnaire illustré    spent a period of his childhood, learning the art
latin-français (hereafter: Gaffiot) (Gaffiot, 2016)    of calculation), and Septis, today’s Ceuta/Sabta on
and the Thesaurus Linguae Latinae9 for Classical       the Strait of Gibraltar; or, among the numismatic
Latin, but also the Dictionary of Medieval Latin       terms, of bolsonalia, a word designating a certain
from British Sources (Latham and Howlett, 1975)        amount of broken silver or mixture coins which
and the Glossarium mediae et infimae latinitatis       were sold to goldsmiths because they were adul-
(du Cange et al., from 1883 to 1887) for Medieval      terated or out of date.
Latin. Tokenization and sentence splitting are per-
formed manually on a text editor, then lemmati-        3.2      Inter-Annotator Agreement
zation and PoS tagging are carried out on a shared     The IAA is calculated on 30 sentences (1,010 to-
spreadsheet following the Universal Dependencies       kens), with the participation of a second scholar
(UD) formalism (de Marneffe et al., 2021), in par-     with a background in Classical languages. We reg-
ticular both the universal and the language-specific   ister an almost perfect agreement with a Cohen’s
guidelines relative to the latest release of the UD    κ (Artstein and Poesio, 2008) of 0.97 for lemma-
treebanks (v 2.9)10 .                                  tization and 0.94 for PoS tagging.
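
As an illustration of the agreement measure reported in Section 3.2, the
following is a minimal Python sketch of Cohen's κ for two annotators who
label the same tokens. The function name cohen_kappa and the toy label
sequences are assumptions made for this example; they are not the project's
data or tooling.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Degenerate case: both annotators used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical UPOS labels for six tokens (illustrative only).
annotator_1 = ["PART", "NUM", "DET", "NOUN", "NOUN", "ADV"]
annotator_2 = ["ADV", "NUM", "NUM", "NOUN", "NOUN", "ADV"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 4))
```

In this toy case the two annotators agree on 4 of 6 tokens, but κ is lower
than the raw agreement because part of that agreement is expected by chance.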
   The application of the UD guidelines to                The comparison between the two annotations
the linguistic peculiarities of the text does not      highlights two main issues. The first concerns the
   9
     https://thesaurus.badw.de/                        choice of the UPOS (Universal Part Of Speech) tag
das-projekt.html                                       (de Marneffe et al., 2021, §2.2.2) for terms such as
  10
     https://universaldependencies.org/
                                                         11
guidelines.html                                               See for example (Traina and Bertotti, 2015, C. XVI) .
nam ‘certainly’ and enim ‘namely’, because differ-                  an exact historical moment; cf. (Ledgeway, 2012,
ent corpora and dictionaries adopt different con-                   §4.2.1).
ventions: e. g. nam is labeled as adverb in the
LiLa KB and Df in the Latin PROIEL treebank,                        4   Comparing NLP Models
both possibly equivalent to UPOS ADV12 ; as S13 ,                   Table 1 reports accuracy scores computed on our
standing for conjonction de coordination (UPOS:                     gold standard processed with UDPipe using the
CCONJ) in the LASLA corpus, and more gener-                         UD v2.6 models for Latin (Straka and Straková,
ically conjonction (servant à confirmer/causale)                    2017). The scores clearly show that current mod-
(UPOS either CCONJ or SCONJ) in the Gaffiot;                        els are not good enough to process the Latin of
finally particle (not necessarily corresponding to                  Fibonacci. The best accuracy for lemmatization
UPOS PART) in the OLD , and similarly partic-
                                                                    is achieved by the model trained on the LLCT tree-
ule in one sense in the Gaffiot. The treatment of                   bank, which contains a set of Early Medieval char-
the etymologically related and functionally simi-                   ters written in Tuscany. However, these scores are
lar enim is mostly identical for all sources, only                  lower than state-of-the-art ones: the best partic-
with the Gaffiot reporting a sense as adverbe in-                   ipating system at the EvaLatin 2020 evaluation
stead of particule, followed by the LASLA cor-                      campaign achieves an accuracy of 96.19% for
pus in using both labels S and M (generic for ad-                   lemmatization and 96.74% for PoS tagging on
verbe), the latter though very marginally. These                    the corresponding test set (Sprugnoli et al., 2020),
terms have been discussed and finally assigned the                  i. e. about 33 and 15 points more than the results
UPOS PART, used in the latest Latin UD treebanks
                                                                    obtained on Fibonacci.
to label discourse particles like these. Such diffi-
culties derive on one hand from the “volatile” and                                            Lemma       UPOS
diachronically variable nature of similar elements,                          EvaLatin2020      63.60     81.90
but on the other hand, and relatedly, from traditional                       IT-TB             65.58     77.14
grammars overlooking them and more generally                                 LLCT              68.81     82.79
skipping over pragmatic phenomena, in favour of                              Perseus           67.54     78.37
“more Classical” parts of speech (hence the fre-                             PROIEL            60.25     51.64
quent inclusion of nam, enim, etc. in the catchall
category of “adverbs”).                                             Table 1: Accuracy (%) of UDPipe v2.6 Latin models
   The second issue is the UPOS to be used for                      tested on chapter VIII of the Liber Abbaci.
unus ‘one’. Fibonacci often uses unus to indicate a
generic entity, as is clearly visible when paralleled                  Taking into consideration lemmatization, the
by alter ‘other’. In this case, unus is tagged as DET               percentage of out-of-vocabulary lemmas, that is,
(determiner), like alter14 . In a number of other                   lemmas present in the text by Fibonacci but not
contexts, however, unus specifies the quantity of                   in the training texts of the models, is very high
a certain object. In such cases it is considered a                  (> 50% of lemma types). The majority of errors
NUM (numeral)15 . The difficulty here originates                    are registered for numbers and common nouns.
from a well known and complex linguistic change                     The first problem is due to the fact that some
that will eventually produce a clear indefinite arti-               models do not recognize Arabic numbers, because
cle from the numeral in Romance languages, but                      they have not seen them in their training data,
for which, being so gradual, we cannot pinpoint                     while others lemmatize them with a special “met-
                                                                    alemma” of the kind of num. arab., eschewing lex-
  12
      Cf. (Eckhoff et al., 2018, §5)                                ical forms. As for common nouns, most errors re-
  13
      With only very few exceptions when it is seen as part of      lated to lemmatization concern the lexical classes
a compound expression with tmesis, thus not receiving an au-
tonomous PoS; cf. Pl. Am. 2.1, 49-50: Quo id, malum, pacto          discussed in Section 3. For example, the tokens
potest nam (mecum argumentis puta) fieri, nunc uti tu et hic        libris and libre are often lemmatized as liber ‘free’
sis et domi?, interpreted as an instance of quonam ‘whither         (ADJ) instead of libra ‘pound’ (NOUN).
pray?’, itself receiving K meaning pronom interrogatif.
   14
      For instance, in the clause ita est pretium unius ad             Table 2 shows the F1 score per UPOS tag. We
pretium alterius (VIII, 8) ‘so the price of the one [merchan-       observe that an F1 above 70% is achieved by every
dise] is to the price of the other’.                                model only on 5 tags: ADP, NOUN, NUM, SCONJ
   15
      For instance, in the clause . . . que multiplica per summam
denariorum unius libre (VIII, 20) ‘which you have to multi-         and VERB. No model recognizes the SYM tag
ply by the amount of denarii of which one pound consists’.          (used for mathematical operators such as paren-
                                        EvaLatin2020         IT-TB        Perseus      PROIEL   LLCT
                          SYM               0.00              0.00         0.00         0.00    0.00
                          AUX               0.24              0.45         0.03         0.27    0.32
                          ADJ               0.48              0.37         0.40         0.28    0.55
                          PRON              0.57              0.38         0.40         0.52    0.93
                          PART              0.65              0.00         0.00         0.00    0.65
                          ADV               0.70              0.71         0.78         0.21    0.84
                          CCONJ             0.75              0.67         0.68         0.44    0.86
                          SCONJ             0.89              0.95         0.96         0.86    0.95
                          VERB              0.91              0.87         0.92         0.78    0.84
                          NOUN              0.92              0.83         0.86         0.75    0.88
                          DET               0.93              0.00         0.00         0.53    0.91
                          PROPN             0.94              0.09         0.00         0.32    0.57
                          NUM               0.95              0.96         0.96         0.75    0.99
                          ADP               0.99              0.98         0.93         0.91    0.88
                          Global            0.71              0.52         0.49         0.46    0.73

       Table 2: F1 on UPOS tags of UDPipe v2.6 Latin models on chapter VIII of the Liber Abbaci.
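
For readers who want to reproduce scores of the kind shown in Tables 1 and 2,
the following is a minimal Python sketch of token-level accuracy and per-tag
F1 over aligned gold and predicted UPOS sequences. The function names and the
toy sequences are assumptions made for this example; the paper does not state
which evaluation script was used, nor how the "Global" row was aggregated, so
no aggregate is computed here.

```python
def accuracy(gold, pred):
    """Share of tokens whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be aligned and non-empty")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_per_tag(gold, pred):
    """Per-tag F1 = 2*TP / (2*TP + FP + FN); 0.0 when the tag is never matched."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned")
    scores = {}
    for tag in sorted(set(gold) | set(pred)):
        tp = sum(g == tag and p == tag for g, p in zip(gold, pred))
        fp = sum(g != tag and p == tag for g, p in zip(gold, pred))
        fn = sum(g == tag and p != tag for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[tag] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical gold and predicted tags for six tokens (illustrative only).
gold_tags = ["NUM", "SYM", "NOUN", "DET", "NOUN", "PART"]
pred_tags = ["NUM", "PUNCT", "NOUN", "NUM", "ADJ", "ADV"]

acc = accuracy(gold_tags, pred_tags)
scores = f1_per_tag(gold_tags, pred_tags)
for tag, score in scores.items():
    print(f"{tag:6s} {score:.2f}")
```

A tag that a model never predicts, such as SYM in the toy data, receives an
F1 of 0.00 regardless of how frequent it is in the gold standard.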


theses), because it is not present in their respec-                   5        Linking and Querying in LiLa
tive training data. The same is true for the tag
PART in IT-TB (up until UD v2.8)16 , Perseus and                      The LiLa KB makes linguistic resources for Latin
PROIEL, and for the tag DET in Perseus. In old                        interoperable by linking tokens in corpora and en-
versions of the IT-TB, DET is limited to the proto-                   tries in dictionaries/lexica to a collection of canon-
article ly (8 occurrences), while in Perseus the                      ical forms for Latin called Lemma Bank (Pas-
tag PROPN appears only for the lemma Aefulanus                        sarotti et al., 2020). In order to connect the
(1 occurrence). The IT-TB-based model, too, reg-                      lemmas of chapter VIII to LiLa’s KB, a string
isters a near-zero F1 score for PROPN: in the cor-                    match is first performed between the lemmas in
responding training data, this tag is used for a re-                  the texts and those in the KB, also taking into ac-
stricted (116 types of lemmas) set of terms mostly                    count their parts of speech. Using this strategy,
specific to the domains of philosophy and reli-                       88.8% of the lemmas are directly connected to
gion (e. g. Aristoteles, Maria), not present in our                   a single entry in the KB. The remaining uncon-
dataset. Low performances are registered also for                     nected lemmas fall into two possible categories:
the AUX tag, the annotation of which is not consis-                   ambiguous lemmas, that is, with possible con-
tent in training data: in Perseus, this tag is not used               nection to more than one entry in the KB; and
at all, while in EvaLatin 2020 it marks only the                      lemmas absent from the KB. More specifically,
auxiliaries in periphrastic passive (including depo-                  we find 44 ambiguous lemmas (corresponding to
nent) constructions, while in the other treebanks it                  631 tokens): for example, colligo can be con-
is applied also to verbal copulas, as per UD guide-                   nected to two entries: either a first-conjugation
lines. Further, the Liber Abbaci sees the rise (1                     verb colligare17 ‘to bind’, or a third-conjugation
occurrence) of habeo ‘to have’ as a possible auxil-                   verb colligĕre18 ‘to gather’. These cases are manu-
iary (cf. Section 3.1), unheard of in Classical Latin                 ally disambiguated, checking each context of use.
and only attested (albeit marginally) in LLCT.                        The remaining, not directly connected lemmas are
                                                                      not present in the KB and need to be manually
                                                                      added: these are mainly words denoting weight
                                                                      and monetary units (e. g. karatus ‘carat’), or dif-
   16
                                                                      ferent written representations of lemmas already
      Annotation discrepancies with respect to other Latin UD
treebanks for INTJ, NUM, PART, PRON and DET have been                 in LiLa (e. g. torscellus is a graphic variant of tor-
resolved in IT-TB in its latest version (2.9), released in Novem-
ber 2021; however, the model adopted in this paper and cur-
                                                                          17
rently available in UDPipe is based on an older version of the                 https://lila-erc.eu/data/id/lemma/94854
                                                                          18
data.                                                                          https://lila-erc.eu/data/id/lemma/94855
cellus19 , a unit of length). Thanks to the linking,             coin minted in Constantinople24 . Finally, virgula
each lemma of our dataset becomes part of an in-                 (diminutive of virga, properly a ‘rod’, used by Fi-
teroperable ecosystem made of resources of differ-               bonacci in the same sense as virgula) primarily de-
ent kinds. We can thus query different interlinked               notes the bar between the numerator and denom-
resources using SPARQL and the LiLa endpoints20 .                inator of a fraction, but it can also designate the
For example, we can find the lemmas appearing                    fraction itself (Bocchi, 2004).
only in chapter VIII21 and not in the other texts that
are currently linked to the KB: the Summa Con-                   6        Conclusions and Future Work
tra Gentiles by Thomas Aquinas (from the Index                   This paper describes the annotation of one chap-
Thomisticus), those found in UDante (a corpus of                 ter of the Liber Abbaci by Fibonacci, and reports
5 works mostly by Dante Alighieri, or attributed to              on the linguistic peculiarities of this text and the
him, manually annotated following the UD formal-                 ensuing challenges.
ism), and the Querolus siue Aulularia (an anony-                     The results of existing UDPipe models in
mous comedy dating back to the 5th c. AD).                       lemmatization and tagging show low accuracy and
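
The linking step described in Section 5 (a string match on lemma and part of
speech, followed by manual treatment of ambiguous and missing lemmas) can be
sketched in Python as follows. This is a minimal sketch under stated
assumptions: the function link_lemmas, the dictionary layout of the toy lemma
bank and the identifier toy:lemma/libra are hypothetical, and only the two
colligo URIs are taken from the footnotes of this paper. It does not
reproduce the actual LiLa data model or API.

```python
def link_lemmas(text_lemmas, lemma_bank):
    """Split (lemma, upos) pairs into linked, ambiguous and absent groups.

    lemma_bank maps (lemma, upos) to a list of candidate entry identifiers.
    """
    linked, ambiguous, absent = {}, {}, []
    for key in text_lemmas:
        candidates = lemma_bank.get(key, [])
        if len(candidates) == 1:
            linked[key] = candidates[0]      # direct, unambiguous link
        elif candidates:
            ambiguous[key] = candidates      # needs manual disambiguation
        else:
            absent.append(key)               # needs a new entry in the bank
    return linked, ambiguous, absent


# Toy lemma bank (hypothetical structure; see the assumptions above).
toy_bank = {
    ("colligo", "VERB"): [
        "https://lila-erc.eu/data/id/lemma/94854",
        "https://lila-erc.eu/data/id/lemma/94855",
    ],
    ("libra", "NOUN"): ["toy:lemma/libra"],
}
chapter_lemmas = [("colligo", "VERB"), ("libra", "NOUN"), ("karatus", "NOUN")]

linked, ambiguous, absent = link_lemmas(chapter_lemmas, toy_bank)
print(linked)
print(sorted(ambiguous))
print(absent)
```

Matching on the pair (lemma, part of speech) rather than on the lemma string
alone keeps homographs with different parts of speech apart, while homographs
with the same part of speech, such as the two verbs colligo, still end up in
the ambiguous group.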
                                                                 F 1 scores when compared to the state of the art
          Lemma             Gloss             Freq.              for these tasks in the recent EvaLatin 2020 eval-
          rotulus       unit of weight         296               uation campaign. This, on the one hand, can be
           soldus       monetary unit          212               attributed to the characteristics of the genre of Fi-
          virgula      bar of a fraction       202               bonacci’s texts, which are representative of scien-
         byzantius      monetary unit           73               tific Medieval Latin texts, and on the other hand
          cantare       unit of weight          67               can be explained with the different choices in an-
                                                                 notation style of Latin treebanks released under
Table 3: The 5 most frequent distinctive lemmas
                                                                 the UD project. Substantial improvements can be
in chapter VIII of the Liber Abbaci.
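
The notion of distinctive lemma used for Table 3 (a lemma found in chapter
VIII but in none of the other linked texts) amounts to a set difference
followed by a frequency count. The authors retrieve these lemmas with a
SPARQL query; the Python sketch below is only a stand-in for that logic, and
the function name and toy lemma lists are assumptions made for this example.

```python
from collections import Counter


def distinctive_lemmas(target_lemmas, other_corpora, top_n=5):
    """Most frequent lemmas of the target text that occur in no other corpus.

    target_lemmas: one lemma per token of the target text.
    other_corpora: list of lemma collections, one per comparison corpus.
    """
    seen_elsewhere = set().union(*other_corpora)
    counts = Counter(
        lemma for lemma in target_lemmas if lemma not in seen_elsewhere
    )
    return counts.most_common(top_n)


# Toy lemma sequences (illustrative only, not the real corpora).
chapter = ["rotulus", "rotulus", "soldus", "pretium",
           "rotulus", "soldus", "virgula", "unus"]
others = [["pretium", "unus", "deus"], ["unus", "amor"]]

top = distinctive_lemmas(chapter, others, top_n=2)
print(top)
```

Because the comparison is made on lemmas rather than on word forms, the
result depends directly on the quality of the lemmatization and of the
linking to a shared lemma inventory.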
                                                                 expected with models trained on new releases of
                                                                 Latin treebanks which have already undertaken the
    Table 3 shows the 5 most frequent distinctive,
                                                                 effort of resolving annotation discrepancies and of
i. e. exclusively found in the Liber Abbaci, lem-
                                                                 making the annotation style across treebanks more
mas retrieved using a SPARQL query22 . They are
                                                                 homogeneous. Further improvements will how-
all related to mathematics, coins and units of mea-
                                                                 ever require new annotated chapters and experi-
surement, confirming the specificity of the domain
                                                                 ments in domain adaptation, which are scheduled
of our dataset. In particular, rotulus and cantāre
                                                                 as future work.
are two units of weight, both deriving from Ara-
bic, respectively from rat.l (in turn, a metatheti-              Acknowledgments
cal adaptation of Greek λίτρα litra ‘pound’) and
qint.ār, which designates a weight of 100 ro-                   This work is a contribution to the Fibonacci 1202-
tuli23 . The term soldus, instead, indicates a                   2021 project, financed by the Tuscany Region.
unit of measurement used for monetary quantities.                Part of the work has been funded by the Eu-
Among the many currencies mentioned in chapter                   ropean Research Council (ERC) under the Euro-
VIII , Fibonacci often cites the byzantius, a golden
                                                                 pean Union’s Horizon 2020 research and innova-
                                                                 tion programme – Grant Agreement No. 769994.
  19
       https://lila-erc.eu/data/id/lemma/133810                  The authors want to thank: prof. Andrea Boc-
  20
      https://lila-erc.eu/sparql/                                chi, dott. Alessandro Gelsumini, prof. Pier Daniele
  21
      https://lila-erc.eu/data/corpora/                          Napolitani and prof. Enrica Salvatori for their lin-
CorpusFibonacci/id/corpus/Liber Abbaci
   22
      https://github.com/CIRCSE/                                 guistic and historical advice.
SPARQL-queries/blob/main/
distinctivelemmas-Fibonacci.rq
   23
      It should be noted that Fibonacci alternates a third-      References
declension cantāre (gen. sing. cantāris) with a second-
declension cantarium (gen. sing. cantarii).           During     Ron Artstein and Massimo Poesio. 2008. Inter-coder
lemmatization of the text, the various attested singular           Agreement for Computational Linguistics. Compu-
forms have been linked to their respective lemmas; the             tational Linguistics, 34(4):555–596.
nom./acc. plur. cantaria, which theoretically could derive
                                                                     24
both from cantāre and cantarium, has been linked to the              Also mentioned is the byzantius saracenatus, equivalent
lemma cantāre for simple reasons of probability, as it is the   to the hyperperus, that is, a byzantius with inscriptions in
most frequently used by Fibonacci among these two forms.         Kufic characters (Martinori, 1915).
David Bamman and Gregory Crane. 2007. The Latin Dependency Treebank in a
  Cultural Heritage Digital Library. In Proceedings of the Workshop on
  Language Technology for Cultural Heritage Data (LaTeCH 2007), pages 33–40,
  Prague, Czech Republic, June. Association for Computational Linguistics.

Andrea Bocchi. 2004. In Michelangelo Zaccarello and Lorenzo Tomasin,
  editors, Storia della lingua e filologia. Per Alfredo Stussi nel suo
  sessantacinquesimo compleanno, chapter Sì nel Livero de l’abbecho, pages
  121–158. SISMEL – Edizioni del Galluzzo, Florence, Italy.

Flavio M. Cecchini, Rachele Sprugnoli, Giovanni Moretti, and Marco
  Passarotti. 2020a. UDante: First Steps Towards the Universal Dependencies
  Treebank of Dante’s Latin Works. In Seventh Italian Conference on
  Computational Linguistics, pages 1–7, Bologna. CEUR-WS.org.

Flavio Massimiliano Cecchini, Timo Korkiakangas, and Marco Passarotti.
  2020b. A New Latin Treebank for Universal Dependencies: Charters between
  Ancient Latin and Romance Languages. In Proceedings of the 12th Language
  Resources and Evaluation Conference, pages 933–942, Marseille, France,
  May. European Language Resources Association.

Thibault Clérice. 2021a. lascivaroma/latin-lemmatized-texts: 0.1.2 - HN PSL,
  May. DOI: 10.5281/zenodo.4661034; project online at
  https://github.com/lascivaroma/latin-lemmatized-texts.

Thibault Clérice. 2021b. Latin Lasla Model, Apr. DOI:
  10.5281/zenodo.4661034.

Marie-Catherine de Marneffe, Christopher D. Manning, Joakim Nivre, and
  Daniel Zeman. 2021. Universal Dependencies. Computational Linguistics,
  47(2):255–308, 07.

Charles du Fresne sieur du Cange, bénédictins de la congrégation de
  Saint-Maur, d. Pierre Carpentier, Johann Christoph Adelung, G. A. Louis
  Henschel, Lorenz Diefenbach, and Léopold Favre. from 1883 to 1887.
  Glossarium mediae et infimae latinitatis. Favre, Niort, France.

Hanne Martine Eckhoff, Kristin Bech, Gerlof Bouma, Kristine Eide, Dag Haug,
  Odd Einar Haugen, and Marius Jøhndal. 2018. The PROIEL treebank family: a
  standard for early attestations of Indo-European languages. Language
  Resources and Evaluation, 52(1):29–65.

Leonardus Bigollus Pisanus vulgo Fibonacci. 2020. Liber Abbaci, volume 79 of
  Biblioteca di «Nuncius». Leo S. Olschki, Florence, Italy.

Egidio Forcellini. 1965. Lexicon totius latinitatis. Arnaldo Forni, Bologna,
  Italy.

Félix Gaffiot. 2016. Dictionnaire Latin-Français. Accessible at gaffiot.fr.

Dag Trygve Truslew Haug and Marius Jøhndal. 2008. Creating a Parallel
  Treebank of the Old Indo-European Bible Translations. In Proceedings of
  the Second Workshop on Language Technology for Cultural Heritage Data
  (LaTeCH 2008), pages 27–34.

Ronald Edward Latham and David R Howlett. 1975. Dictionary of Medieval Latin
  from British Sources: Fascicule V: IJKL. OUP Oxford.

Adam Ledgeway. 2012. From Latin to Romance, volume 1 of Oxford studies in
  historical and diachronic linguistics. Oxford University Press, Oxford,
  UK.

Enrique Manjavacas, Ákos Kádár, and Mike Kestemont. 2019. Improving
  lemmatization of non-standard languages with joint learning. In
  Proceedings of the 2019 Conference of the North American Chapter of the
  Association for Computational Linguistics: Human Language Technologies,
  Volume 1 (Long and Short Papers), pages 1493–1503, Minneapolis, Minnesota,
  June. Association for Computational Linguistics.

Hinz Marcinkowski. 2003. Measures and Weights in the Islamic World, an
  English Translation of Walther Hinz’s Handbook Islamische Maße und
  Gewichte. International Islamic University Malaysia (IIUM).

Edoardo Martinori. 1915. La Moneta: vocabolario generale. Instituto italiano
  di numismatica.

Marco Passarotti, Francesco Mambrini, Greta Franzini, Flavio Massimiliano
  Cecchini, Eleonora Litta, Giovanni Moretti, Paolo Ruffolo, and Rachele
  Sprugnoli. 2020. Interlinking through lemmas. The lexical collection of
  the LiLa Knowledge Base of linguistic resources for Latin. Studi e Saggi
  Linguistici, 58(1):177–212.

Marco Passarotti. 2019. volume 10 of Age of Access? Grundfragen der
  Informationsgesellschaft, chapter The Project of the Index Thomisticus
  Treebank, pages 299–320. De Gruyter Saur, Berlin, Germany; Boston, MA,
  USA.

Alexander Souter. 1968. Oxford Latin dictionary: OLD. Clarendon Press.

Rachele Sprugnoli and Marco Passarotti. 2020. Proceedings of LT4HALA 2020-1st
  Workshop on Language Technologies for Historical and Ancient Languages. In
  Proceedings of LT4HALA 2020-1st Workshop on Language Technologies for
  Historical and Ancient Languages.

Rachele Sprugnoli, Marco Passarotti, Flavio Massimiliano Cecchini, and
  Matteo Pellegrini. 2020. Overview of the EvaLatin 2020 evaluation
  campaign. In Proceedings of LT4HALA 2020 - 1st Workshop on Language
  Technologies for Historical and Ancient Languages, pages 105–110,
  Marseille, France, May. European Language Resources Association (ELRA).

Milan Straka and Jana Straková. 2017. Tokenizing, POS tagging, lemmatizing
  and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared
  Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages
  88–99, Vancouver, Canada, August. Association for Computational
  Linguistics.

Alfonso Traina and Tullio Bertotti. 2015. Sintassi normativa della lingua
  latina. Pàtron, Bologna, Italy.

Philippe Verkerk, Yves Ouvrard, Margherita Fantoli, and Dominique Longrée.
  2020. L.A.S.L.A. and Collatinus: a convergence in lexica. SSL,
  1(LVIII):95–120.

</pre>