Enhancing the Latin Morphological Analyser LEMLAT
                         with a Medieval Latin Glossary
          Flavio Cecchini*, Marco Passarotti*, Paolo Ruffolo*, Marinella Testori*,
Lia Draetta°, Martina Fieromonte°, Annarita Liano°, Costanza Marini°, Giovanni Piantanida°
                   *Università Cattolica del Sacro Cuore - °Università di Pavia
       *Largo Gemelli 1, 20123 Milan, Italy - °Corso Strada Nuova 65, 27100 Pavia, Italy
                {flavio.cecchini, marco.passarotti}@unicatt.it


Abstract

English. We present the process of expanding the lexical basis of the Latin morphological analyser LEMLAT with the entries from the Medieval Latin glossary Du Cange. This process is performed semi-automatically by exploiting the morphological properties of lemmas, a previously available word list enhanced with inflectional information, and the contents of the lexical entries of Du Cange.

Italiano. L’articolo descrive il processo di ampliamento della base lessicale dell’analizzatore morfologico per il latino LEMLAT con il glossario di latino medievale Du Cange. Il processo è realizzato semiautomaticamente ricorrendo ad alcune proprietà morfologiche dei lemmi, a un lemmario completo d’informazione flessionale e ai contenuti delle entrate lessicali del Du Cange.

1   Introduction

Latin raises particular challenges for Natural Language Processing (NLP). Given that accuracy rates of stochastic NLP tools heavily depend on the training set on which their models are built, this becomes a particularly problematic issue when Latin is concerned, because Latin texts show an enormous linguistic variety resulting from (a) a wide time span (covering more than two millennia), (b) a large set of genres (ranging from literary to philosophical, historical and documentary texts) and (c) a big diatopic diversity (spread all over Europe and beyond).

Such complexity impacts NLP to the point that building NLP tools claiming to be suitable for all Latin varieties is an unrealistic task. One practical example comes from an experiment described by Ponti and Passarotti (2016), who show that the performance of a dependency parser trained on Medieval Latin data drops dramatically when the same trained model is applied to texts from the Classical era.

This issue affects all layers of linguistic annotation, including fundamental ones, like lemmatisation and morphological analysis. Today, a handful of morphological analysers are available for Latin, chiefly Words,[1] LEMLAT 3.0,[2] Morpheus[3] (reimplemented in 2013 as Parsley[4]), the PROIEL Latin morphology system[5] and LatMor.[6]

[1] http://archives.nd.edu/words.html
[2] www.lemlat3.eu. Binaries and database available at https://github.com/CIRCSE/LEMLAT3.
[3] https://github.com/tmallon/morpheus
[4] https://github.com/goldibex/parsley-core
[5] https://github.com/mlj/proiel-webapp/tree/master/lib/morphology
[6] http://cistern.cis.lmu.de
[7] For an evaluation of morphological analysers for Latin see (Springmann et al., 2016).

Although LEMLAT, together with LatMor,[7] has proved to be the best performing morphological analyser for Latin and the one boasting the largest lexical basis, its lexical coverage is still limited to Classical and Late Latin only. First released as a morphological lemmatiser at the end of the 1980s at ILC-CNR in Pisa (Bozzi and Cappelli, 1990; Marinone, 1990, v 1.0), where it was enhanced with morphological features between 2002 and 2005 (Passarotti, 2004, v 2.0), LEMLAT relies on a lexical basis resulting from the collation of three Latin dictionaries (Georges and Georges, 1913–1918; Glare, 1982; Gradenwitz, 1904) for a total of 40 014 lexical entries and 43 432 lemmas, as more than one lemma can be included in one lexical entry. This lexical basis was further enlarged in version 3.0 of LEMLAT by semi-automatically adding most of the Onomasticon (26 415 lemmas out of 28 178) provided by the 5th edition of the Forcellini dictionary (Budassi and
Passarotti, 2016).

In order to equip LEMLAT to process Latin texts beyond the Classical period, we recently enhanced its lexical basis with the lexical entries from a large reference glossary for Medieval Latin, namely the Glossarium Mediae et Infimae Latinitatis by Du Cange et alii (1883–1887, hereafter DC). This paper details the process performed to include DC in LEMLAT’s lexical basis.

2   Word Form Analysis in LEMLAT

LEMLAT is a lemmatiser and morphological analyser of types (i.e. no contextual disambiguation is performed). Given a word form in input (e.g. coniugae), LEMLAT’s output produces the corresponding lemma(s) (e.g. coniuga ‘wife’) and a number of tags conveying (a) the inflectional paradigm of the lemma(s) (e.g. first declension noun) and (b) the morphological features of the input word form (e.g. feminine singular genitive and dative; feminine plural nominative and vocative).

LEMLAT makes use of a database that includes multiple tables recording the different formative elements (segments) of word forms. The core table is the lexical look-up table, whose basic component is the so-called LES (LExical Segment). The LES is defined as the invariable part of the inflected form (e.g. coniug for coniug-ae). In other words, the LES is the string (or one of the strings) of characters that remains the same in the inflectional paradigm of a lemma; hence, the LES does not necessarily correspond to either the word stem or the root.

LEMLAT includes a LES archive, in which LES are assigned an ID and a number of inflectional features, among which a tag for the gender of the lemma (for nouns only) and a code (called CODLES) for its inflectional category. According to the CODLES, the LES is compatible with the endings (called SF, “Final Segment”) of its inflectional paradigm, which are collected in a separate table in the LEMLAT database. For example, the CODLES for the LES coniug is N 1 (first declension nouns) and its gender is F (feminine). The word form coniugae is thus analysed as belonging to the LES coniug, the segment ae being recognised as an ending compatible with a LES with CODLES N 1.

3   Adding the Du Cange Glossary

Adding DC to LEMLAT is a challenging task mostly because DC is not a dictionary in the modern sense of the word, but a glossary, i.e. a mere collection of words where information about parts of speech (PoS) and inflectional categories is almost absent, and therefore has to be deduced or reconstructed before an entry can be included in LEMLAT.[8] In addition, lemmatisation criteria are often inconsistent, even for words belonging to the same class (e.g. verbs are cited either by their present active infinitive or by their first person singular present indicative).

This is partly due to the fact that five different authors contributed to the glossary over a period of two centuries (Géraud, 1839), not always coherently with respect to their predecessors. Nonetheless, it is possible to distinguish some recurring patterns, which can be exploited to automatically include in LEMLAT as many of the 85 999 lemmas in DC as possible, or at least to expedite the manual recording of lexical entries.

[8] For this work, we use the digital version of DC provided by the École nationale des chartes (Paris). Source data are available in XML format at http://svn.code.sf.net/p/ducange/code/xml/. The glossary can be accessed online at http://ducange.enc.sorbonne.fr/.

3.1   Suffixes and Bon’s Word List

The preliminary step to extend LEMLAT with DC consists in selecting a set of derivational suffixes that are morphologically unambiguous in terms of PoS and inflectional category, and hence the set of all lemmas displaying these suffixes. These lemmas require no further analysis for entry in LEMLAT. Examples are -itas for feminine imparisyllabic third declension nouns, or -icum for neuter second declension nouns. On the contrary, suffixes like, e.g., -anus or -atus are considered morphologically ambiguous, as they can belong to different PoS (adjective or noun) and/or different inflectional categories (first or fourth declension). In these cases the corresponding lemmas require manual annotation (see Section 3.2). Approximately 30 000 DC lemmas are retrieved and added to LEMLAT in this way.

To extend the automatic acquisition of DC’s lemmas, we also take advantage of a list of 71 908 Latin lemmas collected by Bruno Bon from various lexicographic sources and corpora.[9] This list supplies information about inflectional morphology.[10] Of these lemmas, 22 628 are found among those in DC that are not analysed in the preliminary step; and out of these, 21 805 showing a one-to-one correspondence with lemmas in Bon’s list are added to LEMLAT with no further check.[11]

[9] Available at http://glossaria.eu/outils/lemmatisation/ and presented in (Bon, 2011).
[10] Specifically: PoS; genitive endings of nouns; nominative endings of adjectives; infinitive endings of regular verbs and full paradigms of irregular verbs.
[11] The remaining lemmas are manually checked because they correspond to multiple entries in one and/or the other source. For example, the lemma fedus appears once in DC (as a masculine second declension noun, ‘fief’) but three times in Bon’s list: as a masculine second declension noun (but with the different meaning ‘goat’), as a neuter third declension noun (with the genitive federis, ‘alliance’) and as a first class adjective (‘hideous’).

3.2   Definitions and Quotations

Each lexical entry in DC comprises (a) the name of the lemma, (b) usually, a short definition and (c) possibly one or more quotations (taken from explicitly cited textual sources), where most of the time a form of the lexical entry is capitalised. By making use of all these elements, we automatically assign a PoS and an inflectional category (i.e. a CODLES, in LEMLAT’s terms) to the lemma.

In particular, to assess the PoS of a lemma we follow a principle of “lexical osmosis”, that is, we assume that a lemma’s definition core (see below) will most probably use terms belonging to the same PoS as that lemma. By cross-checking this information with the citation form of the lemma and possibly with its inflected forms in a quotation, we are also able to assign its inflectional category.

With regard to the definition, we take into consideration only its initial part, maximally up to the first quotation; what comes after are mostly more in-depth discussions of the term, secondary interpretations or later interpolations. More precisely, we focus on the definition’s core, i.e. a short capitalised phrase, enclosed in commas and/or ending with a full stop, providing a short explanation or paraphrase of the lemma immediately after the lemma itself. Its terms are lemmas in typical quotation form, e.g. the nominative case for nouns. Moreover, the definition’s core makes use of a standardised and Classical variety of Latin lexicon so as to be as clear as possible to the reader. This means that most of the terms in a definition’s core can also be found in the list of lemmas of LEMLAT 3.0. Of the recognised forms, we retain only those that are univocally assigned only one PoS. We ignore a small set of both function and content words often recurring in definitions (e.g. pro ‘for’ and omnis ‘all, every’), and discard as noise a set of very common lexicographical annotations and abbreviations (e.g. Italus or Ital., f. = fortasse, lib., cap.).

With regard to quotations, we only consider the first one as the most significant. Given the lemma’s citation form in DC, we exploit the list of all Latin endings and their agreements with inflectional categories available in LEMLAT’s database to construct all of its a priori possible inflectional paradigms; of these (partly artificial) forms, we retain only those that allow us to unambiguously discriminate a PoS and/or an inflectional category from the others. For example, the entry for mansaticus ‘mansion, house’ illustrates this method:

      MANSATICUS, Mansio, domus. Annal. Bertin. ad ann. 874. tom. 7. Collect. Histor. Franc. pag. 118 : Inde per Attiniacum et consuetos Mansaticos Compendium adiit [. . . ]

Since the definition’s core mansio can only be a noun for LEMLAT, we can conclude that mansaticus is almost surely a noun too, even if the -icus ending tends to be associated with denominal adjectives in Latin. The -us ending tells us that mansaticus can be either a masculine second or fourth declension noun;[12] a first class adjective might theoretically be possible, but is ruled out by the definition’s core mansio. The second declension is confirmed by the ending -os found in the quotation, thus excluding the fourth declension (which should yield -us).

Thanks to this process, more than 10 000 additional lemmas are automatically included in LEMLAT. This process is applied very carefully, covering only decidedly unambiguous cases, i.e. when content words in the definition’s core are found to belong to only one PoS or to a phrase of a fixed type (e.g. a phrase ending with an infinitive assigns PoS verb to the lemma) and when the inflectional category of the word form possibly found in the quotation can be univocally discriminated. This leads to high precision (1.0), but affects recall (0.18). For the remaining cases we have to resort to manual annotation; this happens most frequently when we correctly identify the PoS and the inflectional category of a lemma, but cannot infer its gender a priori. For instance, approximately 10% of first declension nouns are found to be masculine, and not feminine as expected.

[12] Feminines are so rare in these declensions that we exclude them from the automated analysis.

4   Discussion

Not all of the 85 999 lemmas of DC are included in LEMLAT. We exclude the entries of some 3 000 fixed or idiomatic multi-word expressions and of around 300 adverbs derived either from an adjective (e.g. affectuose ‘tenderly’ from affectuosus ‘tender’) or from a verb (e.g. attendenter ‘watchfully’ from attendere ‘to keep, to watch’) in the lexical basis of the DC-enhanced LEMLAT. This is because LEMLAT considers derived adverbs as part of the inflectional paradigm of the source adjective or verb.

At the end of the process, 82 556 DC lemmas are added to LEMLAT. Since DC shows a tendency to treat different nuances of the same lemma as distinct entries, the total number of DC distinct lemmas inserted in LEMLAT is 73 131. The lemmas with the highest number of separate entries are forma ‘form’ (17), scala ‘stairs, staircase, ladder’ (15) and status ‘mode, state, position, size’ (15). These are all already attested in Classical Latin, but are also recorded in DC because of their semantic change over time.[13] This happens frequently; there are, in fact, 10 168 shared lemmas (corresponding to 14 469 entries in DC) in LEMLAT 3.0 and DC, with respect to the name of the lemma, its PoS and inflectional category (and gender, when applicable). Additionally, 1 820 lemmas share the same quotation form in both sources (often incidentally), despite being morphologically different. An example is amo: in DC, it is the third declension noun amo, amonis, a variant of ammo, ammonis (a unit of measure for wine), while in LEMLAT it is the verb amare ‘to love’.

The remaining 66 267 lemmas are to be considered lexical innovations of “media et infima Latinitas”. Looking at these Medieval lemmas, we notice some tendencies in the distribution of PoS and inflectional categories. Whereas nouns are the prevalent PoS both in LEMLAT and DC (albeit at very different rates, respectively 52% and 75%), in the former the most attested declension is the third (37% of nouns), while in the latter it is the first and second declensions that dominate (34% and 39% of nouns, against 20% of the third declension), showing a trend towards more transparent lexical items. While similar figures can be observed for verbs, in DC we notice a reduced presence of adjectives (12% against LEMLAT’s 25%), revealing that they represent a less diachronically productive PoS than nouns and verbs.

[13] Indeed, DC does not at all record lemmas already available in Classical Latin, unless they show a different meaning and/or morphology.

5   Evaluation

As was done for the previous major update of LEMLAT (Passarotti et al., 2017), we evaluate LEMLAT’s coverage of the Latin lexicon against the Thesaurus formarum totius latinitatis (TFTL) by Tombeur (1998), in order to assess the impact of LEMLAT’s acquisition of DC. A primary reference for the study of the Latin lexicon, TFTL is a comprehensive diachronic collection of all Latin word forms as they occur in texts from the archaic period up to the Second Vatican Council (20th century), listing their respective frequencies in the sources from different eras.[14]

Passarotti et alii (2017) report a coverage of 72.254% of TFTL’s forms, corresponding to 98.345% of the 62 922 781 total occurrences in the source texts.[15] This is partly explained by the fact that many forms in TFTL are either extremely rare, include punctuation in their spelling, or are merely sequences of numbers, letters and punctuation marks. When we add DC to LEMLAT, our coverage of TFTL rises by 3.264 percentage points to 75.518%, corresponding to 17 224 newly recognised forms, whereas the covered occurrences increase to 98.665%.

We also perform a coverage evaluation over three Medieval Latin texts of comparable size, available from ALIM, the Archive of Italian Medieval Latinity (Ferrarini, 2017).[16] The texts belong to three different periods and genres; these are: the Codex diplomaticus Cavensis I (documents 33-210), a collection of documentary sources from Southern Italy dating to the 9th century; the Historia Mongalorum, a 13th century report of a journey and diplomatic mission; and the De falso credita et ementita Constantini donatione, a philological treatise dating back to the end of the 15th century.

[14] Archaic Latin (up to IInd c. AD), Patristic Latin (IInd c. AD – AD 735), Medieval Latin (AD 736 – AD 1499) and Modern Latin (AD 1500 – AD 1965), respectively.
[15] The statistics in this paper are based on updated, marginally corrected statistics with respect to those presented in Passarotti et alii (2017).
[16] http://it.alim.unisi.it/
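
To make the notion of type coverage concrete, the following is a minimal Python sketch, not LEMLAT's actual implementation. It reuses the LES + SF look-up described in Section 2 on a toy lexicon and counts a type as covered when at least one analysis is returned. The table contents, the feature labels and the names LES_ARCHIVE, SF_TABLE, analyse and type_coverage are assumptions made for illustration only.

```python
from typing import NamedTuple


class Les(NamedTuple):
    les: str     # invariable lexical segment, e.g. "coniug"
    codles: str  # inflectional category code, e.g. "N1"
    gender: str  # gender tag (nouns only)
    lemma: str   # citation form returned to the user


# Toy LES archive (illustrative; the real database is far larger).
LES_ARCHIVE = [Les("coniug", "N1", "F", "coniuga")]

# Toy SF table: CODLES -> {ending: morphological features of that ending}.
SF_TABLE = {
    "N1": {
        "a": ["nom sg", "voc sg", "abl sg"],
        "ae": ["gen sg", "dat sg", "nom pl", "voc pl"],
        "am": ["acc sg"],
        "arum": ["gen pl"],
        "is": ["dat pl", "abl pl"],
        "as": ["acc pl"],
    }
}


def analyse(form):
    """Return all analyses whose LES plus a compatible SF spells `form`."""
    form = form.lower()
    analyses = []
    for entry in LES_ARCHIVE:
        if not form.startswith(entry.les):
            continue
        ending = form[len(entry.les):]
        features = SF_TABLE.get(entry.codles, {}).get(ending)
        if features:
            analyses.append((entry.lemma, entry.codles, entry.gender, features))
    return analyses


def type_coverage(tokens):
    """Share of distinct word forms (types) with at least one analysis."""
    types = {t.lower() for t in tokens}
    covered = {t for t in types if analyse(t)}
    return len(covered) / len(types)


if __name__ == "__main__":
    print(analyse("coniugae"))
    print(type_coverage(["coniugae", "Coniugae", "coniugam", "abentes"]))
```

Because the analysis is type-based and not disambiguated in context, a non-standard spelling such as abentes simply counts as uncovered in this sketch unless some LES and SF happen to match it.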
    Work (century)                   Tokens     Types    LEMLAT      LEMLAT + DC           Only DC
    Codex dipl. Cavensis (IX)        19428      3262     54.1%            59.2%          166 (5.1%)
    Historia Mongalorum (XIII)       20360      4649     90.3%            92.2%          87 (1.9%)
    De Constantini donatione (XV)    19805      6514     93.9%            94.8%           56 (0.9%)

Table 1: Comparison of the lexical coverage of DC-enhanced LEMLAT of three Medieval texts. The
“Only DC” column lists the number of terms to be found exclusively in the added DC vocabulary.
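
The figures in Table 1 can be cross-checked with a few lines of Python. The sketch below assumes that the coverage percentages and the "Only DC" percentages are both computed over types; the table does not state this explicitly, but the arithmetic is consistent with that reading. The helper names are hypothetical.

```python
# Values copied from Table 1:
# work -> (types, coverage % with LEMLAT, coverage % with LEMLAT + DC, "Only DC" count)
TABLE_1 = {
    "Codex dipl. Cavensis (IX)": (3262, 54.1, 59.2, 166),
    "Historia Mongalorum (XIII)": (4649, 90.3, 92.2, 87),
    "De Constantini donatione (XV)": (6514, 93.9, 94.8, 56),
}


def gain_points(before, after):
    """Coverage gain in percentage points, rounded as in the table."""
    return round(after - before, 1)


def only_dc_share(types, only_dc):
    """Percentage of types recognised only thanks to the DC vocabulary."""
    return round(100 * only_dc / types, 1)


def unrecognised_types(types, coverage):
    """Approximate number of types still without an analysis."""
    return round(types * (100 - coverage) / 100)


if __name__ == "__main__":
    for work, (types, before, after, only_dc) in TABLE_1.items():
        print(work, gain_points(before, after), only_dc_share(types, only_dc),
              unrecognised_types(types, after))
```

Under this assumption the gain in percentage points equals the "Only DC" share for each text, and the residue for the Historia Mongalorum agrees with the 363 unrecognised forms mentioned in the discussion of the table.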


Table 1 shows the improvements in lexical coverage obtained thanks to the enhancement of LEMLAT through DC. The results are in line with those for TFTL. Remarkably, the highest increase in performance is recorded for the least standardised of the three texts, the Codex diplomaticus, which remains the most demanding for LEMLAT to analyse. This can be explained by the large presence of local names of people and places (e.g. Sichelpertus, Eboli), and especially by the very frequent deviations from the orthographic standard (e.g. abentes for habentes ‘having (pl.)’, ecclesie for ecclesiae ‘of/to the church; churches’); the latter are also the source of many false positives, which LEMLAT does not discriminate from true positives. Names are challenging, too, as can be observed, for example, from the fact that among the 363 unrecognised forms in the Historia Mongalorum, the majority are ethnonyms, toponyms and anthroponyms (e.g. Caracoron ‘Karakorum’, circassos ‘Circassians’, Mengu ‘Möngkh’).

At the same time, LEMLAT is now able to analyse words which, while absent from the vocabulary of Classical Latin, are tied to key, widespread concepts in the Middle Ages. For example, in the Historia Mongalorum the enhanced LEMLAT can now detect terms like orda ‘horde’ (11 occurrences) or protonotarius ‘prothonotary’ (4 occurrences), both important from the 13th century onward in the context of conflicts and diplomatic missions between Western Europe and the Mongol Empire. Interestingly, the source for these lemmas in DC is not the Historia Mongalorum itself, which is an indication of the effective circulation of such words.

6   Conclusion

In this paper we present the rule-based process performed to semi-automatically enhance the Latin morphological analyser LEMLAT with the Du Cange glossary. While dated, such an approach is still necessary if the intent is to minimise the error rate resulting from the automatic PoS-tagging of the glossary’s definitions and quotations. Indeed, unless tuned on an in-domain training set, existing stochastic PoS-taggers for Latin are not yet reliable enough when it comes to processing the complex, raw and “freestyle” definitions of DC.

The ever-growing availability of digitised Latin texts from various eras urges us to build NLP tools capable of automatically analysing such varied sets of linguistic data. In this respect, enhancing the lexical basis of LEMLAT with a Medieval Latin dictionary is a first step towards the development of well-performing tools on diachronic data. Conversely, even if building a tool suitable for different diachronic varieties of Latin were feasible for low-level annotation tasks (e.g. lemmatisation and morphological analysis), this does not seem to be the case for tasks such as syntactic parsing or word sense disambiguation, for which either highly flexible or highly specialised tools will be needed.

This is an open issue not only for Latin. Indeed, the portability of NLP tools across domains and genres is currently one of the main challenges in NLP. Thanks to its highly diverse corpus, Latin is a perfect case-study language to tackle these problems.

For the future, we plan to expand LEMLAT’s lexical database with all of the graphical variants reported in DC and possibly also with other Medieval Latin thesauri, such as the Dictionary of Medieval Latin from British Sources (Ashdown et al., 2018), so as to improve both its diatopic and diachronic coverage. In general, we aspire to make LEMLAT’s algorithm better able to cope with the most widespread and predictable orthographic variations recorded in Medieval manuscripts and texts.[17]

[17] An introduction and an approach to this issue can be found in Kestemont and De Gussem (2017).
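
As an illustration of the rule-based inference summarised in Section 3.2, the following minimal Python sketch replays the mansaticus example. It is a toy reconstruction under stated assumptions, not the procedure actually implemented for LEMLAT: the PARADIGMS and CORE_POS tables, the category labels and the function infer_category are all hypothetical, gender is not modelled, and only the three candidate analyses named in the text are covered.

```python
# Hypothetical endings table: (PoS, inflectional category) -> citation ending
# and the set of endings that can occur in that paradigm.
PARADIGMS = {
    ("NOUN", "2nd declension"): {
        "citation": "us",
        "endings": {"us", "i", "o", "um", "e", "orum", "is", "os"},
    },
    ("NOUN", "4th declension"): {
        "citation": "us",
        "endings": {"us", "ui", "um", "u", "uum", "ibus"},
    },
    ("ADJ", "first class"): {
        "citation": "us",
        "endings": {"us", "i", "o", "um", "e", "orum", "is", "os",
                    "a", "ae", "am", "arum", "as"},
    },
}

# Hypothetical stand-in for a look-up in the existing lemma list:
# each definition-core term maps to the set of PoS it can have.
CORE_POS = {"mansio": {"NOUN"}, "domus": {"NOUN"}}


def infer_category(citation, core_terms, quoted_form=None):
    """Return (PoS, category) if the evidence is unambiguous, else None."""
    citation = citation.lower()
    # 1. Every analysis compatible with the ending of the citation form.
    candidates = {
        key: citation[: -len(p["citation"])]
        for key, p in PARADIGMS.items()
        if citation.endswith(p["citation"])
    }
    # 2. "Lexical osmosis": use only core terms that have exactly one PoS.
    votes = {next(iter(CORE_POS[t])) for t in core_terms
             if len(CORE_POS.get(t, ())) == 1}
    if len(votes) != 1:
        return None
    pos = votes.pop()
    candidates = {k: stem for k, stem in candidates.items() if k[0] == pos}
    # 3. Keep the paradigms that can produce the form found in the quotation.
    if quoted_form is not None:
        quoted_form = quoted_form.lower()
        candidates = {
            k: stem for k, stem in candidates.items()
            if quoted_form.startswith(stem)
            and quoted_form[len(stem):] in PARADIGMS[k]["endings"]
        }
    # Anything still ambiguous is left to manual annotation.
    return next(iter(candidates)) if len(candidates) == 1 else None


if __name__ == "__main__":
    print(infer_category("MANSATICUS", ["mansio", "domus"], "Mansaticos"))
```

Returning None whenever more than one candidate survives mirrors the conservative design described in the paper, which favours precision over recall and defers unclear cases to manual annotation.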
References

Richard K Ashdown, David R Howlett, and Ronald E Latham, editors. 2018. Dictionary of Medieval Latin from British Sources. Oxford University Press for the British Academy, Oxford, UK.

Bruno Bon. 2011. OMNIA : outils et méthodes numériques pour l’interrogation et l’analyse des textes médiolatins (3). Bulletin du centre d’études médiévales d’Auxerre BUCEMA, (15). Online at http://journals.openedition.org/cem/12015.

Andrea Bozzi and Giuseppe Cappelli. 1990. A project for Latin lexicography: 2. A Latin morphological analyzer. Computers and the Humanities, 24(5-6):421–426.

Marco Budassi and Marco Passarotti. 2016. Nomen omen. Enhancing the Latin morphological analyser Lemlat with an onomasticon. In Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pages 90–94, Berlin, Germany. Association for Computational Linguistics.

Charles du Fresne du Cange, Bénédictins de Saint-Maur, Pierre Carpentier, Louis Henschel, and Léopold Favre. 1883–1887. Glossarium mediae et infimae latinitatis. Niort, France.

Edoardo Ferrarini. 2017. ALIM ieri e oggi. Umanistica Digitale, 1(1). Online at https://umanisticadigitale.unibo.it/article/view/7193.

Karl Ernst Georges and Heinrich Georges. 1913–1918. Ausführliches lateinisch-deutsches Handwörterbuch. Hahn, Hannover, Germany.

Hercule Géraud. 1839. Historique du glossaire de la basse latinité de Du Cange. Bibliothèque de l’École Nationale des Chartes, 1:498–510.

Peter GW Glare. 1982. Oxford Latin dictionary. Clarendon Press. Oxford University Press, Oxford, UK.

Otto Gradenwitz. 1904. Laterculi Vocum Latinarum: voces Latinas et a fronte et a tergo ordinandas. Hirzel, Leipzig, Germany.

Mike Kestemont and Jeroen De Gussem. 2017. Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning. Journal of Data Mining & Digital Humanities, Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages, August. Online at https://jdmdh.episciences.org/3835.

Nino Marinone. 1990. A project for Latin lexicography: 1. Automatic lemmatization and word-list. Computers and the Humanities, 24(5-6):417–420.

Marco Passarotti, Marco Budassi, Eleonora Litta, and Paolo Ruffolo. 2017. The Lemlat 3.0 Package for Morphological Analysis of Latin. In Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language, pages 24–31, Gothenburg, Sweden. Northern European association for language technology (NEALT), Linköping University Electronic Press.

Marco Passarotti. 2004. Development and perspectives of the Latin morphological analyser LEMLAT. Linguistica computazionale, XX-XXI:397–414.

Edoardo Maria Ponti and Marco Passarotti. 2016. Differentia compositionem facit. A slower-paced and reliable parser for Latin. In Proceedings of the tenth international Conference on Language Resources and Evaluation (LREC ’16), pages 683–688, Portorož, Slovenia. European Language Resources Association (ELRA).

Uwe Springmann, Helmut Schmid, and Dietmar Najock. 2016. LatMor: A Latin finite-state morphology encoding vowel quantity. Open Linguistics - Topical Issue on Treebanking and Ancient Languages: Current and Prospective Research, 2(1):386–392.

Paul Tombeur. 1998. Thesaurus formarum totius Latinitatis: a Plauto usque ad saeculum XXum. Brepols, Turnhout, Belgium.