=Paper= {{Paper |id=Vol-3033/paper27 |storemode=property |title=Linking the Lewis & Short Dictionary to the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin |pdfUrl=https://ceur-ws.org/Vol-3033/paper27.pdf |volume=Vol-3033 |authors=Francesco Mambrini,Eleonora Litta,Marco Passarotti,Paolo Ruffolo |dblpUrl=https://dblp.org/rec/conf/clic-it/MambriniLPR21 }} ==Linking the Lewis & Short Dictionary to the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin== https://ceur-ws.org/Vol-3033/paper27.pdf
                             Linking the Lewis & Short Dictionary
                                  to the LiLa Knowledge Base
                        of Interoperable Linguistic Resources for Latin
          Francesco Mambrini, Eleonora Litta, Marco Passarotti, Paolo Ruffolo
                              CIRCSE Research Centre
                        Università Cattolica del Sacro Cuore
                        Largo Gemelli, 1 - 20123 Milan, Italy
    francesco.mambrini@unicatt.it, eleonoramaria.litta@unicatt.it,
         marco.passarotti@unicatt.it, paolo.ruffolo@posteo.eu
Abstract

This paper describes the steps taken to include data from the Lewis & Short bilingual Latin-English dictionary in LiLa, the Knowledge Base of linguistic resources for Latin. First, data were extracted from the original XML and matched with entries in LiLa, overcoming ambiguities and structural inconsistencies in the source. Subsequently, senses were modelled using the Ontolex Lemon Lexicographic module (lexicog), so that they could be included in the LiLa Knowledge Base and thus made interoperable with the (meta)data of the linguistic resources for Latin therein interlinked.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Since the pioneering times of 1949, when the Jesuit Roberto Busa persuaded Thomas Watson Sr., CEO of IBM, to fund his project aimed at processing the Latin texts of Thomas Aquinas with computers (Jones, 2016), scholars in the areas of Computational Linguistics, Literary Computing and Digital Humanities have built a plethora of linguistic resources for both modern and historical languages.

Particularly over the last two decades, many and diverse linguistic resources have been made available for Latin. These consist of corpora of texts spanning different eras and genres [1], dependency treebanks [2] and lexica [3]. These digital resources join the large set of textual and lexical resources that were created over the centuries for Latin: textual collections, thesauri, lexica, glossaries and mono/bilingual dictionaries. Among the latter, we could mention, for instance, the Oxford Latin Dictionary (Glare, 1968), the Dictionary of medieval Latin from British sources (Ashdowne et al., 1975), the Forcellini lexicon (Forcellini and Facciolati, 1871) and the Thesaurus Linguae Latinae, which is still under construction (Ehlers, 1968), many of which are today also accessible in digital format.
[1] See, for example, Musisque deoque for Classical Latin poetry (Manca et al., 2011), CLaSSES, containing epigraphic material (De Felice et al., 2015), the large corpus of Classical Latin prose and poetic texts by LASLA (Denooz, 2007) and CroALa, which brings together writings by Croatian authors produced between the 10th and 20th centuries (Jovanović, 2012).
[2] Index Thomisticus Treebank (Passarotti, 2019), Late Latin Charter Treebank (Cecchini et al., 2020a), UDante (Cecchini et al., 2020b), PROIEL (Eckhoff et al., 2018) and Latin Dependency Treebank (Bamman and Crane, 2011).
[3] Such as, for instance, valency and subcategorisation lexica (Passarotti et al., 2016; McGillivray and Vatri, 2015), the Latin WordNet (Minozzi, 2017) and word lists (Tombeur, 1998; Ramminger, 2008).

However, the impact of these digital resources on the everyday work of classicists is still limited. On the one hand, this is due to the persisting dichotomy between “traditional” Humanities and computational approaches. On the other, classicists are not yet in a position to fully exploit all available resources for ancient languages, as these are currently scattered across the web in uncommunicative blocks, using different query languages, data formats, annotation criteria and tagsets. The last decade has seen a number of exploratory solutions to tackle the sparseness of linguistic resources. Among them, the European infrastructure CLARIN [4] represents a common hub where data and metadata of resources collected in single repositories (at national level) can be searched (through the so-called Virtual Language Observatory) and processed with different tools (through the CLARIN Language Resource Switchboard). As for Classical languages, Logeion [5] is a meta-dictionary that makes it possible to query together the lexical entries of several dictionaries for Ancient Greek and Latin, while Corpus Corporum [6] is a meta-collection that allows searches across more than twenty different corpora for Latin. However, what such initiatives still lack is real interoperability between distributed resources, which would result in interaction at both the syntactic (structural) and the semantic (conceptual) level.
[4] https://www.clarin.eu.
[5] https://logeion.uchicago.edu/lexidium.
[6] http://www.mlat.uzh.ch/MLS/.

Syntactic interoperability is defined as ‘the ability of different systems to process (read) exchanged data either directly or via trivial conversion’, using a common data model consisting of shared protocols and data formats. Semantic interoperability, on the other hand, is ‘the ability to automatically interpret exchanged information meaningfully and accurately in order to produce useful results’, by using a set of common linguistic data categories defined in ad-hoc ontologies (Ide and Pustejovsky, 2010).

Attaining syntactic and semantic interoperability between distributed linguistic resources is the objective of the Linguistic Linked Open Data (LLOD) community, which applies the principles of the Linked Data paradigm (Bizer et al., 2008) to the (meta)data contained in linguistic resources. As for Classical languages, the LiLa Knowledge Base (KB) [7] (Passarotti et al., 2020) makes textual and lexical resources for Latin interact through a commonly used data model, called the Resource Description Framework (RDF) (Lassila and Swick, 1998), and ontologies developed and shared by the LLOD community. In this way, the linked resources become interoperable with each other as well as with those for other languages described following the same structural and conceptual principles.
[7] https://lila-erc.eu.

Based on a large collection of “canonical forms” (lemmas), the so-called “Lemma Bank”, LiLa achieves interoperability between resources by linking all those entries in lexical resources and tokens in corpora that point to the same lemma in the LiLa collection.

The lexical resources for Latin linked so far to LiLa include a word formation lexicon (Pellegrini et al., 2021), a polarity lexicon (Sprugnoli et al., 2020), an etymological dictionary (Mambrini and Passarotti, 2020) and a joint resource providing a manually checked subset of the Latin WordNet and a valency lexicon (Mambrini et al., 2021). The most recent among the LiLa connections is the bilingual Latin-English dictionary by Charlton Lewis and Charles Short (1879). The inclusion of this type of lexicon in LiLa was much needed, as no resource providing semantic information consisting of translations and definitions was previously available in the network of connected resources. Since Lewis & Short is the first lexical resource of its kind included in LiLa, the process of linking it to the KB raised a number of LLOD-related challenges.

This paper describes how such challenges have been tackled and is organised as follows: Section 2 describes the main characteristics of the Lewis & Short dictionary. Section 3 discusses the ontologies involved in the modelling phase, the challenges that need to be overcome in the representation of the linguistic data as LLOD (3.1), and the strategies adopted to represent the dictionary entries using the chosen vocabularies (3.2). Finally, Section 4 discusses conclusions and highlights directions for future work.

2 The “Lewis & Short” Dictionary

2.1 The Printed and Digital Dictionary

The Latin Dictionary, curated by Ch. T. Lewis and Ch. Short and commonly referred to as the “Lewis & Short” (L&S), was published by Harper and Oxford University Press in 1879 (Lewis and Short, 1879). Though based on previous work by German scholars, it remained a standard in Latin lexicography in the English-speaking world until it was superseded by the Oxford Latin Dictionary (Glare, 1968).

In the digital age, its importance rests on two grounds. On the one hand, its relevance for the history of Classical Scholarship is undeniable. On the other hand, also on account of its copyright status, as the dictionary now belongs to the public domain, the L&S has quickly become one of the most used and best curated digital Latin dictionaries on the web. Following the same workflow used for the Greek-English Lexicon (Liddell et al., 1940), the Perseus Project has developed a widely used digital edition of the dictionary based on the standards of the Text Encoding Initiative (TEI) (Rydberg-Cox, 2002). The digital L&S has been incorporated in the word-search tools available on the Perseus website and in a series of other
desktop and web applications.[8]
[8] One example is the app Diogenes for querying corpora of Greek and Latin texts: https://d.iogen.es/.

Perseus’ TEI edition is the point of departure of our work.[9] Though its publication was a remarkable achievement, this electronic text is not exempt from occasional flaws and inconsistencies, which had to be taken into account.
[9] The digital edition is available from the repository of the Perseus DL and is distributed under a CC BY SA 4.0 license: https://github.com/PerseusDL/lexica.

In the digital edition, entries from the L&S are based on an XML encoding of the whole dictionary. The XML structure, albeit not always consistent, offers the following information about each word:

  1. Entry: the headword. Entries are encoded within the TEI element <entryFree> and are 51,596 in total.[10]

  2. Information about inflection, encoded as attributes in the XML and rendered in the output following the customary conventions of Latin dictionaries: a masculine noun of the second declension such as gallus ‘cock’, for example, is followed by the genitive singular ending of the word (‘i’) and the abbreviation for the gender, ‘m.’ (gallus, i, m.).

  3. Etymological or derivational information, encoded within the same element.

  4. Sense(s): these act as containers where the meaning of the word is matched with a number of representative citations from Classical Latin sources. Each citation is accompanied by its canonical reference (e.g. “Cic. Sen. 8, 26” for a reference to Cicero, De Senectute, chapter 8, paragraph 26).

[10] See https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-entryFree.html.

Entries can contain what we call “sub-entries”, words that are not given a record of their own, but are discussed within another entry. Usually, these sub-entries consist of lexicalised present and past participles like, for example, adolescens ‘young man’ (sub-entry of adolesco ‘to grow up’); another instance is the substantivised forms of adjectives, such as verum ‘the truth’ (sub-entry of verus ‘true’). Sub-entries are encoded within the  element and followed by the same type of inflectional information as the main entries.

2.2 Linking the L&S to LiLa

The LiLa KB includes about 200,000 canonical forms, each of which is described by a series of properties that record the part of speech (PoS), the full morphological description and the inflectional category. Also, the data property “written representation”, defined in the ontology Ontolex (see Section 3.1), registers all the attested spellings of any lemma. Publishing a lexical resource as LLOD within LiLa means both representing its information using the appropriate standards and vocabularies (Section 3.1) and linking the dictionary entries to the right form in LiLa, by matching the lemmas used to index the records to the appropriate form in the KB.

In order to achieve the latter goal, we first had to normalise the spelling of the L&S dictionary lemmas by removing upper case initials and substituting j with i and v with u in order to mirror LiLa’s conventions. Then, after mapping part-of-speech and inflectional information between resources, we extracted 31,142 1:1 matches, 2,998 1:N matches and 4,553 1:0 matches, on the basis of the tuple (written representation, PoS). The last group was subsequently matched only on the basis of graphical representation, at which point we obtained 946 1:1 matches and 50 1:N matches. Of the remaining 3,557 unmatched entries, 1,289 were successfully analysed by the morphological analyser Lemlat (Passarotti et al., 2017), leaving 2,239 definitely unmatched entries. After resolving multi-word spellings and graphical variants, the unmatched entries were all added to the LiLa Lemma Bank, while 1:N matches were manually disambiguated and matched to the relevant lemmas.

3 Modelling Lexical Entries

3.1 LiLa, Ontolex and lexicog

As mentioned, the LiLa KB for Latin resources is built around a collection of canonical forms that can be used both as headwords of dictionaries and as “targets” for the lemmatisation of corpora (Passarotti et al., 2020). These lemmas are modelled using the Ontolex ontology, now a de facto standard of the LLOD community (Cimiano et al., 2020; McCrae et al., 2017). In particular, lemmas in the LiLa KB are defined as forms of words that are linked (or are ready to be linked) to lexical entries via the property “canonical form” of the Ontolex
ontology.[11]
[11] http://www.w3.org/ns/lemon/ontolex#canonicalForm.

Ontolex provides several classes and properties to describe the relationships that lexical entries have with, on the one hand, the grammatical forms attested in language and, on the other, the senses and the meanings of words. The core Ontolex module, however, imposes a series of restrictions that make its classes and properties ill-suited to represent the information in most standard dictionaries. The class Lexical Entry from the core Ontolex module, for instance, is inadequate to represent entries that license multiple syntactic interpretations, such as words that are registered in a dictionary as both adverb and conjunction. Subentries like the noun verum from the adjective verus, formed by a process of substantivisation from the word in the main entry, would also produce a mismatch between the dictionary and the lexical entry. Finally, the L&S, like most dictionaries, defines the senses of all but the simplest words by grouping them in sense clusters; those clusters are generally organised into hierarchies with multiple levels of nesting, from the most general to the most specific sense, a structure for which Ontolex has no suitable representation.

In order to overcome these issues, the Ontolex community has developed a specific extension of the ontology called the “OntoLex lexicography module” or lexicog (Bosque-Gil and Gracia, 2019).[12] The module is explicitly designed to capture the structural information expressed in a lexicographic resource and is primarily intended to support the conversion of lexicographic data that are not native to Ontolex. Retro-digitised dictionaries like the L&S are thus a perfect use case.
[12] https://www.w3.org/ns/lemon/lexicog#.

As mentioned, lexicog focuses on the structural properties of dictionaries and does not attempt to convey any lexical, or indeed linguistic, information, which is left to the classes and properties of Ontolex. The most important of the structural elements introduced in the vocabulary is the Lexicographic Entry. In lexicog, an entry is a container that represents a lexicographic article or record as it is arranged in the source (Bosque-Gil and Gracia, 2019). Thus, while a lexical entry (as defined in Ontolex) is an item in the lexicon of a given language, a lexicographic entry is a record in a linguistic resource that documents or discusses some properties of a given lexical item.

Lexicographic entries are a special subset of a larger class called Lexicographic Component. Apart from whole dictionary articles (the entries), components can be used to represent senses, sense groups or subentries (like the substantivised verum) within lexicographic entries.

It is important to stress once again that components represent only structural units; all linguistic information that is conveyed within these units must be expressed using Ontolex. The property lexicog:describes provides a link between the two dimensions, so that a lexicographic entry can be said to describe a lexical entry (as defined in Ontolex). In the same way, the lexicographic components that discuss a sense of a word or introduce a subentry describe that specific lexical sense (as defined in Ontolex) or another lexical entry.

3.2 Lexicographic and Lexical Entries in the L&S

The LLOD version of the L&S linked to LiLa is now available online in the LiLa KB.[13] The entries can also be searched using LiLa’s query interface and SPARQL endpoint.[14]
[13] http://lila-erc.eu/data/lexicalResources/LewisShort/Lexicon.
[14] https://lila-erc.eu/query/ and https://lila-erc.eu/sparql/.

Figure 1 shows a visualisation of how the information from a sample entry, the adjective hosticus in the L&S dictionary, is represented in LiLa. In particular, the interplay between the linguistic and structural information is reflected in the complex relation between the lexical and lexicographic entries.

The L&S distinguishes two senses for the word: “belonging to an enemy, hostile” and “belonging to a stranger, foreign”. Following the Ontolex approach, these meanings are represented by the two ‘triangles’ between the lexical entry (the light green node on the left), the concepts evoked by the word (gray-blue nodes), and the senses, labelled 0 and 1, that mediate between them (greenish-yellow nodes).

The lexical entry is described by a lexicographic entry, identified by the id n21014 (inherited from the TEI XML file of the Perseus DL), while a specific lexicographic component describes each of the two senses (n21014 0 and n21014 1, respectively). What is particularly relevant is that the component n21014 0, which corresponds to the
sense “hostile”, is linked to a sub-component that describes the lexical entry of the noun hosticum, a substantivised usage of the neuter adjective that means “the enemy’s territory”. The section of the entry that discusses the subentry “hosticum”, which is itself a section of the paragraph dedicated to the first sense, is thus linked (via the “describes” property) to a different lexical entry.

Figure 1: An entry in LiLa’s representation of the L&S.

4 Conclusions and Future Work

Perhaps even more than for any modern language, a great number of lexical resources, either bi- or monolingual, are available for Latin, many of which have already been digitised and disseminated on the web. In this paper, we described a model of how this huge wealth of information can be published using the modern standards of the Semantic Web. The greatest advantage of this approach is that all the lexical resources published according to the same data model can be integrated in a wider network of linguistic information, along with the other digital resources that are connected to it. In the case of the L&S in LiLa, the Latin lexical entries of the bilingual dictionary can be queried together with the information about the same words provided by the other linguistic resources linked to the lemmas in the KB.

One example of the fruitful interactions between resources is the possibility of investigating the polysemy of words in relation to their derivation, as recorded in the Word Formation Latin resource, which is also linked to LiLa (Litta et al., 2020). The adjective hosticus of Figure 1, for instance, clearly inherits its two main senses (‘hostile’ and ‘foreign’) from the same polysemy of the noun hostis ‘stranger’ or ‘enemy’, from which it is derived. At the same time, while other resources in LiLa describe the senses of words, such as the Latin WordNet (Franzini et al., 2019; Mambrini et al., 2021), the complex relations between those senses (whether, for instance, one sense is interpreted as a specialised derivation from another) are generally available only in traditional lexical resources like the L&S.

The solutions we found to address the challenges raised by the representation of the L&S in LLOD will be reused when we link further bilingual, as well as monolingual, dictionaries of Latin to the KB. Including such lexical resources in LiLa is an important achievement, as it makes it possible for the KB to interact with linguistic (meta)data for languages other than Latin. Undoubtedly, such an inter-linguistic (re)use of distributed resources is one of the objectives of the LLOD community, to which LiLa contributes by steadily supplying new (kinds of) linguistic resources represented in LLOD.
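
The linking workflow of Sections 2.2 and 3.2 can be illustrated with a small, self-contained Python sketch. It is not the code used for LiLa: the Lemma Bank excerpt, the lemma and node identifiers (including the underscore spelling of the component ids), the PoS labels and the function names are all hypothetical, and the use of lexicog:subComponent and skos:definition is an assumption about how the structure could be expressed. Only the normalisation rules, the two-step matching (written representation plus PoS, then written representation alone) and the role of lexicog:describes are taken from the text.

```python
from collections import defaultdict


def normalise(headword):
    """Lower-case the initial and map j -> i, v -> u (Section 2.2)."""
    word = headword[:1].lower() + headword[1:]
    return word.replace("j", "i").replace("v", "u")


def build_indexes(lemma_bank):
    """Index lemma ids by (written representation, PoS) and by spelling only."""
    by_form_pos, by_form = defaultdict(set), defaultdict(set)
    for lemma in lemma_bank:
        for form in lemma["written_reps"]:
            by_form_pos[(form, lemma["pos"])].add(lemma["id"])
            by_form[form].add(lemma["id"])
    return by_form_pos, by_form


def match(headword, pos, by_form_pos, by_form):
    """Try spelling + PoS first; fall back to spelling alone if that finds nothing."""
    form = normalise(headword)
    candidates = by_form_pos.get((form, pos)) or by_form.get(form, set())
    kind = "1:0" if not candidates else "1:1" if len(candidates) == 1 else "1:N"
    return kind, sorted(candidates)


def model_entry(entry_id, lexical_entry, lemma_id, glosses):
    """Emit (subject, predicate, object) statements for one dictionary article."""
    triples = [
        (lexical_entry, "ontolex:canonicalForm", lemma_id),
        (entry_id, "lexicog:describes", lexical_entry),
    ]
    for i, gloss in enumerate(glosses):
        sense, component = f"{lexical_entry}_sense{i}", f"{entry_id}_{i}"
        triples += [
            (lexical_entry, "ontolex:sense", sense),
            (sense, "skos:definition", gloss),
            (entry_id, "lexicog:subComponent", component),  # assumed property
            (component, "lexicog:describes", sense),
        ]
    return triples


def described(node, triples):
    """Everything described by a lexicographic node or its sub-components."""
    found = {o for s, p, o in triples if s == node and p == "lexicog:describes"}
    for s, p, o in triples:
        if s == node and p == "lexicog:subComponent":
            found |= described(o, triples)
    return found


# Hypothetical two-lemma excerpt of the Lemma Bank (ids and PoS tags are made up).
LEMMA_BANK = [
    {"id": "lemma:hosticus", "written_reps": ["hosticus"], "pos": "ADJ"},
    {"id": "lemma:hosticum", "written_reps": ["hosticum"], "pos": "NOUN"},
]
BY_FORM_POS, BY_FORM = build_indexes(LEMMA_BANK)

kind, lemma_ids = match("Hosticus", "ADJ", BY_FORM_POS, BY_FORM)
TRIPLES = model_entry(
    "n21014", "le:hosticus", lemma_ids[0],
    ["belonging to an enemy, hostile", "belonging to a stranger, foreign"],
)
# The subentry hosticum hangs off the component of the first sense.
TRIPLES += [
    ("n21014_0", "lexicog:subComponent", "n21014_0_sub"),
    ("n21014_0_sub", "lexicog:describes", "le:hosticum"),
    ("le:hosticum", "ontolex:canonicalForm", "lemma:hosticum"),
]
print(kind, sorted(described("n21014", TRIPLES)))
```

Walking the component tree of n21014 with described() returns both lexical entries (hosticus and hosticum) together with the two senses, which mirrors the point made about Figure 1: a single lexicographic article can describe more than one lexical entry, while all linguistic statements remain on the Ontolex side.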
Acknowledgments                                            Egidio Forcellini and Jacobo Facciolati. 1871. Lexicon
                                                             totius latinitatis, volume 3. Typis seminarii.
This project has received funding from the Eu-
ropean Research Council (ERC) under the Euro-              Greta Franzini, Andrea Peverelli, Paolo Ruffolo, Marco
pean Union’s Horizon 2020 research and innova-               Passarotti, Helena Sanna, Edoardo Signoroni, Vi-
                                                             viana Ventura, and Federica Zampedri. 2019. Nunc
tion programme – Grant Agreement No. 769994.                 Est Aestimandum. Towards an evaluation of the
                                                             Latin WordNet. In Raffaella Bernardi, Roberto Nav-
                                                             igli, and Giovanni Semeraro, editors, Sixth Italian
References                                                   Conference on Computational Linguistics (CLiC-it
                                                             2019), pages 1–8, Bari, Italy. CEUR-WS.org.
Richard Ashdowne, David R Howlett, and Ronald Ed-
  ward Latham. 1975. Dictionary of medieval Latin          Peter GW Glare. 1968. Oxford latin dictionary.
  from British sources. Oxford University Press.             Clarendon Press, Oxford.
David Bamman and Gregory Crane. 2011. The an-
  cient greek and latin dependency treebanks. In Lan-      Nancy Ide and James Pustejovsky. 2010. What does
  guage technology for cultural heritage, pages 79–          interoperability mean, anyway? toward an oper-
  98. Springer.                                              ational definition of interoperability for language
                                                             technology. In Proceedings of the Second Inter-
Christian Bizer, Tom Heath, Kingsley Idehen, and             national Conference on Global Interoperability for
  Tim Berners-Lee. 2008. Linked data on the web              Language Resources. Hong Kong, China.
  (ldow2008). In Proceedings of the 17th interna-
  tional conference on World Wide Web, pages 1265–         Steven E Jones. 2016. Roberto Busa, SJ, and the emer-
  1266.                                                       gence of humanities computing: the priest and the
                                                              punched cards. Routledge.
Julia Bosque-Gil and Jorge Gracia. 2019. The On-
   toLex lemon lexicography module. https://on             Neven Jovanović. 2012. Croala. enhancing a tei-
   tolex.github.io/lexicog/.                                 encoded text collection. Journal of the Text Encod-
                                                             ing Initiative, (2).
Flavio Massimiliano Cecchini, Timo Korkiakangas, and Marco Passarotti. 2020a.
   A new Latin treebank for Universal Dependencies: Charters between Ancient
   Latin and Romance languages. In Proceedings of The 12th Language Resources
   and Evaluation Conference, pages 933–942.

Flavio Massimiliano Cecchini, Rachele Sprugnoli, Giovanni Moretti, and Marco
   Passarotti. 2020b. UDante: First steps towards the Universal Dependencies
   treebank of Dante’s Latin works. In CLiC-it.

Philipp Cimiano, Christian Chiarcos, John P. McCrae, and Jorge Gracia. 2020.
   Linguistic Linked Data: Representation, Generation and Applications.
   Springer, Cham.

Irene De Felice, Giovanna Marotta, and Margherita Donati. 2015. CLaSSES: A new
   digital resource for Latin epigraphy. IJCoL. Italian Journal of
   Computational Linguistics, 1(1-1):125–136.

Joseph Denooz. 2007. Opera Latina: le nouveau site internet du LASLA. Journal
   of Latin Linguistics, 9(3):21–34.

Hanne Eckhoff, Kristin Bech, Gerlof Bouma, Kristine Eide, Dag Haug, Odd Einar
   Haugen, and Marius Jøhndal. 2018. The PROIEL treebank family: a standard
   for early attestations of Indo-European languages. Language Resources and
   Evaluation, 52(1):29–65.

Wilhelm Ehlers. 1968. Der Thesaurus Linguae Latinae. Prinzipien und
   Erfahrungen. Antike und Abendland, 14(1):172–184.

Ora Lassila, Ralph R. Swick, and World Wide Web Consortium. 1998. Resource
   Description Framework (RDF) model and syntax specification.

Charlton T. Lewis and Charles Short. 1879. A Latin Dictionary. Founded on
   Andrews’ edition of Freund’s Latin dictionary. Clarendon Press, Oxford.

Henry Liddell, Robert Scott, and Henry Stuart Jones. 1940. A Greek-English
   Lexicon. Clarendon Press, Oxford, 9th edition.

Eleonora Litta, Marco Passarotti, and Francesco Mambrini. 2020. Derivations
   and Connections: Word Formation in the LiLa Knowledge Base of Linguistic
   Resources for Latin. The Prague Bulletin of Mathematical Linguistics,
   115:163–186.

Francesco Mambrini and Marco Passarotti. 2020. Representing etymology in the
   LiLa Knowledge Base of linguistic resources for Latin. In Proceedings of
   the 2020 Globalex Workshop on Linked Lexicography, pages 20–28.

Francesco Mambrini, Marco Passarotti, Eleonora Litta, and Giovanni Moretti.
   2021. Interlinking valency frames and WordNet synsets in the LiLa Knowledge
   Base of linguistic resources for Latin. In Further with Knowledge Graphs,
   pages 16–28. IOS Press.

Massimo Manca, Linda Spinazzè, Paolo Mastandrea, Luigi Tessarolo, and Federico
   Boschetti. 2011. Musisque Deoque: Text retrieval on critical editions.
   J. Lang. Technol. Comput. Linguistics, 26(2):127–138.

John P. McCrae, Julia Bosque-Gil, Jorge Gracia, Paul Buitelaar, and Philipp
   Cimiano. 2017. The OntoLex-Lemon Model: development and applications. In
   Proceedings of eLex 2017, pages 587–597.

Barbara McGillivray and Alessandro Vatri. 2015. Computational valency lexica
   for Latin and Greek in use: a case study of syntactic ambiguity. Journal of
   Latin Linguistics, 14(1):101–126.

Paul Tombeur. 1998. Thesaurus formarum totius Latinitatis: a Plauto usque ad
   saeculum XXum; TF.[2]. CETEDOC Index of Latin forms: database for the study
   of the vocabulary of the entire Latin world; base de données pour l’étude
   du vocabulaire de toute la latinité. Brepols.

Stefano Minozzi. 2017. Latin WordNet, una rete di conoscenza semantica per il
   latino e alcune ipotesi di utilizzo nel campo dell’information retrieval.
   Strumenti digitali e collaborativi per le Scienze dell’Antichità,
   (14):123–134.

Marco Passarotti, Berta González Saavedra, and Christophe Onambele. 2016.
   Latin Vallex. A treebank-based semantic valency lexicon for Latin. In
   Proceedings of the Tenth International Conference on Language Resources and
   Evaluation (LREC’16), pages 2599–2606.

Marco Passarotti, Marco Budassi, Eleonora Litta, and Paolo Ruffolo. 2017. The
   Lemlat 3.0 Package for Morphological Analysis of Latin. In Gerlof Bouma and
   Yvonne Adesam, editors, Proceedings of the NoDaLiDa 2017 Workshop on
   Processing Historical Language, volume 133, pages 24–31, Gothenburg.
   Linköping University Electronic Press.

Marco Passarotti, Francesco Mambrini, Greta Franzini, Flavio Massimiliano
   Cecchini, Eleonora Litta, Giovanni Moretti, Paolo Ruffolo, and Rachele
   Sprugnoli. 2020. Interlinking through lemmas. The lexical collection of the
   LiLa Knowledge Base of linguistic resources for Latin. Studi e Saggi
   Linguistici, 58(1):177–212.

Marco Passarotti. 2019. The project of the Index Thomisticus Treebank. In
   Digital Classical Philology, pages 299–320. De Gruyter Saur.

Matteo Pellegrini, Eleonora Litta, Marco Passarotti, Francesco Mambrini, and
   Giovanni Moretti. 2021. The two approaches to word formation in the LiLa
   Knowledge Base of Latin resources. In Proceedings of the Third
   International Workshop on Resources and Tools for Derivational Morphology
   (DeriMo 2021), pages 101–109.

Johann Ramminger. 2008. Neulateinische Wortliste. Ein Wörterbuch des
   Lateinischen von Petrarca bis 1700. Thesaurus Linguae Latinae.

Jeffrey A. Rydberg-Cox. 2002. Mining Data from an Electronic Greek Lexicon.
   Classical Journal, 98(2):183–188.

Rachele Sprugnoli, Francesco Mambrini, Giovanni Moretti, and Marco Passarotti.
   2020. Towards the modeling of polarity in a Latin knowledge base. In
   WHiSe@ESWC, pages 59–70.