<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LatInfLexi: an Inflected Lexicon of Latin Verbs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matteo Pellegrini</string-name>
          <email>matteo.pellegrini@unibg.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Passarotti</string-name>
          <email>marco.passarotti@unicatt.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CIRCSE Research Centre, Università Cattolica del Sacro Cuore</institution>
          ,
          <addr-line>Largo Gemelli, 1 - 20123 Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università di Bergamo/Pavia</institution>
          ,
          <addr-line>Piazza Rosate, 2 - 24129 Bergamo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>We present a paradigm-based inflected lexicon of Latin verbs built to provide empirical evidence supporting an entropy-based estimation of the degree of uncertainty in inflectional paradigms. The lexicon contains information on the inflected forms that occupy the 254 morphologically possible paradigm cells of 3,348 verbal lexemes extracted from a frequency lexicon of Latin. The resource also includes annotation of vowel length and the frequency of each form in different epochs. In this paper, we describe the construction of LatInfLexi, an inflected lexicon of Latin verbs organized in lexemes and paradigm cells.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>A note on terminology: the term “lexeme” is used for the abstract theoretical concept normally adopted in morphology and lexicology, while “lemma” refers to the concrete citation form representing an entry in dictionaries. Since we aim at a resource suitable for theoretical inquiries, we use the first term as a label in our resource.</p>
      <p>In morphological theory, there is a recent trend towards a more realistic modelling of complex inflectional systems: for instance, Ackerman et al. (2009) and Bonami and Boyé (2014) propose that the analysis should take full inflected forms as its starting point, without assuming any segmentation a priori. Such approaches investigate not how forms are built from smaller units like stems and inflectional endings, but how predictable they are given knowledge of other forms. This can be done by using the information-theoretic notion of conditional entropy to estimate the uncertainty in guessing the content of a paradigm cell of a lexeme when another inflected form of the same lexeme is known, weighting the probability of application of each inflectional pattern by its type frequency in real data.</p>
      <p>To do so, large-scale inflected lexicons listing all forms of a representative selection of lexemes are needed. Such resources are increasingly being developed for modern languages: see, among others, Zanchetta and Baroni (2005) and Calderone et al. (2017) for Italian, Neme (2013) for Arabic, and Bonami et al. (2014) and Hathout et al. (2014) for French. However, to the best of our knowledge, there are no resources of this kind for Latin, although building them (semi-)automatically is made possible by the current availability of several morphological analyzers for Latin, including Words (http://archives.nd.edu/words.html), Lemlat (www.lemlat3.eu), Morpheus (https://github.com/tmallon/morpheus), the PROIEL Latin morphology system (https://github.com/mlj/proiel-webapp/tree/master/lib/morphology) and LatMor (http://cistern.cis.lmu.de). Our resource was created to fill this gap and to enable a quantitative, entropy-based analysis of Latin verb inflection.</p>
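      <p>As a rough illustration of the entropy-based estimate outlined in this section, the following minimal sketch computes the conditional entropy of guessing one paradigm cell from another. The toy word list, the suffix-based extraction of alternation patterns and all function names are assumptions made for this example only; they do not reproduce the procedure of any specific package.</p>
      <preformat>
```python
import math
from collections import Counter, defaultdict


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pattern(known, target):
    """Alternation pattern: (ending of the known form, ending of the target form)."""
    n = common_prefix_len(known, target)
    return (known[n:], target[n:])


def conditional_entropy(pairs):
    """Entropy of the pattern, given the class of the known form.

    pairs: list of (known form, target form), one pair per lexeme.
    The class of a known form is the set of patterns that could apply to it,
    i.e. those whose first member is a suffix of the form. Probabilities are
    estimated from type frequency (number of lexemes).
    """
    patterns = [pattern(k, t) for k, t in pairs]
    inventory = set(patterns)
    by_class = defaultdict(Counter)
    for (known, _), pat in zip(pairs, patterns):
        cls = frozenset(p for p in inventory if known.endswith(p[0]))
        by_class[cls][pat] += 1
    total = len(pairs)
    h = 0.0
    for counts in by_class.values():
        n_class = sum(counts.values())
        for n_pat in counts.values():
            p = n_pat / n_class
            h -= (n_class / total) * p * math.log2(p)
    return h


# Toy data (vowel length ignored): PRS.ACT.IND.1SG and PRS.ACT.INF of four verbs.
PAIRS = [("amo", "amare"), ("laudo", "laudare"),
         ("rumpo", "rumpere"), ("dico", "dicere")]

print(conditional_entropy(PAIRS))                         # 1SG to INF
print(conditional_entropy([(t, k) for k, t in PAIRS]))    # INF to 1SG
```
      </preformat>
      <p>On this toy sample, guessing the infinitive from the first-person singular is uncertain, because both patterns apply to every form ending in -o, whereas the reverse direction is fully predictable.</p>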
    </sec>
    <sec id="sec-2">
      <title>Design</title>
      <p>
        A distinctive feature of our inflected lexicon is that it is based on lexemes and paradigm cells, rather than on forms. This means that, for each lexeme, all the morphologically possible paradigm cells are filled with a form, rather than only those cells whose forms are actually attested in Latin texts. In this respect, our resource is similar to other recently developed inflected lexicons, such as Flexique for French
        <xref ref-type="bibr" rid="ref5">(Bonami et al., 2014)</xref>
        .
      </p>
      <p>For each paradigm cell, the following information is provided: (i) the inflected form that occupies the paradigm cell; (ii) a unique identifier of the lexeme to which it belongs; (iii) the set of its morphological features; (iv) information on the frequency of the form in different epochs.</p>
      <p>
        As for (i), it should be noted that there is never
more than one form per paradigm cell. In cases
of overabundance
        <xref ref-type="bibr" rid="ref11">(i.e. cells that are filled by
more than one form, cf. Thornton, 2012)</xref>
        , a
choice was made to decide which “cell-mate”
        <xref ref-type="bibr" rid="ref11">(Thornton, 2012: 183)</xref>
        should be kept, and which
one discarded.
      </p>
      <p>
        On the other hand, in some cases a paradigm cell can be empty, either because it is defective, as with the passive cells of intransitive verbs, or because it is not filled by a synthetic form but is expressed analytically by means of a phrase, as with the perfective cells of Latin deponent verbs, for which the periphrasis PRF.PTCP + AUX esse ‘to be’ is used (e.g. PRF.IND.1SG hortātus sum ‘I incited’). (Throughout the paper, we refer to grammatical features by using the standard abbreviations of the Leipzig Glossing Rules.) In both cases, the cell is marked as #DEF# in the resource. This convention is also adopted in Flexique
        <xref ref-type="bibr" rid="ref5">(Bonami et al., 2014: 2585)</xref>
        , and it fits the requirements of the Qumin package for entropy calculations on the predictability of implicative relations between inflected forms
        <xref ref-type="bibr" rid="ref3">(Bonami
and Beniamine, 2016; Beniamine, 2017)</xref>
        .
      </p>
      <p>As for (ii), the identifier corresponds to the
citation form of the lexeme, almost always the
first-person singular of the present indicative,
following the Latin lexicographical and
didactical tradition. A diacritic is added in those rare
cases where different verbs have the same
citation form (see infra, §3.2).</p>
      <p>Regarding (iii), we use the PoS-tags of the Universal Part-of-Speech Tagset by Petrov et al. (2012) and the morphological features used in Universal Dependencies (http://universaldependencies.org/u/feat/index.html).</p>
      <p>Lastly, the frequency data in (iv) are taken
from Tombeur’s (1998) Thesaurus Formarum
Totius Latinitatis (see infra, §3.3).</p>
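      <p>The four pieces of information described in this section can be pictured as one record per paradigm cell. The sketch below is only an illustration: the class name, the field names and the feature notation are hypothetical and are not claimed to match the column layout of the distributed file.</p>
      <preformat>
```python
from dataclasses import dataclass, field

DEFECTIVE = "#DEF#"  # marker used in the resource for empty cells


@dataclass
class ParadigmCell:
    form: str       # (i) inflected form, or DEFECTIVE
    lexeme: str     # (ii) lexeme identifier (citation form, plus a diacritic if needed)
    features: str   # (iii) morphological features (the notation here is illustrative)
    frequency: dict = field(default_factory=dict)  # (iv) epoch name -> occurrences

    @property
    def is_defective(self):
        """True if the cell is defective or analytically expressed."""
        return self.form == DEFECTIVE

    def attested_in(self, epoch):
        """True if at least one occurrence is recorded for the given epoch."""
        return self.frequency.get(epoch, 0) > 0


# The perfective cells of deponent verbs are expressed by a periphrasis,
# so the corresponding record carries the #DEF# marker.
cell = ParadigmCell(form=DEFECTIVE, lexeme="hortor", features="PRF.IND.1SG")
print(cell.is_defective)
```
      </preformat>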
    </sec>
    <sec id="sec-3">
      <title>Building the Lexicon</title>
      <p>This section details the procedure followed to
build the lexicon.</p>
      <sec id="sec-3-1">
        <title>Selecting the Lexemes</title>
        <p>Our first objective is to build an inflected lexicon of Latin that features all the possible inflected forms of verbs only. To this end, we include all the verbal entries contained in Delatte et al.’s (1981) Dictionnaire fréquentiel et Index inverse de la langue latine (henceforth DFILL). This yields a total of 3,348 verbs. In rare cases, more than one entry of DFILL corresponds to one and the same lexeme in our resource, because some verbs are lemmatized twice in DFILL. For instance, the verb verso has two entries in DFILL, whose citation forms are the first-person singular of the present active indicative verso and the corresponding morphologically passive form versor. This choice is likely to be motivated by the different semantics of the two verbs, the first meaning ‘to turn’ and the second ‘to remain’. However, in such cases our resource gives priority to collecting into one common inflectional paradigm all the forms that can be assigned to the same lexeme on the basis of their morphological relatedness, rather than separating them into paradigms of different lexemes according to semantic criteria. Therefore, our lexicon includes only one lexeme verso, for which both active and passive forms are listed.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Generating the Forms</title>
        <p>
          In order to fill all of the paradigm cells of the
selected lexemes, we exploit the database of
Lemlat
          <xref ref-type="bibr" rid="ref8">(Passarotti et al., 2017)</xref>
          . For each lexeme, the database of Lemlat contains a list of segments called LES, roughly corresponding to the stems that are used in different subparadigms, each with a corresponding CODLES that provides (among other things) information on the inflectional endings that can be attached to a LES. We make use of this information to generate the relevant forms.
        </p>
        <p>To illustrate the details of the procedure, consider the verb rumpo ‘to break’. For this verb, the database of Lemlat features the LESs and CODLESs shown in Table 1.</p>
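        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>LES</th><th>CODLES</th></tr>
            </thead>
            <tbody>
              <tr><td>rump</td><td>v3r</td></tr>
              <tr><td>rumpisse</td><td>fe</td></tr>
              <tr><td>rup</td><td>v7s</td></tr>
              <tr><td>rupsit</td><td>fe</td></tr>
              <tr><td>rupt</td><td>n41</td></tr>
              <tr><td>rupt</td><td>n6p1</td></tr>
              <tr><td>ruptur</td><td>n6p2</td></tr>
            </tbody>
          </table>
        </table-wrap>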
        <p>
          The two LESs with CODLES “fe” (“forma eccezionale”, ‘exceptional form’) were discarded, since they are fully irregular forms that are stored as such. As for the other LESs, the one with CODLES “v3r” is used to fill all the cells of the present system, by adding the inflectional endings of the conjugation represented by the CODLES (i.e. the 3rd conjugation). Similarly, the LES with CODLES “v7s” is used to fill the cells of the perfect system. From the remaining LESs, some nominal forms built upon the so-called “third stem”
          <xref ref-type="bibr" rid="ref2">(Aronoff, 1994)</xref>
          can be derived, namely the supine rupt-um and rupt-ū from the LES with CODLES “n41”, the perfect participle rupt-us, -a, -um from the LES with CODLES “n6p1” and the future participle ruptūr-us, -a, -um from the LES with CODLES “n6p2”.
        </p>
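        <p>The generation step just described for rumpo can be sketched as follows. This is a minimal illustration under stated assumptions: the table that maps each CODLES to cells and endings is hypothetical, covers only a handful of cells, uses ad hoc cell labels and ignores vowel length; it is not the actual content of the Lemlat database.</p>
        <preformat>
```python
# Hypothetical excerpt: CODLES -> list of (cell label, ending).
ENDINGS = {
    "v3r": [("PRS.ACT.IND.1SG", "o"), ("PRS.ACT.IND.2SG", "is"),
            ("PRS.ACT.IND.3SG", "it"), ("PRS.ACT.INF", "ere")],
    "v7s": [("PRF.ACT.IND.1SG", "i"), ("PRF.ACT.IND.2SG", "isti"),
            ("PRF.ACT.IND.3SG", "it")],
    "n41": [("SUP.ACC", "um"), ("SUP.ABL", "u")],
}
EXCEPTIONAL = "fe"  # "forma eccezionale": stored as a full form, not a stem

# LES and CODLES pairs of rumpo, as listed in Table 1.
RUMPO = [("rump", "v3r"), ("rumpisse", "fe"), ("rup", "v7s"), ("rupsit", "fe"),
         ("rupt", "n41"), ("rupt", "n6p1"), ("ruptur", "n6p2")]


def generate(les_table):
    """Fill paradigm cells by attaching endings to each non-exceptional LES."""
    forms = {}
    for les, codles in les_table:
        if codles == EXCEPTIONAL:
            continue  # exceptional forms are discarded
        # A CODLES missing from the excerpt simply contributes no cells here.
        for cell, ending in ENDINGS.get(codles, []):
            forms[cell] = les + ending
    return forms


forms = generate(RUMPO)
for cell, form in forms.items():
    print(cell, form)
```
        </preformat>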
        <p>Given this, our first step is to extract information on the LESs and CODLESs of each lexeme. Since Lemlat is a tool built to analyze rather than produce forms, it also contains several LESs that occur only in irregular and/or rare forms. To avoid the risk of overgeneration, we keep only one LES for each CODLES. The choice is based on lexicographical sources, namely Lewis and Short (1879) and Glare (1982). In these dictionaries, each verbal entry opens with a set of four “principal parts” (Bennett, 1908: 55), i.e. exemplary inflected forms from which the whole paradigm of the lexeme can be inferred. We keep only those LESs that correspond to such principal parts, excluding the ones that correspond to more marginal forms that do appear in dictionaries but are given less prominence in the entry. For instance, Lemlat includes two LESs with CODLES “v3r” for the verb dico ‘to say’: “dic” and “deic”. However, in both the lexicographical sources we use, the relevant principal parts are dico and dicere, corresponding to the first LES, while the second one is only mentioned later in the entries as an alternative form. Therefore, the LES selected for our resource is “dic”.</p>
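        <p>The selection criterion above relies on consulting dictionaries; the sketch below only mimics it with a simple heuristic, under the assumption that a LES corresponds to a principal part when it is a prefix of that part. The function name and the input format are hypothetical.</p>
        <preformat>
```python
def select_les(candidates, principal_parts):
    """For each CODLES, keep the first candidate LES that is a prefix of one
    of the principal parts; the value is None when no candidate matches.

    candidates: dict mapping a CODLES to the list of its LESs.
    principal_parts: principal parts of the verb, without vowel length.
    """
    selected = {}
    for codles, les_list in candidates.items():
        selected[codles] = next(
            (les for les in les_list
             if any(part.startswith(les) for part in principal_parts)),
            None,
        )
    return selected


# The two competing present-system stems of dico mentioned in the text.
choice = select_les({"v3r": ["deic", "dic"]}, ["dico", "dicere"])
print(choice)
```
        </preformat>
        <p>A prefix test of this kind would not replace manual checking, since it cannot tell which of several matching stems a dictionary treats as the more prominent one.</p>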
        <p>We also use the same dictionaries to manually annotate the vowel length of each LES. This is a necessary enhancement, because in Latin verb inflection there are homographic forms that can be distinguished only by vowel length, for instance PRS.ACT.IND.3SG fugit ‘(s)he flees’ vs. PRF.ACT.IND.3SG fūgit ‘(s)he fled’.</p>
        <p>Following this process, we fill all the 254 paradigm cells of each of the 3,348 lexemes. However, because of Lemlat’s design, the same procedure could not be applied to some quite frequent verbs with a highly irregular inflectional paradigm, at least for the cells of the present system, which is where most of the irregularity in the inflectional endings of Latin verbs is found. For the verbs shown in Table 2 and for those derived from them by prefixation (e.g. abeo ‘to go away’ from the verb eo ‘to go’), although it was technically possible to adopt a similar approach by using more than one LES for a CODLES, it proved to be faster and more practical to manually record the correct forms as such.</p>
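        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Lemma</th><th>Meaning</th></tr>
            </thead>
            <tbody>
              <tr><td>aio</td><td>to say</td></tr>
              <tr><td>eo</td><td>to go</td></tr>
              <tr><td>fero</td><td>to bring</td></tr>
              <tr><td>fio</td><td>to become</td></tr>
              <tr><td>inquam</td><td>to say</td></tr>
              <tr><td>malo</td><td>to prefer</td></tr>
              <tr><td>nolo</td><td>not to want</td></tr>
              <tr><td>possum</td><td>can</td></tr>
              <tr><td>sum</td><td>to be</td></tr>
              <tr><td>volo</td><td>to want</td></tr>
            </tbody>
          </table>
        </table-wrap>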
        <p>Each of the 850,392 generated paradigm cells is assigned a unique lexeme identifier, which corresponds to the lemma used in Lemlat. In those rare cases where two or more verbs have the same lemma in Lemlat (although they inflect differently), a numeric diacritic is added to make the relevant distinction: for instance, we have volo1 ‘to fly’ and volo2 ‘to want’.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Frequency Data</title>
        <p>Many forms included in the paradigm cells of our lexicon are never attested in Latin texts. To make it possible to distinguish between plausible but unattested forms and those that actually occur in texts, we enrich forms with information on their frequency. This information is taken from Tombeur’s (1998) Thesaurus Formarum Totius Latinitatis (henceforth TFTL), where each form is assigned the number of its occurrences in four different epochs, respectively called Antiquitas (from the origins to the end of the 2nd century A.D.), Aetas Patrum (2nd century–735 A.D.), Medium Aeuum (736-1499) and Recentior Latinitas (1500-1965).</p>
        <p>By including the frequency of each form in the lexicon, we know how many of the 752,537 forms recorded in the lexicon are never actually attested (the 97,855 paradigm cells marked as #DEF# are excluded from this count). Table 3 reports the relevant data.</p>
        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>TFTL epoch</th><th>unattested forms (%)</th></tr>
            </thead>
            <tbody>
              <tr><td>Antiquitas</td><td>544,395 (72.34%)</td></tr>
              <tr><td>Aetas Patrum</td><td>482,324 (64.1%)</td></tr>
              <tr><td>Medium Aeuum</td><td>484,421 (64.37%)</td></tr>
              <tr><td>Recentior Latinitas</td><td>640,552 (85.12%)</td></tr>
              <tr><td>all epochs</td><td>401,690 (53.38%)</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>In total, the TFTL includes 554,828 different forms, corresponding to 62,922,781 occurrences in the reference corpus used by the Thesaurus. Our lexicon contains 165,898 of these unique forms (forms appearing in more than one paradigm cell are counted only once), for a total of 18,261,179 occurrences. This means that our resource covers around 30% of the forms of the TFTL, in terms of both type and token frequency. In addition, it also contains several other forms that are not attested in the TFTL (245,623 unique forms).</p>
        <p>It can be observed that a significant number of the forms recorded in our lexicon are not attested, even in such a large corpus as the one the TFTL is based on. However, this is not surprising: recent large-scale corpus-based investigations (e.g. Bonami and Beniamine, 2016: 158 ff.) show that in languages with large inflectional paradigms, like those of Latin verbs, it is perfectly normal that many plausible forms do not appear, even in very large datasets, and the lexemes for which the full paradigm is attested are very few.</p>
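        <p>The figures reported in this section can be cross-checked with simple arithmetic. The short script below uses only numbers given in the text; the variable and function names are ours.</p>
        <preformat>
```python
LEXEMES = 3348
CELLS_PER_LEXEME = 254
DEFECTIVE_CELLS = 97855

total_cells = LEXEMES * CELLS_PER_LEXEME      # 850,392 paradigm cells
filled_cells = total_cells - DEFECTIVE_CELLS  # 752,537 cells containing a form

# Unattested forms per TFTL epoch, as reported in Table 3.
UNATTESTED = {
    "Antiquitas": 544395,
    "Aetas Patrum": 482324,
    "Medium Aeuum": 484421,
    "Recentior Latinitas": 640552,
    "all epochs": 401690,
}


def percentage(count, total):
    """Share of count in total, as a percentage."""
    return 100 * count / total


for epoch, count in UNATTESTED.items():
    print(f"{epoch}: {count} ({percentage(count, filled_cells):.2f}%)")
```
        </preformat>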
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Future Work</title>
      <p>We described the design and building of a
lexeme-based inflected lexicon consisting of
850,392 paradigm cells of 3,348 Latin verbs. Our
first objective in the near future is to make the
resource complete in terms of lexical coverage,
including the lexemes of the other PoS. The
lexicon is available for download as a .csv file at
https://github.com/matteopellegrini/LatInfLexi.</p>
      <p>
        We also plan to include phonetic annotation,
by giving the IPA transcription of each form,
which can be obtained semi-automatically by
applying a script provided by the Classical
Language Toolkit
        <xref ref-type="bibr" rid="ref5">(Johnson et al., 2014-17)</xref>
        to stems
and endings.
      </p>
      <p>Another welcome addition would be to
account for cases of overabundance, by allowing
more than one form to appear in the same
paradigm cell. However, to decide which cell-mates
to keep and which ones to discard, their
frequency in Latin texts should be preliminarily
evaluated. In this respect, it has to be noted that the
frequencies in the TFTL refer to bare surface forms,
with no contextual disambiguation. For instance,
the frequency of veniam comprises not only
occurrences of both the PRS.ACT.SBJV.1SG and
FUT.ACT.IND.1SG of the verb venio ‘to come’, but
also of the ACC.SG of the noun venia
‘indulgence’.</p>
      <p>To get an idea of the impact of morphological ambiguity on our lexicon, we analyzed all the generated forms with Lemlat (version 3.0). We found that Lemlat outputs only one analysis (i.e. one lemma and one set of morphological features) for only about 23% (170,735) of the 752,537 forms, the remaining 581,802 (about 77%) being ambiguous. This result weakens the reliability of the frequency data provided in the lexicon. Therefore, disambiguation is needed, although it would require very time-consuming work.</p>
      <p>
        However, to tackle the problem of ambiguity,
a first useful step is distinguishing between cases
like veniam above, which can be analyzed as an
inflected form of two different lemmas, and
cases where the different analyses only refer to
different forms of the same lemma, e.g. laudatis,
that appears both in the PRS.ACT.IND.2PL and in
the PRF.PTCP.DAT/ABL.PL of laudo ‘to praise’,
but cannot be a form of other lemmas. We call
these different types ‘exolemmatic’ and
‘endolemmatic’ ambiguity, respectively
        <xref ref-type="bibr" rid="ref9">(cf. Passarotti
and Ruffolo, 2004)</xref>
        . Cases of exolemmatic ambiguity are clearly more problematic, but they are also much rarer: only 79,490 (about 10%) of the forms in our resource belong to this type. The great majority of ambiguous forms only give rise to endolemmatic ambiguity, as can be observed in Table 4, where the relevant data are summarized.
        [Table 4: unambiguous forms; ambiguous forms, divided into only endolemmatic ambiguity and exolemmatic ambiguity.]
      </p>
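      <p>The distinction between the two types of ambiguity can be made explicit with a small classifier over the analyses returned for a form. The input format, a list of (lemma, features) pairs, and the function name are assumptions made for this sketch and do not reflect the actual output format of Lemlat. The last lines derive the number of forms with only endolemmatic ambiguity from the counts given in the text, on the assumption that every ambiguous form falls into exactly one of the two types.</p>
      <preformat>
```python
def ambiguity_type(analyses):
    """Classify a form from its analyses, given as (lemma, features) pairs."""
    unique = set(analyses)
    if not unique:
        raise ValueError("no analysis available for this form")
    if len(unique) == 1:
        return "unambiguous"
    lemmas = {lemma for lemma, _ in unique}
    # One lemma only: the analyses differ in features (endolemmatic).
    # Several lemmas: the form belongs to different lexemes (exolemmatic).
    return "endolemmatic" if len(lemmas) == 1 else "exolemmatic"


VENIAM = [("venio", "PRS.ACT.SBJV.1SG"), ("venio", "FUT.ACT.IND.1SG"),
          ("venia", "ACC.SG")]
LAUDATIS = [("laudo", "PRS.ACT.IND.2PL"), ("laudo", "PRF.PTCP.DAT.PL"),
            ("laudo", "PRF.PTCP.ABL.PL")]

print(ambiguity_type(VENIAM))
print(ambiguity_type(LAUDATIS))

# Counts reported in the text.
FORMS = 752537
UNAMBIGUOUS = 170735
AMBIGUOUS = 581802
EXOLEMMATIC = 79490

# Derived figure (not stated in the text): ambiguous forms that are not exolemmatic.
only_endolemmatic = AMBIGUOUS - EXOLEMMATIC
print(only_endolemmatic)
```
      </preformat>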
      <p>As far as endolemmatic ambiguity is concerned, although its quantitative impact is far greater, it could be considerably reduced in a principled manner. Indeed, in many cases this kind of ambiguity is due to systematic syncretism. For instance, the cells FUT.ACT.IMP.2SG and FUT.ACT.IMP.3SG are never unambiguously analyzed, because they are always identical for any given verb. Given the full systematicity of this syncretism, which holds for all lexemes, these cells could be considered as only one from a purely morphological point of view. Therefore, the problem of endolemmatic ambiguity could be at least reduced by adopting an approach based on “morphomic paradigms” (Boyé and Schalchli, 2016), where cells that are always syncretic are conflated, rather than on morphosyntactic paradigms. This would be especially helpful in nominal forms like participles and gerundives, where such cases of systematic syncretism are widespread.</p>
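      <p>One possible way to identify the cells that could be conflated is to group together the cells whose forms coincide in every lexeme. The sketch below does this on a toy sample of three verbs and four cells, ignoring vowel length; the data layout and the function name are assumptions made for the example.</p>
      <preformat>
```python
from collections import defaultdict


def always_syncretic_groups(paradigms):
    """Return the groups of cells whose forms are identical in every lexeme.

    paradigms: dict mapping each lexeme to a dict cell -> form; all lexemes
    are assumed to have the same set of cells.
    """
    lexemes = sorted(paradigms)
    cells = sorted(paradigms[lexemes[0]])
    by_signature = defaultdict(list)
    for cell in cells:
        # The signature of a cell is the tuple of its forms across all lexemes.
        signature = tuple(paradigms[lexeme][cell] for lexeme in lexemes)
        by_signature[signature].append(cell)
    return [group for group in by_signature.values() if len(group) > 1]


TOY = {
    "amo": {"FUT.ACT.IMP.2SG": "amato", "FUT.ACT.IMP.3SG": "amato",
            "PRS.ACT.IND.3SG": "amat", "PRF.ACT.IND.3SG": "amavit"},
    "fugio": {"FUT.ACT.IMP.2SG": "fugito", "FUT.ACT.IMP.3SG": "fugito",
              "PRS.ACT.IND.3SG": "fugit", "PRF.ACT.IND.3SG": "fugit"},
    "rumpo": {"FUT.ACT.IMP.2SG": "rumpito", "FUT.ACT.IMP.3SG": "rumpito",
              "PRS.ACT.IND.3SG": "rumpit", "PRF.ACT.IND.3SG": "rupit"},
}

groups = always_syncretic_groups(TOY)
print(groups)
```
      </preformat>
      <p>In the toy sample, the present and perfect third-person singular coincide for fugio only (once vowel length is ignored), so they are not grouped, whereas the two future imperative cells coincide for all three verbs.</p>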
      <p>Once such ambiguity issues have been resolved, it will also be possible to exploit the frequency data in a more systematic fashion, e.g. to perform diachronic investigations on how the frequency of specific (groups of) forms or paradigm cells changes across the four epochs considered, or to model Latin inflectional morphology in an even more realistic way, by also considering the token frequency of inflected forms, as has been recently proposed by Boyé (2016).</p>
      <p>Olivier Bonami and Sarah Beniamine. 2016. Joint
predictiveness in inflectional paradigms. Word
Structure 9(2): 156–182.</p>
      <p>Olivier Bonami and Gilles Boyé. 2014. De formes en
thèmes. In Florence Villoing, Sophie David and
Sarah Leroy, editors, Foisonnements
morphologiques: Études en hommage à Françoise
Kerleroux. Presses universitaires de Paris Ouest, Paris:
17–45.</p>
      <p>Olivier Bonami, Gauthier Caron and Clément Plancq.
2014. Construction d’un lexique flexionnel
phonétisé libre du français. In Franck Neveu, Peter
Blumenthal, Linda Hriba, Annette Gerstenberg, Judith
Meinschaefer and Sophie Prévost, editors, Actes du
quatrième congrès mondial de linguistique
française: 2583–2596.</p>
      <p>Gilles Boyé. 2016. Pour une modélisation surfaciste
de la flexion. Le cas de la conjugaison du français.
In SHS Web of Conferences. Vol. 27. EDP
Sciences.</p>
      <p>Gilles Boyé and Gauvain Schalchli. 2016. The status of
paradigms. In Andrew Hippisley and Gregory
Stump, editors, The Cambridge Handbook of
Morphology. Cambridge University Press, Cambridge:
206–234.</p>
      <p>Basilio Calderone, Matteo Pascoli, Nabil Hathout and
Franck Sajous. 2017. Hybrid method for stress
prediction applied to GLAFF-IT, a large-scale
Italian lexicon. In International Conference on
Language, Data and Knowledge. Springer, Cham: 26–41.</p>
      <p>Louis Delatte, Étienne Evrard, Suzanne Govaerts and
Joseph Denooz. 1981. Dictionnaire fréquentiel et
index inverse de la langue latine. L.A.S.L.A., Liège.</p>
      <p>Peter G.W. Glare. 1982. Oxford Latin Dictionary. Oxford University Press, Oxford.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Farrell</given-names>
            <surname>Ackerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>James P.</given-names>
            <surname>Blevins</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Robert</given-names>
            <surname>Malouf</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Parts and wholes: Implicative patterns in inflectional paradigms</article-title>
          . In James P. Blevins and Juliette Blevins, editors,
          <source>Analogy in Grammar: Form and Acquisition</source>
          . Oxford University Press, Oxford:
          <fpage>54</fpage>
          -
          <lpage>82</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Mark</given-names>
            <surname>Aronoff</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Morphology by itself: Stems and inflectional classes</article-title>
          . MIT Press, Cambridge/London.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Sacha</given-names>
            <surname>Beniamine</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Un algorithme universel pour l'abstraction automatique d'alternances morphophonologiques</article-title>
          .
          <source>In 24e Conférence sur le Traitement Automatique des Langues Naturelles (TALN).</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Nabil</given-names>
            <surname>Hathout</surname>
          </string-name>
          , Franck Sajous and
          <string-name>
            <given-names>Basilio</given-names>
            <surname>Calderone</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GLÀFF, a large versatile French lexicon</article-title>
          .
          <source>In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)</source>
          :
          <fpage>1007</fpage>
          -
          <lpage>1012</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Kyle P.</given-names>
            <surname>Johnson</surname>
          </string-name>
          et al.
          <year>2014-2017</year>
          .
          <article-title>CLTK: The Classical Language Toolkit</article-title>
          . DOI 10.5281/zenodo593336.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Charlton</given-names>
            <surname>Lewis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Charles</given-names>
            <surname>Short</surname>
          </string-name>
          .
          <year>1879</year>
          .
          <source>A Latin Dictionary</source>
          . Clarendon, Oxford.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Alexis Amid</given-names>
            <surname>Neme</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A fully inflected Arabic verb resource constructed from a lexicon of lemmas by using finite-state transducers</article-title>
          .
          <source>Revue RIST: revue de l'information scientifique et technique</source>
          <volume>20</volume>
          (
          <issue>2</issue>
          ):
          <fpage>7</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Passarotti</surname>
          </string-name>
          , Marco Budassi, Eleonora Litta and Paolo Ruffolo.
          <year>2017</year>
          .
          <article-title>The Lemlat 3.0 Package for Morphological Analysis of Latin</article-title>
          .
          <source>In Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language</source>
          :
          <fpage>24</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Passarotti</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Ruffolo</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>L'utilizzo del lemmatizzatore LEMLAT per una sistematizzazione dell'omografia in latino</article-title>
          .
          <source>EUPHROSYNE 32(A)</source>
          :
          <fpage>99</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Slav</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dipanjan</given-names>
            <surname>Das</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ryan</given-names>
            <surname>McDonald</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A universal part-of-speech tagset</article-title>
          .
          <source>arXiv</source>
          :1104.2086.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Anna M.</given-names>
            <surname>Thornton</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Reduction and maintenance of overabundance. A case study on Italian verb paradigms</article-title>
          .
          <source>Word Structure</source>
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>183</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Paul</given-names>
            <surname>Tombeur</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Thesaurus formarum totius latinitatis a Plauto usque ad saeculum XXum</article-title>
          . Brepols, Turnhout.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Eros</given-names>
            <surname>Zanchetta</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Morph-it!: a free corpus-based morphological resource for the italian language</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>