<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linking the Corpus CLaSSES to the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Irene De Felice</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucia Tamponi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Iurescia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Passarotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università Cattolica del Sacro Cuore</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università di Genova</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università di Pisa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe the process of linking the corpus CLaSSES (which collects non-literary Latin texts of different periods and places) to the LiLa Knowledge Base of linguistic resources for Latin made interoperable through their publication as Linked Data. The paper details the RDF modeling of the (meta)data provided by CLaSSES and presents three queries on data from different resources that interact in LiLa.</p>
      </abstract>
      <kwd-group>
        <kwd>Latin</kwd>
        <kwd>Textual resources</kwd>
        <kwd>Linguistic Linked Open Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Latin language shows great diversity, owing to its wide usage in terms of both diachrony (spanning two millennia) and diatopy (all over Europe and beyond). Such diversity is mirrored in the set of linguistic resources currently available for Latin, ranging from collections of literary texts of the Classical era,1 to corpora of documentary texts of the Medieval times,2 dictionaries,3 and glossaries.4</p>
      <p>As with many other languages, one limitation that affects the wealth of resources for Latin is their sparseness, which prevents the full exploitation of the data they provide. The LiLa Knowledge Base was built to overcome this limitation, making distributed resources for Latin interact through their publication as Linked Data, by using a set of commonly used vocabularies provided by ontologies for the representation of linguistic (meta)data.</p>
      <p>Among the resources interlinked in LiLa is the CLaSSES corpus, which enhances the set of lexical and textual data made interoperable by the Knowledge Base with a peculiar kind of non-literary Latin texts (such as inscriptions, writing tablets, and letters) written in different periods and provinces of the Roman Empire, thus extending the coverage of LiLa with a typology of texts not present so far in the Knowledge Base.</p>
      <p>This paper details the process of linking CLaSSES to LiLa, and is organized as follows. Section 2 presents the corpus CLaSSES. Section 3 describes the LiLa Knowledge Base. Section 4 discusses the modeling and the linking of CLaSSES into LiLa. Section 5 reports three examples of queries that exploit the interoperability of CLaSSES with other resources in LiLa. Finally, Section 6 provides some conclusions on the results of the linking, and outlines directions of future work.</p>
      <p>CLiC-it 2023: 9th Italian Conference on Computational Linguistics, Nov 30 – Dec 02, 2023, Venice, Italy. * Corresponding author. irene.defelice@edu.unige.it (I. De Felice); lucia.tamponi@fileli.unipi.it (L. Tamponi); federica.iurescia@unicatt.it (F. Iurescia); marco.passarotti@unicatt.it (M. Passarotti)</p>
    </sec>
    <sec>
      <title>2. CLaSSES</title>
      <p>CLaSSES (Corpus for Latin Sociolinguistic Studies on Epigraphic texts) is a digital resource created by the Laboratory of Phonetics and Phonology at Pisa University. Freely accessible on the internet,5 it consists of over 3,400 non-literary Latin texts such as inscriptions, private letters, ink tablets, ostraka and papyri from various periods (6th century BCE to 6th century CE) and regions of the Roman Empire. The goal of CLaSSES is to use non-literary texts that exhibit (ortho-)graphic variants as a source to study the sociolinguistic variation of Latin [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. The identification of these spelling variants is the most crucial aspect of the corpus: words like dedet (CIL-I2-9-26) and Vivia (ILLRP-S-99-8) are categorized as “non-classical” forms in comparison to the standard spelling of Classical Latin, which would be dedit and Vibia respectively.
      </p>
      <p>CLaSSES is divided into four sections based on the place of provenance of the texts: Rome and Italy; Roman Britain; Sardinia; Egypt and Eastern Mediterranean. The database includes 3,415 texts, which were first automatically tokenized, resulting in 46,888 tokens. Then, expert annotators lemmatized the entire corpus manually, given the high number of incomplete and misspelt words that cannot be easily processed by automatic tools. They also provided a meta-linguistic and extra-linguistic annotation, including additional information about each document (place of provenance, dating, text type, author/addressee) and about each token of the corpus (graphic form, language). Finally, the linguistic annotation identifies non-classical variants and classifies them according to the variation phenomena [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <p>1 Such as the LASLA corpus: https://www.lasla.uliege.be/cms/c_8570411/fr/lasla-textes-latins.</p>
      <p>2 Such as the corpus of Computational Historical Semantics: https://www.comphistsem.org/home.html.</p>
      <p>
        3 Such as the bilingual Latin-English dictionary curated by Ch. T. Lewis and Ch. Short [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        4 Such as the Medieval Latin Glossarium Mediae et Infimae Latinitatis by du Cange [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>5 http://classes-latin-linguistics.fileli.unipi.it.</p>
      <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>This paper was conceived and discussed jointly by the authors. Solely for academic purposes, scientific responsibility is to be divided up as follows: I. De Felice wrote Sections 2 and 4.2; L. Tamponi wrote Sections 5, 5.1, 5.2, 5.3; F. Iurescia wrote Section 4.1; M. Passarotti wrote Sections 1 and 3. Section 6 is to be attributed to all authors.</p>
      <p>[Figure 1: the lexically based architecture of the LiLa Knowledge Base. Node labels: Lemmas; Lexical Entries (Lexical Resources: Latin Wordnet, Valency Lexicon, dictionaries); Tokens (Textual Resources: digital libraries, treebanks, textual corpora); NLP Output (NLP Tools: tokenizers, taggers/parsers, lemmatizers).]</p>
    </sec>
    <sec id="sec-2">
      <title>3. The LiLa Knowledge Base</title>
      <p>The aim of the “LiLa - Linking Latin” ERC project (2018-2023)6 was to achieve interoperability among the wealth of existing lexical and textual resources that have been developed for Latin in recent decades. One of the main problems that LiLa solved is that such resources and tools are often characterized by different conceptual and structural models, which makes it difficult for them to interact with one another.</p>
      <p>
        To this end, LiLa undertook the creation of an open-ended Knowledge Base, following the principles of the Linked Data paradigm.7 All content involved or referenced in the linguistic resources connected in LiLa is made unambiguously findable and accessible by assigning an HTTP Uniform Resource Identifier (URI) to each data point. Data reusability and interoperability between resources are achieved by establishing links between different URIs and by using web standards such as: [a] the RDF data model, which is based on triples: (i) a predicate-property connects (ii) a subject (a resource) with (iii) its object (another resource, or a literal) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]; and [b] SPARQL, a query language specifically devised for RDF data.
      </p>
      <p>
        Furthermore, the LiLa Knowledge Base makes reference to classes and properties of already existing ontologies to model the relevant information. The main ones are POWLA for corpus data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], OLiA for linguistic annotation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and Ontolex-Lemon for lexical data [
        <xref ref-type="bibr" rid="ref10">10, 11</xref>
        ].
      </p>
      <p>Within this framework, LiLa uses the lemma as the most productive interface between lexical resources, annotated corpora and NLP tools. Consequently, the architecture of the LiLa Knowledge Base is highly lexically based (Figure 1), resting on a simple but effective assumption that strikes a good balance between feasibility and granularity: textual resources are made of (occurrences of) words (“tokens”), lexical resources describe properties of words (in “lexical entries”), and NLP tools process words (producing “NLP outputs”).8</p>
      <p>The core of the Knowledge Base is the so-called Lemma Bank,9 a collection of about 200,000 Latin lemmas taken from the database of the morphological analyzer LEMLAT [12]. Interoperability is achieved by linking all those entries in lexical resources and tokens in corpora that point to the same lemma.</p>
      <p>6 https://lila-erc.eu/.</p>
      <p>7 https://www.w3.org/DesignIssues/LinkedData.html.</p>
      <p>8 In Figure 1 the arrows going from and to the node for “NLP Output” represent the fact that tokens that are the output of a specific NLP tool (a tokenizer) become the input of further tools (like, for instance, a syntactic parser).</p>
      <p>9 http://lila-erc.eu/lodview/data/id/lemma/LemmaBank.</p>
      <sec>
        <title>4. CLaSSES into LiLa</title>
      </sec>
      <sec id="sec-2-1">
        <title>4.1. Modeling (Meta)data</title>
        <p>The Lemma Bank of the LiLa Knowledge Base is modeled as a collection of Lexical Forms of Ontolex-Lemon. Lexical Forms are the inflected forms of Lexical Entries and are assigned one or more graphical variants (ontolex:writtenRep).10 One of the Lexical Forms of a Lexical Entry is linked to the latter by the property ontolex:canonicalForm, to model that it is the form that is canonically chosen to represent the entire lexical entry, i.e., the lemma. As a consequence, the Lemma Bank is not a lexical resource (as it does not contain Lexical Entries); rather, it is a collection of Ontolex-Lemon Lexical Forms that can be used as Canonical Forms in the resources for Latin to be interlinked in the LiLa Knowledge Base.</p>
        <p>10 http://www.w3.org/ns/lemon/ontolex#writtenRep.</p>
        <p>In particular, textual resources are connected to the Lemma Bank through the property lila:hasLemma,11 which links a token in a corpus with its lemma in the Lemma Bank. In LiLa, textual resources are modeled as objects of the type Corpus from the POWLA ontology.12 Each Corpus includes one or more powla:Document,13 which are the parts into which the corpus is divided, such as the different texts that it contains, or its sections. In the case of the Corpus entitled CLaSSES, there are 10 documents, corresponding to as many sections of the resource.14 Every document of CLaSSES is assigned two layers, namely (1) a Document Layer, which collects all the tokens of a section, and (2) a Citation Layer, which records the full citation path of each token of a section.</p>
        <p>For instance, Figure 2 shows the modeling of one token from CLaSSES. The token (sacra) is linked to its lemma in the Lemma Bank (sacer) by the lila:hasLemma property, and to the Document Layer by the powla:hasLayer property.15 The properties lila:isLayer,16 lila:hasCitSubUnit17 and powla:hasChild18 link the Citation Layer to the token. In the example of Figure 2, the token sacra occurs in the inscription number 27 of volume S of the Document entitled Inscriptiones latinae liberae rei publicae, to which both its Document and Citation Layers are linked through the property powla:hasDocument.19</p>
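        <p>The modeling described above can be illustrated with plain (subject, predicate, object) triples. The following Python sketch is illustrative only and is not the LiLa implementation: the property names lila:hasLemma, powla:hasLayer and powla:hasDocument come from the text, whereas every identifier with the ex: prefix (token, layers, document, lemma) and the helper names match and lemma_label are hypothetical and do not reproduce actual LiLa URIs.</p>
        <preformat><![CDATA[
```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers (prefix "ex:") standing in for real LiLa URIs.
TRIPLES = [
    Triple("ex:token/sacra", "rdfs:label", "sacra"),
    Triple("ex:token/sacra", "lila:hasLemma", "ex:lemma/sacer"),
    Triple("ex:token/sacra", "powla:hasLayer", "ex:layer/document"),
    Triple("ex:layer/document", "powla:hasDocument", "ex:document/ILLRP"),
    Triple("ex:layer/citation", "powla:hasDocument", "ex:document/ILLRP"),
    Triple("ex:lemma/sacer", "rdfs:label", "sacer"),
]


def match(triples, subject=None, predicate=None, obj=None) -> List[Triple]:
    """Return the triples that agree with every argument that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def lemma_label(triples, token: str) -> Optional[str]:
    """Follow token -> lila:hasLemma -> rdfs:label; None if the token is unlinked."""
    for link in match(triples, subject=token, predicate="lila:hasLemma"):
        for label in match(triples, subject=link.obj, predicate="rdfs:label"):
            return label.obj
    return None


if __name__ == "__main__":
    print(lemma_label(TRIPLES, "ex:token/sacra"))  # sacer
```
]]></preformat>
        <p>Leaving an argument of match unset plays the role of a variable in a triple pattern, which is the basic building block of the SPARQL queries discussed later in the paper.</p>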
        <sec id="sec-2-1-1">
          <title>4.2. Linking Process</title>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <p>Out of the 46,888 tokens of CLaSSES, only those that are assigned a lemma are linked to the Lemma Bank of LiLa. Around 14k tokens of CLaSSES are not lemmatized due to the fragmentary nature of the texts contained therein. By exploiting the original lemmatization of the corpus, the automatic linking of the tokens of CLaSSES resulted in the following three output categories.</p>
        <p>1. Perfect match (or one-to-one lemma; 25,279 items), i.e. whenever the lemma-PoS couple in CLaSSES was linked to one single lemma-PoS couple in the LiLa Lemma Bank. For such cases, we conducted an evaluation of the mapping on 10% of the couples. The data were randomly selected; to ensure that the sample was representative, the original PoS distribution was maintained. In CLaSSES, 3,490 different couples are recorded, thus the evaluation was conducted on 349 couples. Only 7 errors were found, all due to a wrong PoS tagging in the source data that caused a mapping error. Thus, the error rate is very low, i.e., 2%.</p>
        <p>2. No match (or one-to-zero lemma; 5,366 items), i.e. when the lemma in CLaSSES was not associated with any lemma in LiLa. In this case, with the addition of the new lemma in LiLa we have enriched the Lemma Bank. Proper names are the category most affected, since inscriptions typically feature a wide range of anthroponyms which can identify the commissioner of the text (e.g., in public texts), the honorand (e.g., in sacred inscriptions) or the name of the dead on epitaphs [13]. In addition, given the wide geographical extension of our corpus, CLaSSES features local proper names typical of specific areas (e.g., Sardinia, or Roman Britain) that rarely occur in Classical texts; an example from Sardinia [14, 15] is Scribonissa in ANRW-B61-6 [15, p. 45]. A few lemmas pertaining to other parts of speech were also added to the Lemma Bank, consisting mainly of hapax legomena, like ansata in BTT-196-47 (lemma ansatus ‘provided with handles’),20 infrascribo in CEL-I-232-8 (lemma infrascribo ‘to write lower down’),21 internumero in BTT-645-48 (lemma internumero ‘to reckon among other things’).22</p>
        <p>3. Ambiguous match (one-to-many lemmas; 1,503 items), i.e. when the lemma in CLaSSES was associated with several possible lemmas in LiLa. In most cases, the correct lemma between two or more possible ones was identified manually by a disambiguation based on the linguistic context of the document; this happens, for instance, in the case of homographs, as for the word dico, linked both to dĭco, ‘to proclaim’ or ‘to dedicate’23 and to dīco, ‘to name’, ‘to utter’.24 On rare occasions (29 tokens), it was however not possible to disambiguate between the lemmas available in LiLa: as a consequence, we linked the ambiguous tokens to all their corresponding lemmas. This was due to the fragmentary nature of some texts, where an analyzable context for disambiguation was not available. This is the case, for example, of BTT-609-16 mallus (context: [...] mallus alu[...] [...]us), for which two senses are equally possible, that of ‘pole’25 and ‘apple tree’.26</p>
        <p>11 https://lila-erc.eu/ontologies/lila/hasLemma.</p>
        <p>12 http://purl.org/powla/powla.owl#Corpus.</p>
        <p>13 http://purl.org/powla/powla.owl#Document.</p>
        <p>14 http://lila-erc.eu/data/corpora/CLaSSES/id/corpus.</p>
        <p>15 http://purl.org/powla/powla.owl#hasLayer.</p>
        <p>16 https://lila-erc.eu/lodview/ontologies/lila_corpora/isLayer.</p>
        <p>17 https://lila-erc.eu/lodview/ontologies/lila_corpora/hasCitSubUnit.</p>
        <p>18 http://purl.org/powla/powla.owl#hasChild.</p>
        <p>19 http://purl.org/powla/powla.owl#hasDocument.</p>
        <p>20 http://lila-erc.eu/data/id/lemma/89148.</p>
        <p>21 http://lila-erc.eu/data/id/lemma/142756.</p>
        <p>22 http://lila-erc.eu/data/id/lemma/142757.</p>
        <p>23 http://lila-erc.eu/data/id/lemma/99301.</p>
        <p>24 http://lila-erc.eu/data/id/lemma/99302.</p>
        <p>25 http://lila-erc.eu/data/id/lemma/111421.</p>
        <p>26 http://lila-erc.eu/data/id/lemma/111423.</p>
      </sec>
      <sec id="sec-2-3">
        <title>5. Querying CLaSSES in LiLa</title>
        <p>Thanks to the interoperability of CLaSSES with the other resources for Latin linked to the LiLa Knowledge Base, research questions related to non-literary texts can be empirically investigated on the several different textual resources interlinked in the Knowledge Base by running queries on the SPARQL endpoint of LiLa.27 By focusing on the question of spelling variants attested in the inscriptions, in what follows we briefly consider two case studies, i.e., consonant doubling (see 5.1), and the writing of long /i:/ through the diphthong &lt;EI&gt; (see 5.2). Moreover, we report and discuss a query that exploits the information on derivational morphology recorded in the Lemma Bank (see 5.3).</p>
        <p>27 https://lila-erc.eu/sparql/.</p>
      </sec>
      <sec id="sec-2-4">
        <title>5.1. Consonant doubling</title>
        <p>As is known, the spelling of Latin long consonants through geminatio consonantium was introduced at the end of the third century BCE [16, 17, 18]. Consonant doubling, however, generalized slowly, so that it was still omitted, if seldom, in inscriptions of the second century BCE. For example, in the 2nd-century inscriptions included in CLaSSES, 28 tokens (20 lemmas) display single for double consonants over 72 spellings with geminatio consonantium. These tokens can be easily retrieved through the function "Search for linguistic phenomena" available in the CLaSSES online search interface, by selecting the label "single pro double consonant". Thanks to the interoperability between distributed resources provided by LiLa, it is possible to search the occurrences of the lemmas for these tokens in the corpora interlinked through the Lemma Bank. This is particularly useful for both quantitative and qualitative linguistic analysis.</p>
        <p>For example, among the forms found in CLaSSES, it is possible to find occurrences of the same lemma either with a single consonant or with a double consonant, such as the name Mummius in the tituli mummiani, which appears either as Mumius, in CIL-I2-628, or as Mummius, in CIL-I2-627, 629 [13]. The presence of the alternation between &lt;C&gt; and &lt;CC&gt; in these inscriptions can be interpreted as a sign of an incomplete generalization of consonant doubling at this stage. However, it is fundamental to exclude the possibility that the form Mumius occurring in our corpus represents a commonly attested variant of the proper name Mummius. This information is not readily retrievable in the available sources, since such spelling variants of proper names are generally not recorded in the dictionaries. However, by collecting the occurrences of the lemma Mummius in the textual resources interlinked through LiLa, it is possible to ascertain that the variant without consonant doubling is never attested in any of the texts provided by such resources (e.g., Cicero’s De Lege Agraria, In Verrem and Tacitus’ Annales, included in the LASLA corpus).28 Thus, we may assume that the form Mumius found in CIL-I2-628 is a hint of the incomplete generalization of geminatio consonantium, in line with the chronology proposed in the literature.</p>
        <p>28 https://lila-erc.eu/data/corpora/Lasla/id/corpus.</p>
      </sec>
      <sec id="sec-2-5">
        <title>5.2. &lt;EI&gt; for /i:/</title>
        <p>The linking of the tokens of CLaSSES to the Lemma Bank of LiLa can also shed light on the writing of /i:/ through &lt;EI&gt; in Latin sources. It is known from the literature [19, 20, 21, 22] that, in the ‘urban’ Latin of the city of Rome, the monophthongization of the diphthong [ej] took place in two steps: (i) [ej] &gt; [e:],29 between the 3rd and mid-2nd century BCE; (ii) [e:] &gt; [i:], between the 2nd and 1st century BCE. The data from CLaSSES, obtained through the function "Search for linguistic phenomena" (label "Diphthong - Classical &lt;I&gt; /ī/ = &lt;EI&gt;"), confirm the traditional picture, indicating that the spelling &lt;EI&gt; for /i:/ is either a conservative spelling retained in earlier documents, or an archaizing feature that characterizes the solemn register of later public and official inscriptions. More specifically, in CLaSSES the spelling &lt;EI&gt; for /i:/ is found in 225 occurrences (99 lemmas), mainly in older public inscriptions, before the 1st century BCE (212 occurrences out of 225). A more comprehensive view of this phenomenon can be obtained thanks to the interoperability between different Latin corpora made possible in LiLa. By running a query on the corpora interlinked in the Knowledge Base, it is possible to collect all the tokens linked to the 99 lemmas concerned and select those where the spelling &lt;EI&gt; for /i:/ takes place.</p>
        <p>29 Possibly a long lax [i:] [20, 23].</p>
        <p>For instance, of particular interest is the form sei for sī ‘if’ that is found in Archaic Latin. By using LiLa, it is possible to find that out of the 22,161 occurrences of si in the corpora interlinked therein, 10 show the form sei. One relevant example is from Plautus’ Epidicus (Ep. 567, twice). These 2 occurrences of sei, which in LiLa are recorded as 2 tokens from the LASLA corpus, testify to the above-mentioned first step of the monophthongization process ([ej] &gt; [e:]), which takes place in the age of Plautus and which is attested elsewhere in his works.</p>
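        <p>A query of the kind described in this section can be assembled programmatically. The Python sketch below is a hedged illustration and not the query used by the authors: it assumes that the spelling of a token is exposed through rdfs:label (Listing 1 binds a label for each token, but what that label contains is an assumption here), the lemma URI http://example.org/lemma/si is a placeholder, and the function names tokens_with_form_query and percent are hypothetical. The percent helper only reproduces a proportion already stated in the text.</p>
        <preformat><![CDATA[
```python
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX lila: <http://lila-erc.eu/ontologies/lila/>\n"
)


def tokens_with_form_query(lemma_uri: str, form: str) -> str:
    """Build a SPARQL query for the tokens of one lemma spelled as `form`."""
    # Reject input that would break out of the URI or the string literal.
    if any(ch in lemma_uri for ch in '<>" \n\t'):
        raise ValueError("lemma_uri must be a bare URI")
    if not form.isalpha():
        raise ValueError("form must contain letters only")
    return (
        PREFIXES
        + "SELECT ?token ?tokenLabel\n"
        + "WHERE {\n"
        + f"  ?token lila:hasLemma <{lemma_uri}> ;\n"
        + "         rdfs:label ?tokenLabel .\n"
        # Assumption: the token label holds the attested spelling.
        + f'  FILTER (lcase(str(?tokenLabel)) = "{form.lower()}")\n'
        + "}\n"
    )


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100 * part / total, 1)


if __name__ == "__main__":
    # Placeholder lemma URI; the real identifier of si is not given in the text.
    print(tokens_with_form_query("http://example.org/lemma/si", "sei"))
    print(percent(212, 225))  # 94.2
```
]]></preformat>
        <p>Comparing the number of rows returned with and without the FILTER line would give the kind of proportion reported above for sei among the occurrences of si.</p>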
        <sec id="sec-2-5-1">
          <title>5.3. Derivational Morphology</title>
          <p>So far, we have discussed some very simple queries on specific lexical items that can be performed to compare information provided by CLaSSES to that provided by other resources. However, LiLa makes it possible not only to explore and compare single corpora at the lexical level (via the Lemma Bank), but also to conduct in-depth linguistic analysis, concerning, for instance, morphology. For example, it is possible to compare the type and number of affixes found in CLaSSES, investigating how many (and which type of) derivational morphemes are represented in non-literary texts. The list of affixes that build up the lexicon of CLaSSES can be accessed with a SPARQL query that retrieves all the lemmas in the CLaSSES corpus that feature an affix (either prefix, or suffix) in their morphological form, and reports the number of their occurrences therein (see Listing 1).30</p>
          <p>Morphological information was not annotated in CLaSSES. Thus, the link to LiLa makes it possible to conduct more in-depth linguistic analysis; most importantly, it also allows users to compare different corpora with relation to specific linguistic features. For instance, it is possible to investigate to what extent the derivational morphology found in non-literary texts deviates from that of Classical texts by performing the very same query on the LASLA corpus, by simply replacing the URI for CLaSSES in the SPARQL query (as subject of the powla:hasSubDocument property) with that for the LASLA corpus: http://lila-erc.eu/data/corpora/Lasla/id/corpus.</p>
          <p>The affixes that most frequently occur in the CLaSSES corpus are three suffixes and a prefix:</p>
          <p>• -in, 486 occurrences (7.3% of affixes extracted from the corpus);</p>
          <p>• -(t)or, 456 occurrences (6.9%);</p>
          <p>• -t, 442 occurrences (6.7%);</p>
          <p>• con-, 440 occurrences (6.6%).</p>
          <p>These affixes have a very different distribution in LASLA, in which only con- is among the most frequent affixes, with 32,763 occurrences (7.9%), whereas -in accounts for just 1.2% of all affixes extracted from the corpus (5,137 occ.), -(t)or for 2.3% (9,593 occ.), and -t for 1% (4,024 occ.).</p>
          <p>Such differences are largely due to a number of lexemes that are highly frequent in epigraphic texts, in particular dominus ‘master’ (198 occ.) for the suffix -in and imperator ‘general, emperor’ (153 occ.) for the suffix -(t)or, which are most frequent in public inscriptions, or libertus/liberta ‘freedman’ (281 occ.) for the suffix -t, which is most frequent in funerary inscriptions, where the epitaph often refers to the civil status of freed slaves. Therefore, even if there is a major difference in size between the two corpora, a query such as the one illustrated here can bring to light specificities of the corpus CLaSSES that go beyond the lexical level and that could not be observed without comparison with other resources.</p>
          <p>30 The query outputs a table with four columns: the label of the lemma (?lemmaLabel), the type of affix, either prefix or suffix (?affixType), the label of the affix (?affixLabel) and the total number of tokens for the lemma in CLaSSES ((count(?tokenClasses) as ?count)).</p>
        </sec>
      </sec>
      <sec id="sec-2-6">
        <title>6. Conclusion and Future Work</title>
        <p>The linking of CLaSSES into LiLa represents an added value for both resources. As for CLaSSES, its (meta)data are now interoperable with the other resources interlinked in the Knowledge Base. As for LiLa, the non-literary texts of CLaSSES significantly increased its textual coverage, both in terms of size and in terms of register variation.</p>
        <p>In the near future, we plan to model and interlink in LiLa other types of metadata provided by CLaSSES, such as information about the provenance and the dating of the texts. We plan to start from metadata on the time span of the texts, which we will model as Linked Data using data categories and properties from the CIDOC Conceptual Reference Model.31</p>
        <p>31 https://www.cidoc-crm.org/.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Acknowledgments</title>
        <p>The “LiLa - Linking Latin” project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme – Grant Agreement No. 769994. The work of Lucia Tamponi is partly funded by the PRIN Project “Ancient languages and writing systems in contact: a touchstone for language change”, prot. 2017JBFP9H.</p>
        <p>Listing 1: A SPARQL query on the LiLa Knowledge Base</p>
        <preformat>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX lila: &lt;http://lila-erc.eu/ontologies/lila/&gt;
PREFIX dc: &lt;http://purl.org/dc/elements/1.1/&gt;
PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
PREFIX powla: &lt;http://purl.org/powla/powla.owl#&gt;

SELECT ?lemmaLabel ?affixType ?affixLabel (count(?tokenClasses) as ?count)
WHERE {
  # Lemmas of the Lemma Bank that carry a prefix or a suffix
  ?lemmaLiLa a lila:Lemma ;
             (lila:hasPrefix|lila:hasSuffix) ?affix ;
             rdfs:label ?lemmaLabel .
  ?affix rdfs:label ?affixLabel ;
         rdf:type ?affixType .
  # Tokens linked to those lemmas, with the layer they belong to
  ?tokenClasses lila:hasLemma ?lemmaLiLa ;
                powla:hasLayer ?docLayer ;
                rdfs:label ?tokenClassesLabel .
  ?docLayer powla:hasDocument ?subDocLayer .
  ?subDocLayer dc:title ?titlesubDocLayer .
  # Keep only documents that belong to the CLaSSES corpus
  &lt;http://lila-erc.eu/data/corpora/CLaSSES/id/corpus&gt; powla:hasSubDocument ?subDocLayer .
}
GROUP BY ?lemmaLabel ?affixType ?affixLabel
ORDER BY DESC(?count)</preformat>
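        <p>Section 5.3 observes that the query of Listing 1 can be rerun on LASLA by replacing the corpus URI. A minimal Python sketch of that substitution follows. Both URIs are the ones given in the text; CORPUS_PATTERN reproduces only the corpus-selecting triple pattern of the listing for brevity, and the function name retarget is hypothetical.</p>
        <preformat><![CDATA[
```python
CLASSES_CORPUS = "http://lila-erc.eu/data/corpora/CLaSSES/id/corpus"
LASLA_CORPUS = "http://lila-erc.eu/data/corpora/Lasla/id/corpus"

# Only the corpus-selecting triple pattern of Listing 1 is reproduced here.
CORPUS_PATTERN = f"<{CLASSES_CORPUS}> powla:hasSubDocument ?subDocLayer ."


def retarget(query: str, old_corpus: str, new_corpus: str) -> str:
    """Return `query` with the corpus URI swapped, leaving the rest untouched."""
    old_ref, new_ref = f"<{old_corpus}>", f"<{new_corpus}>"
    # Refuse to guess when the source corpus is missing or referenced twice.
    if query.count(old_ref) != 1:
        raise ValueError("expected exactly one reference to the source corpus")
    return query.replace(old_ref, new_ref)


if __name__ == "__main__":
    print(retarget(CORPUS_PATTERN, CLASSES_CORPUS, LASLA_CORPUS))
```
]]></preformat>
        <p>The same function can be applied to the full text of the listing, since the CLaSSES URI occurs there exactly once.</p>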
        <p>Bretschneider, Roma, 2019, pp. 13–53.</p>
        <p>[18] L. Tamponi, La geminatio consonantium: studio su un corpus di epigrafi latine anteriori al I secolo d.C., Studi e Saggi Linguistici 60 (2022) 29–50.</p>
        <p>[19] M. Niedermann, Phonétique historique du latin, Librairie C. Klincksieck, Paris, 1953.</p>
        <p>[20] M. Leumann, Lateinische Laut- und Formenlehre, Beck, München, 1977.</p>
        <p>[21] W. S. Allen, Vox latina: A Guide to the Pronunciation of Classical Latin, Cambridge University Press, Cambridge, 1978.</p>
        <p>[22] M. Mancini, Dilatandis litteris: uno studio su Cicerone e la pronunzia ‘rustica’, in: R. Bombi, G. Cifoletti, F. Fusco, L. Innocente, V. Orioles (Eds.), Studi linguistici in onore di Roberto Gusmani, Edizioni dell’Orso, Alessandria, 2006, pp. 1023–1046.</p>
        <p>[23] M. Benedetti, G. Marotta, Monottongazione e geminazione in latino: nuovi elementi a favore dell’isocronismo sillabico, in: P. Molinelli, P. Cuzzolin, C. Fedriani (Eds.), Latin Vulgaire–Latin Tardif, Actes du Xe Colloque International sur le Latin Vulgaire et Tardif, Sestante, Bergamo, 2014, pp. 25–43.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Short</surname>
          </string-name>
          ,
          <source>A Latin Dictionary. Founded on Andrews' edition of Freund's Latin dictionary</source>
          , Clarendon Press, Oxford,
          <year>1879</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] C. d. F. s. du Cange, bénédictins de la congrégation de Saint-Maur, d. P. Carpentier, J. C. Adelung, G. A. L. Henschel, L. Diefenbach, L. Favre,
          <source>Glossarium mediae et infimae latinitatis</source>
          , Favre, Niort, France,
          <year>1883-1887</year>
          .
        </mixed-citation>
        <mixed-citation>
          Ponsoda, T. Declerck,
          <article-title>Ontology Lexicalization: The lemon Perspective</article-title>
          , in:
          <source>Proceedings of the Workshops-9th International Conference on Terminology and Artificial Intelligence (TIA 2011)</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marotta</surname>
          </string-name>
          ,
          <article-title>Talking stones. Phonology in Latin inscriptions?</article-title>
          ,
          <source>Studi e Saggi Linguistici</source>
          <volume>53</volume>
          (
          <year>2015</year>
          )
          <fpage>39</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
        <mixed-citation>
          [11] J. McCrae, J. Bosque-Gil, J. Gracia, P. Buitelaar, P. Cimiano,
          <article-title>The OntoLex-Lemon Model: Development and Applications</article-title>
          , in:
          <source>Proceedings of eLex</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>587</fpage>
          -
          <lpage>597</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marotta</surname>
          </string-name>
          ,
          <article-title>Sociolinguistica storica ed epigrafia latina: il corpus CLaSSES I</article-title>
          ,
          <source>Linguarum Varietas</source>
          <volume>5</volume>
          (
          <year>2016</year>
          )
          <fpage>145</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
        <mixed-citation>
          [12] M. Passarotti, M. Budassi, E. Litta, P. Ruffolo, The Lemlat 3.0 Package for Morphological Analysis of
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>I. De Felice</surname>
          </string-name>
          , G. Marotta, M. Donati,
          <article-title>CLaSSES: A new Latin, in: Proceedings of the NoDaLiDa 2017 Workdigital resource for Latin epigraphy</article-title>
          ,
          <source>Italian Journal shop on Processing Historical Language</source>
          ,
          <year>2017</year>
          , pp.
          <source>of Computational Linguistics</source>
          <volume>1</volume>
          (
          <year>2015</year>
          )
          <fpage>119</fpage>
          -
          <lpage>130</lpage>
          .
          <fpage>24</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rovai</surname>
          </string-name>
          , I. De Felice, L. Tamponi, [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tamponi</surname>
          </string-name>
          ,
          <article-title>Consonant gemination in Latin epigCLaSSES: Orthographic variation in non-literary raphy between variation and standard</article-title>
          , in: Latin Latin,
          <source>Studi e Saggi Linguistici</source>
          <volume>58</volume>
          (
          <year>2020</year>
          )
          <fpage>39</fpage>
          -
          <lpage>65</lpage>
          .
          <string-name>
            <surname>Vulgaire-Latin Tardif</surname>
            <given-names>XIV</given-names>
          </string-name>
          ,
          <article-title>Actes du XIVème col-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Swick</surname>
          </string-name>
          ,
          <article-title>Resource Description Frame- loque international sur le latin vulgaire et tardif, work (RDF) Model</article-title>
          and
          <string-name>
            <given-names>Syntax</given-names>
            <surname>Specification</surname>
          </string-name>
          ,
          <year>1998</year>
          . Brepols, Turnhout, Forthcoming.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chiarcos</surname>
          </string-name>
          , POWLA: Modeling Linguistic Cor- [14]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Rowland</surname>
          </string-name>
          ,
          <article-title>Onomastic remarks on Roman Sarpora in OWL/DL</article-title>
          , in: E. Simperl, P. Cimi- dinia,
          <source>Names</source>
          <volume>21</volume>
          (
          <year>1973</year>
          )
          <fpage>82</fpage>
          -
          <lpage>102</lpage>
          . ano, A.
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Corcho</surname>
            , V. Presutti (Eds.), [15]
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Lupinu</surname>
          </string-name>
          ,
          <article-title>Latino epigrafico della Sardegna: aspetti The Semantic Web: Research and Applications, fonetici</article-title>
          , Ilisso, Nuoro,
          <source>2000. Lecture Notes in Computer Science</source>
          , Springer, [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mancini</surname>
          </string-name>
          , Lucilius and Nigidius Figulus on orBerlin, Heidelberg,
          <year>2012</year>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>239</lpage>
          . doi:10. thographic iconicity,
          <source>Journal of Latin Linguistics</source>
          <volume>1007</volume>
          /
          <fpage>978</fpage>
          -3-
          <fpage>642</fpage>
          -30284-8_
          <fpage>22</fpage>
          . 18 (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chiarcos</surname>
          </string-name>
          , M. Sukhareva, OLiA - Ontologies [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mancini</surname>
          </string-name>
          ,
          <article-title>Repertori grafici e regole d'uso: il caso of Linguistic Annotation, Semantic Web 6 (2015) del latino &lt;XS&gt;</article-title>
          , in: L.
          <string-name>
            <surname>Agostiniani</surname>
            ,
            <given-names>M. P.</given-names>
          </string-name>
          <string-name>
            <surname>Marchese</surname>
          </string-name>
          379-
          <fpage>386</fpage>
          . (Eds.), Lingua, testi, storia. Atti della giornata di
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.</surname>
          </string-name>
          <article-title>Montiel- studi in ricordo di Aldo Luigi Prosdocimi</article-title>
          , Giorgio
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>