<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linking CompL-it to the LiITA Knowledge Base</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eleonora Litta</string-name>
          <email>eleonoramaria.litta@unicatt.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Passarotti</string-name>
          <email>marco.passarotti@unicatt.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Moretti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Brasolin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Mambrini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <email>valerio.basile@unito.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Di Fabio</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eliana Di Palma</string-name>
          <email>eliana.dipalma@unito.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emiliano Giovannetti</string-name>
          <email>emiliano.giovannetti@ilc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Marchi</string-name>
          <email>simone.marchi@ilc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Bellandi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Flavia Sciolette</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cnr-Istituto di Linguistica Computazionale "A. Zampolli"</institution>
          ,
          <addr-line>Via G. Moruzzi 1, 56124 Pisa</addr-line>
          ,
          <country country="IT">Italia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università Cattolica del Sacro Cuore</institution>
          ,
          <addr-line>Largo Gemelli 1, 20123 Milano</addr-line>
          ,
          <country country="IT">Italia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università di Torino</institution>
          ,
          <addr-line>Via Verdi 8, 10124 Torino</addr-line>
          ,
          <country country="IT">Italia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents the integration of CompL-it, a Linked Open Data (LOD) computational lexicon for contemporary Italian, into LiITA (Linking Italian), a Knowledge Base (KB) designed for linguistic interoperability. CompL-it contains over 101k lexical entries enriched with detailed morphological and semantic information, derived from multiple authoritative sources and modelled using the OntoLex-Lemon vocabulary. The linking process involved aligning lexical entries with lemmas in LiITA's Lemma Bank (LB), addressing both exact and ambiguous matches through systematic and semantically informed strategies. Moreover, 12,739 new lemmas were added to the LiITA LB. This integration enhances the expressiveness and interoperability of LiITA, enabling complex SPARQL queries that exploit the semantic network encoded in CompL-it. Examples are provided to demonstrate the advantages of querying interlinked resources.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Open Data</kwd>
        <kwd>Italian</kwd>
        <kwd>language resources</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        […] architecture, by following the LOD principles. In LiLa, lemmas act as pivots between textual data (composed of tokenised texts) and lexical metadata (compiled in lexical entries). Lemmas are collected in a Lemma Bank (LB) that serves as the nexus for integrating distributed linguistic resources and enabling seamless connections across heterogeneous datasets [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This architecture has not only proven effective in unifying Latin resources, but has also demonstrated its adaptability to other languages. Building upon the LiLa framework, the LiITA (Linking Italian) Knowledge Base has been conceived as a Knowledge Base for Italian linguistic resources [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. LiITA inherits the lemma-centric design, constructing an LB for Italian. This LB, initially comprising over 113,000 entries extracted from the Nuovo De Mauro dictionary<sup>6</sup>, is meticulously curated to support interoperability, particularly in the context of divergent lemmatisation standards. By modelling each lemma using the OntoLex-Lemon vocabulary and a shared ontology derived from LiLa, LiITA ensures that lexical entries and their associated textual occurrences can be connected across otherwise incompatible datasets. Its architecture not only allows for the integration of existing datasets but also accommodates the dynamic evolution of linguistic knowledge as new resources become available in the KB.
      </p>
      <p>
        As part of its ongoing development, LiITA is currently interlinking several key lexical and textual resources via its LB. These include the Vocabolario della Lingua Parmigiana glossary, a bilingual lexicon with Italian entries and the corresponding translations in Parmigiano<sup>7</sup>, and CompL-it [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a computational lexicon for Italian already published as Linked Open Data. This paper describes the process of linking the computational lexicon CompL-it to LiITA and is structured as follows: Section 2 contains a short description of the LiITA architecture; Section 3 describes the CompL-it resource and how it is modelled in RDF; Section 4 describes the process of linking to the LiITA KB and how the LiITA LB has been enriched by the addition of new lemmas from CompL-it; Section 5 contains examples of the advantages given by the linking of the CompL-it resource to LiITA, including an example of a SPARQL query performed on the current KB; Section 6 draws conclusions and outlines future perspectives and developments.
      </p>
      <p><bold>2. LiITA - Architecture</bold></p>
      <p>
        In the LiITA LB, lemmas are represented using a dedicated ontology<sup>8</sup>, inherited from LiLa, which was specifically developed to capture the morphological and linguistic characteristics of Latin. This ontology encodes features such as Part-of-Speech (PoS), gender, and inflectional properties, drawing on the OLiA annotation framework [
        <xref ref-type="bibr" rid="ref7">7, 151–155</xref>
        ] to ensure consistency and formal interoperability.
      </p>
      <p>
        The ontology also defines the essential Classes and Properties required for modelling lemmatisation. Among these is the Property lila:hasLemma<sup>9</sup>, which associates lemmas with the tokens they annotate within a corpus. Within the OntoLex-Lemon model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], lexical forms can have one or more graphical variants, captured using the Property ontolex:writtenRep (http://www.w3.org/ns/lemon/ontolex#writtenRep), as well as phonetic realisations, specified by the Property ontolex:phoneticRep (http://www.w3.org/ns/lemon/ontolex#phoneticRep). The Property ontolex:canonicalForm (http://www.w3.org/ns/lemon/ontolex#canonicalForm) identifies the standard or representative form within an inflectional paradigm.
      </p>
      <p>The LiITA LB is composed of such canonical forms, which are represented as instances of the Class lila:Lemma<sup>10</sup>, a subclass of ontolex:Form within the OntoLex-Lemon ontology. Moreover, the Class lila:Hypolemma, a subclass of lila:Lemma, is used to represent citation forms that belong to a word's regular inflectional paradigm but receive a different PoS tag than the lemma. This is the case of participles such as amato ('loved', adjective), which is part of the inflectional paradigm of amare ('to love', verb).</p>
      <p>
        With respect to morphological annotation, each lemma in the LB is assigned a Part-of-Speech label using the Property lila:hasPOS<sup>11</sup>, in accordance with the UPOS (Universal POS) tag set [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The LiITA LB is not made of lexical entries because it does not function as an autonomous lexical resource. Rather, it constitutes a curated repository of canonical forms that (i) is intended to grow progressively as new sources, including those containing previously unrecorded lemmas, are integrated, and (ii) serves as a foundation for both text lemmatisation and the indexing of lexical entries within distributed resources published as LOD.</p>
      <p>However, linguistic resources often adopt heterogeneous tag sets, standards, and annotation schemes, particularly with respect to lemmatisation.</p>
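      <p>As an informal illustration of the Lemma Bank classes described in this section, the following minimal Python sketch mirrors the relation between a lemma and a hypolemma using the amare/amato example. The class and attribute names are hypothetical stand-ins chosen for this example; they are not part of the LiLa ontology or of any LiITA software, and the actual LB is RDF data rather than Python objects.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass
class Lemma:
    """Canonical form stored in the Lemma Bank (cf. lila:Lemma)."""
    written_reps: list  # one or more graphical variants (cf. ontolex:writtenRep)
    upos: str           # UPOS tag (cf. lila:hasPOS)


@dataclass
class Hypolemma(Lemma):
    """Citation form of a lemma's paradigm that receives a different PoS (cf. lila:Hypolemma)."""


# The participle "amato" is tagged as an adjective, while "amare" is a verb.
amare = Lemma(written_reps=["amare"], upos="VERB")
amato = Hypolemma(written_reps=["amato"], upos="ADJ")
```
      </preformat>
      <p>Modelling the hypolemma as a subclass keeps every hypolemma usable wherever a lemma is expected, which is the point of declaring lila:Hypolemma a subclass of lila:Lemma.</p>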
      <sec id="sec-1-1">
        <title>6 https://dizionario.internazionale.it/</title>
        <p>7 https://github.com/LiITA-LOD/LocalVarieties/tree/main/Parmigiano</p>
      </sec>
      <sec id="sec-1-2">
        <title>8 http://lila-erc.eu/ontologies/lila/ 9 http://lila-erc.eu/ontologies/lila/hasLemma 10 http://lila-erc.eu/ontologies/lila/Lemma 11 http://lila-erc.eu/ontologies/lila/hasPOS</title>
        <p>To accommodate this variation in lemmatisation approaches found across linguistic resources, the LiITA LB defines two specialised Properties. The first is the symmetric Property lila:lemmaVariant<sup>12</sup>, which links different forms within the same inflectional paradigm that may be used as lemmas, while maintaining their associated PoS. A common case involves <italic>pluralia tantum</italic>, which can appear as either singular or plural lemmas. For example, both the plural occhiali and the singular occhiale ('glasses/optical instrument') are represented as distinct lila:Lemma instances, connected via the lila:lemmaVariant Property.</p>
        <p>In contrast, the Property lila:hasHypolemma<sup>13</sup>, along with its inverse relation lila:isHypolemma<sup>14</sup>, is used to relate a lila:Lemma to a lila:Hypolemma.</p>
        <p>By means of this modelling framework, the LB provides a coherent structure capable of accommodating divergent lemmatisation practices. For example, some resources lemmatise participles under their participial form, while others prefer the base verbal form. Thanks to this flexible architecture, such differences can be reconciled, thereby promoting interoperability across corpora and lexical resources employing distinct lemmatisation conventions.</p>
        <p><bold>3. CompL-it</bold></p>
        <p>
          CompL-it is a computational lexicon for contemporary Italian, modelled according to the already cited OntoLex-Lemon model, the de facto standard for lexical resources, and compliant with the principles of LOD. This resource was created by merging three different sources of data: M-GLF (MAGIC-Generated Lemmatized Forms), a list of lemmatised forms with morphological information generated by the MAGIC tool, a morphological analyser [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]; a set of Italian language treebanks available through the UD repository (Italian Stanford Dependency Treebank, ISDT<sup>15</sup>; Venice Italian Treebank, VIT<sup>16</sup>; ParallelTut, ParTut<sup>17</sup>; ParlaMint-It<sup>18</sup>); and the computational lexicon LexicO [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which constitutes the base of the entire resource from the point of view of the model.
        </p>
        <p>
          LexicO represents the revised version of another important resource in the framework of Italian Lexicography, Parole-Simple-Clips [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], with which it shares the same model based on the theory of Generative Lexicon by James Pustejovsky [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], with four different layers of linguistic information (morphological, semantic, syntactic and phonological). The lemmas of the resources have been converted to Lexical Entries of the OntoLex-Lemon model and the forms to Lexical Forms. As for the PoS and the morphological traits (e.g. gender, number), each of the three resources used a different vocabulary to describe them. Therefore, they were mapped and converted according to the LexInfo vocabulary, the main linguistic ontology for the OntoLex-Lemon model.
        </p>
        <p>The strength of CompL-it, however, is its semantic layer, partly converted from LexicO. The senses in CompL-it (derived from LexicO, since there are no senses in either M-GLF or the treebanks) are richly described through a vocabulary consisting of 137 relations, divided into eight classes. Where possible, relations have been mapped to LexInfo<sup>19</sup>; otherwise, custom object properties were created. The conversion of the data thus prepared from the three sources into OntoLex-Lemon was performed by an algorithm in two steps: i) conversion of the linguistic information according to the formalisation described in the core ontolex module of the model; ii) serialisation of the data into Turtle. The resulting lexicon was then loaded into Ontotext GraphDB<sup>20</sup>, a semantic repository compliant with RDF and SPARQL<sup>21</sup>.</p>
        <p>The following is an example of an RDF OntoLex-Lemon representation of a CompL-it lexical entry in Turtle format.</p>
        <preformat>
:coniglio_entry a ontolex:Word ;
    lexinfo:partOfSpeech lexinfo:noun ;
    ontolex:canonicalForm :coniglio_lemma ;
    ontolex:otherForm :coniglio_form_1 ;
    ontolex:sense :coniglio_sense_1, :coniglio_sense_2, :coniglio_sense_3 .

:coniglio_lemma a ontolex:Form ;
    lexinfo:gender lexinfo:masculine ;
    lexinfo:number lexinfo:singular ;
    ontolex:writtenRep "coniglio"@it, "rabbit"@en .

:coniglio_form_1 a ontolex:Form ;
    lexinfo:gender lexinfo:masculine ;
    lexinfo:number lexinfo:plural ;
    ontolex:writtenRep "conigli"@it, "rabbits"@en .

:coniglio_sense_1 a ontolex:LexicalSense ;
    skos:definition "mammifero della famiglia dei Leporidi, con pelame di vario colore, lunghe orecchie, occhi grandi e sporgenti e grossi incisivi"@it,
        "Mammal of the Leporidae family, with variously colored fur, long ears, large, protruding eyes and large incisors"@en ;
    lexinfo:hyponym :mammifero_sense ;
    simple:polysemyAnimalFood :coniglio_sense_3 .

:coniglio_sense_2 a ontolex:LexicalSense ;
    skos:definition "persona timida e molto paurosa"@it,
        "shy and very fearful person"@en ;
    lexinfo:hyponym :persona_sense ;
    simple:metaphor :coniglio_sense_1 .

:coniglio_sense_3 a ontolex:LexicalSense ;
    skos:definition "carne dell’omonimo animale"@it,
        "meat of the animal"@en .
        </preformat>
        <p>In this example, the lexical entry coniglio (rabbit) is linked to two word forms: one designated as the canonical form (lemma), and the other corresponding to the plural form conigli (rabbits). Both forms are annotated with the appropriate morphological features.</p>
        <p>The lexical entry is also connected to three lexical senses via the ontolex:sense property, which links lexical entries to their semantic interpretations. Each sense includes a definition expressed in natural language.</p>
        <p>Furthermore, the first two senses are semantically enriched through relations that connect them to other lexical senses in the resource. For instance, coniglio_sense_1 is modelled as a hyponym of mammifero_sense.</p>
        <p>CompL-it contains 101,795 lexical entries (comprising a total of 791,541 word forms), classified with 36 PoS categories and described with morphological traits. From a semantic standpoint, CompL-it describes 55,713 word senses connected to each other through 137 types of semantic relations, totalling 86,577 instances.</p>
        <p>Table 1 shows the distribution of the 10 most numerous types of semantic relation instances:</p>
        <table-wrap>
          <label>Table 1</label>
          <caption><p>The 10 most numerous types of semantic relation instances in CompL-it.</p></caption>
          <table>
            <thead>
              <tr><th>Semantic relation</th><th># instances</th></tr>
            </thead>
            <tbody>
              <tr><td>hyponym</td><td>43,069</td></tr>
              <tr><td>approximateSynonym</td><td>5,666</td></tr>
              <tr><td>usedFor</td><td>3,291</td></tr>
              <tr><td>partMeronym</td><td>3,159</td></tr>
              <tr><td>partHolonym</td><td>3,159</td></tr>
              <tr><td>createdBy</td><td>2,857</td></tr>
              <tr><td>ObjectOfTheActivity</td><td>1,366</td></tr>
              <tr><td>memberMeronym</td><td>1,318</td></tr>
              <tr><td>ResultingState</td><td>1,063</td></tr>
              <tr><td>memberHolonym</td><td>979</td></tr>
              <tr><td>other</td><td></td></tr>
              <tr><td>total</td><td>86,577</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p><sup>12</sup> http://lila-erc.eu/ontologies/lila/lemmaVariant</p>
        <p><sup>13</sup> http://lila-erc.eu/ontologies/lila/hasHypolemma</p>
        <p><sup>14</sup> http://lila-erc.eu/ontologies/lila/isHypolemma</p>
        <p><sup>15</sup> https://github.com/UniversalDependencies/UD_Italian-ISDT</p>
        <p><sup>16</sup> https://github.com/UniversalDependencies/UD_Italian-VIT</p>
        <p><sup>17</sup> https://github.com/UniversalDependencies/UD_Italian-ParTUT</p>
        <p><sup>18</sup> https://github.com/UniversalDependencies/UD_Italian-ParlaMint</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Linking</title>
      <p>Linking a lexical resource to the LiITA LB entails establishing a relationship between the lexical entries of the resource and the lemmas in the LB. Typically, this process begins with modelling the resource as a LOD resource, followed by creating the connections between the resource’s entries and the LB lemmas.</p>
      <p>Modelling the link between CompL-it and LiITA was, however, relatively straightforward. One of the main advantages of integrating a resource that already adheres to LOD standards is that each CompL-it entry, already represented as an ontolex:Word, a subclass of ontolex:LexicalEntry, can be directly linked to LiITA via the ontolex:canonicalForm relation.</p>
      <p>The linking process between CompL-it and LiITA necessarily begins with a mapping between the PoS tags used in CompL-it, which are described using LexInfo, and the UPOS tagset used in LiITA. Table 2 shows the PoS mapping between the two tagsets, which was applied to the data before matching CompL-it entries with LiITA lemmas.</p>
      <p>Subsequently, CompL-it lexical entries were matched against the lemmas in LiITA on the lemma-PoS pair. Out of over 101k lexical entries in CompL-it, the matching process yielded the following results:</p>
      <list list-type="bullet">
        <list-item><p>1:1 match: 83,340 lexical entries (an exact match between a CompL-it lexical entry and a LiITA lemma + PoS combination)</p></list-item>
        <list-item><p>1:N match: 4,219 lexical entries (more than one potential lemma-PoS pair in LiITA corresponding to a single CompL-it lexical entry)</p></list-item>
        <list-item><p>1:0 match: 14,314 lexical entries (no corresponding lemma-PoS pair found in LiITA)</p></list-item>
      </list>
      <p>The linking is operationalised using the ontolex:canonicalForm relation, which connects a CompL-it lexical entry to a corresponding lemma in LiITA. For example:</p>
      <preformat>
http://lexica/mylexicon#MUSmerendaNOUN
    ontolex:canonicalForm
    http://liita.it/data/id/lemma/1010136 (merenda)
      </preformat>
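      <p>The matching step on the lemma-PoS pair can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the procedure actually used for LiITA: the function name, the tuple layout and the toy identifiers are hypothetical, and only a small subset of the LexInfo-to-UPOS mapping of Table 2 is included.</p>
      <preformat>
```python
from collections import defaultdict

# Subset of the LexInfo-to-UPOS mapping reported in Table 2.
LEXINFO_TO_UPOS = {"noun": "NOUN", "commonNoun": "NOUN", "verb": "VERB", "adjective": "ADJ"}


def classify_matches(entries, lemma_bank):
    """Group entries as 1:1, 1:N or 1:0 by counting LB lemmas with the same (form, UPOS) pair."""
    index = defaultdict(list)
    for lemma_id, form, upos in lemma_bank:
        index[(form, upos)].append(lemma_id)

    buckets = {"1:1": [], "1:N": [], "1:0": []}
    for entry_id, form, lexinfo_pos in entries:
        # Map the LexInfo PoS to UPOS before looking up the pair.
        candidates = index.get((form, LEXINFO_TO_UPOS[lexinfo_pos]), [])
        if len(candidates) == 1:
            kind = "1:1"
        elif candidates:
            kind = "1:N"
        else:
            kind = "1:0"
        buckets[kind].append((entry_id, list(candidates)))
    return buckets


# Toy data: identifiers and the ambiguity shown here are invented for illustration only.
lemma_bank = [(1, "merenda", "NOUN"), (2, "pesca", "NOUN"), (3, "pesca", "NOUN")]
entries = [("e1", "merenda", "noun"), ("e2", "pesca", "noun"), ("e3", "pantaloni", "noun")]
result = classify_matches(entries, lemma_bank)
```
      </preformat>
      <p>In this toy run, the first entry falls into the 1:1 bucket, the second into 1:N (two candidate lemmas), and the third into 1:0 (no candidate), mirroring the three outcomes listed above.</p>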
      <table-wrap>
        <label>Table 2</label>
        <caption><p>Mapping of the LexInfo PoS values used in CompL-it to the UPOS tags used in LiITA.</p></caption>
        <table>
          <thead>
            <tr><th>LexInfo</th><th>UPOS</th></tr>
          </thead>
          <tbody>
            <tr><td>adjective</td><td>ADJ</td></tr>
            <tr><td>adposition</td><td>ADP</td></tr>
            <tr><td>adverb</td><td>ADV</td></tr>
            <tr><td>article</td><td>DET</td></tr>
            <tr><td>auxiliary</td><td>VERB</td></tr>
            <tr><td>cardinalNumeral</td><td>NUM</td></tr>
            <tr><td>commonNoun</td><td>NOUN</td></tr>
            <tr><td>conjunction</td><td>SCONJ-ADV</td></tr>
            <tr><td>coordinatingConjunction</td><td>CCONJ</td></tr>
            <tr><td>definiteArticle</td><td>DET</td></tr>
            <tr><td>demonstrativeDeterminer</td><td>DET</td></tr>
            <tr><td>demonstrativePronoun</td><td>PRON</td></tr>
            <tr><td>determiner</td><td>DET</td></tr>
            <tr><td>exclamativeDeterminer</td><td>DET</td></tr>
            <tr><td>exclamativePronoun</td><td>PRON</td></tr>
            <tr><td>fusedPreposition</td><td>ADP</td></tr>
            <tr><td>indefiniteArticle</td><td>DET</td></tr>
            <tr><td>indefiniteDeterminer</td><td>DET</td></tr>
            <tr><td>indefinitePronoun</td><td>PRON</td></tr>
            <tr><td>interjection</td><td>INTJ</td></tr>
            <tr><td>interrogativeAdverb</td><td>ADV</td></tr>
            <tr><td>interrogativeDeterminer</td><td>DET</td></tr>
            <tr><td>interrogativePronoun</td><td>PRON</td></tr>
            <tr><td>noun</td><td>NOUN</td></tr>
            <tr><td>numeral</td><td>NUM</td></tr>
            <tr><td>numeralDeterminer</td><td>DET</td></tr>
            <tr><td>numeralPronoun</td><td>PRON</td></tr>
            <tr><td>particle</td><td>PART</td></tr>
            <tr><td>personalPronoun</td><td>PRON</td></tr>
            <tr><td>possessiveAdjective</td><td>ADJ</td></tr>
            <tr><td>possessiveDeterminer</td><td>DET</td></tr>
            <tr><td>possessivePronoun</td><td>PRON</td></tr>
            <tr><td>pronoun</td><td>PRON</td></tr>
            <tr><td>relativeDeterminer</td><td>DET</td></tr>
            <tr><td>relativePronoun</td><td>PRON</td></tr>
            <tr><td>subordinatingConjunction</td><td>SCONJ</td></tr>
            <tr><td>verb</td><td>VERB</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Disambiguation of 1:N matches posed a significant challenge. At the time of this initial linking effort, CompL-it was the first external resource to be linked to the LiITA LB, meaning that no additional semantic cues, such as sense distinctions or contextual usage, were yet available in the lemma database. As a result, each lemma in LiITA was limited to grammatical information such as PoS, gender, or conjugation and reflexivity (for verbs). Although, as noted in Section 1, the lemmas were extracted from the Nuovo De Mauro Dictionary, no sense-level metadata was incorporated from the dictionary.</p>
      <p>In the absence of semantic information, we adopted a pragmatic yet arbitrary strategy for disambiguation: where multiple LiITA lemmas shared the same form and PoS, we selected the lemma that appears first in the LB (by id). While this approach lacks empirical grounding, it provided a consistent criterion for initiating the alignment process.</p>
      <p>In cases involving a 1:0 match, the correspondence with the string may be either complete (for instance, in the case of a previously unseen word) or partial, as when inflected forms of lemmas already present in the LB are encountered. The strategy for inclusion varies according to the characteristics of the lexical resource being linked.</p>
      <p>The CompL-it resource contains a substantial number of words in plural form. Entries such as pantaloni (“trousers”), mutande (“underpants”), braccia, and ottavi, which refers to the “round of 16” in a tournament setting, have been added to the LB. In such cases the new lemma has been linked to its singular variant in the LB with the Property lila:lemmaVariant as described in Section 2.</p>
      <p>A few additional noteworthy inclusion strategies adopted for the CompL-it resource are outlined below:</p>
      <list list-type="bullet">
        <list-item><p>Truncated word forms, such as quest’, nessun’, and verun, have been added as written representations of existing lemmas.</p></list-item>
        <list-item><p>Adjectives and determiners occurring in feminine or plural forms have been systematically linked to their corresponding singular masculine lemmas in LiITA.</p></list-item>
        <list-item><p>Adverbial forms that appear to be derived from adjectives, pronouns, or determiners (e.g., quante, prese) have been included in the resource as hypolemmas of their corresponding base entries. This modelling choice ensures compatibility with texts in which such adverbial forms are lemmatised under their base categories (namely adjectives, pronouns, or determiners), thereby promoting consistency across heterogeneous lemmatisation practices.</p></list-item>
        <list-item><p>Composite pronouns, such as glieli, glielo, gliene, and others, have also been included in the LB, following the same rationale outlined above. This ensures alignment with sources in which these forms are treated as distinct lemmas (as opposed to being split, e.g., glielo into gli + lo).</p></list-item>
        <list-item><p>Orthographic errors (e.g., perchè, with grave accent on the final e, instead of the correct perché) have been linked to the appropriate lemma, although their incorrect spellings have not been recorded as alternative written representations.</p></list-item>
      </list>
      <p><bold>5. Querying CompL-it in LiITA</bold></p>
      <p>One of the key advantages of storing data in RDF is the ability to formulate federated SPARQL queries that retrieve information from datasets distributed across multiple endpoints. Examples of SPARQL queries performed on the LiITA Knowledge Base are continuously added to https://www.liita.it/?page_id=158. The integration of CompL-it into the LiITA Knowledge Base enables the exploitation of its rich semantic network and facilitates interoperability with other linked linguistic resources. For instance, it becomes possible to retrieve Italian lexical entries linked to CompL-it whose definitions begin with uccello (bird) and to display their corresponding translations in the Parmigiano Glossary, another resource linked to LiITA<sup>22</sup>.</p>
      <p>It is interesting to explore the added value that CompL-it contributes through its dense network of semantic relations. For instance, one of the example queries provided on the LiITA website retrieves lexical entries associated with colour by filtering definitions that begin with the string colore (“colour”). While this method yields relevant results, a more semantically informed strategy involves querying for all hyponyms of the specific sense of the lemma colore defined as "qualità dei corpi per cui essi riflettono in vario modo la luce" (“property of bodies by which they reflect light in various ways”). Below is the SPARQL query text retrieving all the hyponyms of colore.</p>
      <preformat>
PREFIX lime: &lt;http://www.w3.org/ns/lemon/lime#&gt;
PREFIX vartrans: &lt;http://www.w3.org/ns/lemon/vartrans#&gt;
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX skos: &lt;http://www.w3.org/2004/02/skos/core#&gt;
PREFIX dct: &lt;http://purl.org/dc/terms/&gt;
PREFIX onto: &lt;http://www.ontotext.com/&gt;
PREFIX lexinfo: &lt;http://www.lexinfo.net/ontology/3.0/lexinfo#&gt;
PREFIX ontolex: &lt;http://www.w3.org/ns/lemon/ontolex#&gt;
PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;

SELECT ?senseHyponym
       (GROUP_CONCAT(str(?_definition);SEPARATOR="; esempio: ") AS ?definition)
       ?liitaLemma ?parmigianoLemma ?wr
WHERE {
  SERVICE &lt;https://klab.ilc.cnr.it/graphdb-complit/&gt; {
    ?word a ontolex:Word ;
          lexinfo:partOfSpeech [ rdfs:label ?pos ] ;
          ontolex:sense ?sense ;
          ontolex:canonicalForm [ ontolex:writtenRep ?lemma ] .
    ?sense lexinfo:hypernym ?senseHyponym .
    OPTIONAL { ?senseHyponym skos:definition ?_definition } .
    FILTER(str(?pos) = "noun") .
    FILTER(str(?lemma) = "colore") .
    ?wordHyponym ontolex:sense ?senseHyponym .
  }
  ?wordHyponym ontolex:canonicalForm ?liitaLemma .
  ?leItaLexiconPar ontolex:canonicalForm ?liitaLemma ;
                   ^lime:entry &lt;http://liita.it/data/LexicalReources/DialettoParmigiano/Lexicon&gt; .
  ?leItaLexiconPar vartrans:translatableAs ?leParLexiconPar .
  ?leParLexiconPar ontolex:canonicalForm ?parmigianoLemma .
  ?parmigianoLemma ontolex:writtenRep ?wr
}
GROUP BY ?senseHyponym ?liitaLemma ?parmigianoLemma ?wr
ORDER BY ASC(?wr)
      </preformat>
      <p>The query interrogates the CompL-it repository hosted in GraphDB to extract lexical entries classified as nouns, whose written representation is colore and which are associated with a sense that has at least one hyponym. Additionally, it retrieves all the available definitions of such hyponyms. Subsequently, the query accesses the local LiITA graph to extract the Italian written representation of each hyponym, identify the corresponding lexical entry, verify its inclusion in the Parmigiano lexicon, and retrieve its translation along with the written representation in dialect. The final output includes the hyponymic senses, their definitions (if available), the Italian canonical forms, their written representations, and the corresponding lemma in the Parmigiano resource. A selection of the results is shown in Table 3, including the written representations of the Italian and corresponding Parmigiano lemmas.</p>
      <table-wrap>
        <label>Table 3</label>
        <caption><p>A selection of the query results: written representations of the Italian lemmas and of the corresponding Parmigiano lemmas.</p></caption>
        <table>
          <thead>
            <tr><th>italian</th><th>parm.</th><th>italian</th><th>parm.</th></tr>
          </thead>
          <tbody>
            <tr><td>argento</td><td>argént</td><td>tabacco</td><td>pisighén</td></tr>
            <tr><td>azzurro</td><td>azúr</td><td>piombo</td><td>piómb</td></tr>
            <tr><td>grigio</td><td>bergnôl</td><td>mattone</td><td>quaderlètt</td></tr>
            <tr><td>grigio</td><td>biz</td><td>mattone</td><td>quaderlón</td></tr>
            <tr><td>grigio</td><td>bizón</td><td>mattone</td><td>quadrél</td></tr>
            <tr><td>blu</td><td>blò</td><td>rame</td><td>ram</td></tr>
            <tr><td>cenere</td><td>bornìza</td><td>pisello</td><td>reviót</td></tr>
            <tr><td>bronzo</td><td>brónz</td><td>rosso</td><td>ròss</td></tr>
            <tr><td>prugna</td><td>brùggna</td><td>ruggine</td><td>rùzzna</td></tr>
            <tr><td>caramella</td><td>caraméla</td><td>topo</td><td>sorghén</td></tr>
            <tr><td>carminio</td><td>carmzén</td><td>topo</td><td>sorgón</td></tr>
            <tr><td>carota</td><td>caròtla</td><td>ciliegia</td><td>sréza</td></tr>
            <tr><td>crema</td><td>crèmma</td><td>sabbia</td><td>sàbia</td></tr>
            <tr><td>cremisi</td><td>crèmmez</td><td>cenere</td><td>sèndra</td></tr>
            <tr><td>ferro</td><td>fér</td><td>topo</td><td>sòrrogh</td></tr>
            <tr><td>giallo</td><td>gialdètt</td><td>tabacco</td><td>tabach</td></tr>
            <tr><td>giallo</td><td>gialdón</td><td>topo</td><td>topén</td></tr>
            <tr><td>giallo</td><td>giäld</td><td>verde</td><td>verdzén</td></tr>
            <tr><td>grigio</td><td>griz</td><td>verde</td><td>verdén</td></tr>
            <tr><td>limone</td><td>limón</td><td>verdone</td><td>verdón</td></tr>
            <tr><td>muschio</td><td>musc’</td><td>violetto</td><td>violètt</td></tr>
            <tr><td>miele</td><td>méla</td><td>ciliegia</td><td>vìssola</td></tr>
            <tr><td>nocciola</td><td>nisôla</td><td>giallo</td><td>zaldón</td></tr>
            <tr><td>paglia</td><td>paja</td><td>oro</td><td>òr</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>This sense-centred approach results in approximately thirty additional lexical entries, as many of the corresponding definitions do not explicitly include the word colore, but are nonetheless semantically linked through hyponymy. This example highlights the potential of leveraging CompL-it’s semantic network to formulate richer and more accurate queries.</p>
      <p><sup>22</sup> https://liita.it/data/id/DialettoParmigiano/lemma/LemmaBank.html</p>
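      <p>The difference between the two retrieval strategies discussed in this section (filtering on the beginning of a definition versus following the hyponymy relation) can be pictured with the following minimal Python sketch. It is only an illustration: the in-memory dictionary, the function names, and the toy senses with their definitions are invented for this example and are not taken from CompL-it, where the same comparison would be expressed in SPARQL.</p>
      <preformat>
```python
def by_definition_prefix(senses, prefix):
    """Return the senses whose definition starts with the given string."""
    return sorted(name for name, s in senses.items() if s["definition"].startswith(prefix))


def by_hypernym(senses, target):
    """Return the senses explicitly recorded as hyponyms of the target sense."""
    return sorted(name for name, s in senses.items() if target in s["hypernyms"])


# Toy senses: names and definitions are invented, not taken from CompL-it.
senses = {
    "rosso_sense": {"definition": "colore simile a quello del sangue", "hypernyms": ["colore_sense"]},
    "oro_sense": {"definition": "tonalità di giallo brillante", "hypernyms": ["colore_sense"]},
    "tavolo_sense": {"definition": "mobile con un piano orizzontale", "hypernyms": ["mobile_sense"]},
}

prefix_hits = by_definition_prefix(senses, "colore")
relation_hits = by_hypernym(senses, "colore_sense")
# Senses reachable only through the semantic relation, not through the definition text.
extra_hits = sorted(set(relation_hits) - set(prefix_hits))
```
      </preformat>
      <p>In this toy data, the relation-based lookup also returns the sense whose definition never mentions colore, which is the kind of additional result the sense-centred query is meant to capture.</p>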
    </sec>
    <sec id="sec-3">
      <title>6. Conclusions</title>
      <p>The integration of CompL-it into the LiITA Knowledge
Base marks a significant milestone in the development
of interoperable linguistic resources for Italian. By
linking over 100,000 lexical entries, many of which include
rich semantic annotations, to LiITA’s LB, this initiative
enhances the interoperability and expressiveness of both
resources. The linking process also prompted the creation
of new lemma variants, refinement of linking strategies,
and the accommodation of plural forms and multiword
expressions, thereby contributing to the ongoing
enrichment of the LB. This work demonstrates the feasibility
and advantages of integrating heterogeneous linguistic
resources using Linked Open Data principles and shared
ontologies. The ability to execute cross-resource SPARQL
queries further exemplifies the practical benefits of
semantic interoperability. One of the next crucial steps will
be the integration of Italian textual corpora into LiITA.
This will allow not only for the validation of lemma-token
alignment but also for exploring contextual usage
patterns of lexical entries. Moreover, this will allow for the
semantic richness of CompL-it to be exploited through
the design and testing of more complex SPARQL queries.
Lastly, one of the key challenges in achieving impact
within the linguistic community, or more broadly, the
humanities fields that engage with data, will be to
evaluate and explore text-to-SPARQL systems using Large
Language Models (LLMs). This can be done through
Retrieval-Augmented Generation (RAG), where a set of
SPARQL queries over the LiITA KB is provided, and
various few-shot prompts are tested to equip the LLM with
knowledge about the Classes and Properties used in the
KB.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This contribution is funded by the European Union - Next Generation EU, Mission 4 Component 1 CUP J53D2301727OOO1. The PRIN 2022 PNRR project LiITA: Interlinking Linguistic Resources for Italian via Linked Data is carried out jointly by the Università Cattolica del Sacro Cuore, Milano and the Università di Torino.</p>
      <p><bold>Declaration on Generative AI</bold></p>
      <p>During the preparation of this work, the authors used ChatGPT (OpenAI) and Gemini (Google) in order to paraphrase and reword, improve writing style, and check grammar and spelling. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Roventini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bertagna</surname>
          </string-name>
          ,
          <article-title>ItalWordNet v.2</article-title>
          ,
          <year>2016</year>
          . URL: http://hdl.handle.net/20.500.11752/ILC-62, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Favretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tamburini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>De Santis</surname>
          </string-name>
          ,
          <article-title>CORIS/CODIS: A corpus of written Italian based on a defined and a dynamic model</article-title>
          ,
          <source>A rainbow of corpora: Corpus linguistics and the languages of the world</source>
          (
          <year>2002</year>
          )
          <fpage>27</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chiarcos</surname>
          </string-name>
          ,
          <article-title>POWLA: Modeling linguistic corpora in OWL/DL</article-title>
          , in: E. Simperl, P. Cimiano, A. Polleres, O. Corcho, V. Presutti (Eds.),
          <source>The Semantic Web: Research and Applications. ESWC</source>
          <year>2012</year>
          , volume
          <volume>7295</volume>
          of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg,
          <year>2012</year>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>239</lpage>
          . doi:10.1007/978-3-642-30284-8_22
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Passarotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mambrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Franzini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Cecchini</surname>
          </string-name>
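          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Litta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moretti</surname>
          </string-name>
          ,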
          <string-name>
            <given-names>P.</given-names>
            <surname>Ruffolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <article-title>Interlinking through Lemmas. The Lexical Collection of the LiLa Knowledge Base of Linguistic Resources for Latin</article-title>
          ,
          <source>Studi e Saggi Linguistici</source>
          <volume>58</volume>
          (
          <year>2020</year>
          )
          <fpage>177</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E. M. G.</given-names>
            <surname>Litta Modignani Picozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Passarotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brasolin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mambrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Di Fabio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <article-title>The Lemma Bank of the LiITA Knowledge Base of Interoperable Resources for Italian</article-title>
          , ITA,
          <year>2024</year>
          . URL: https://publicatt.unicatt.it/handle/10807/299843, accepted: 2024-12-04T14:12:09Z
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Sciolette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bellandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Giovannetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchi</surname>
          </string-name>
          ,
          <article-title>CompL-it: a Computational Lexicon of Italian</article-title>
          ,
          <source>AIDAinformazioni</source>
          <volume>42</volume>
          (
          <year>2024</year>
          )
          <fpage>119</fpage>
          -
          <lpage>148</lpage>
          . URL: https://doi.org/10.57574/596545646. doi:10.57574/596545646.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chiarcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gracia</surname>
          </string-name>
          ,
          <source>Linguistic Linked Data: Representation, Generation and Applications</source>
          , Springer, Cham,
          <year>2020</year>
          . URL: https://www.springer.com/gp/book/9783030302245. doi:10.1007/978-3-030-30225-2.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bosque-Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gracia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <source>The OntoLex-Lemon Model: Development and Applications</source>
          ,
          <year>2017</year>
          . URL: https://www.semanticscholar.org/paper/The-OntoLex-Lemon-Model%3A-Development-and-McCrae-Gil/3ab2877e3cf9d8f7bad3a4fb9a03602010e00691
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McDonald</surname>
          </string-name>
          ,
          <article-title>A Universal Part-of-Speech Tagset</article-title>
          , in: N. Calzolari (Conference Chair),
          <string-name>
            <given-names>K.</given-names>
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. U.</given-names>
            <surname>Doğan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Odijk</surname>
          </string-name>
          , S. Piperidis (Eds.),
          <source>Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)</source>
          ,
          <source>European Language Resources Association (ELRA)</source>
          , Istanbul, Turkey,
          <year>2012</year>
          , pp.
          <fpage>2089</fpage>
          -
          <lpage>2096</lpage>
          . URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Battista</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pirrelli</surname>
          </string-name>
          ,
          <article-title>Una piattaforma di morfologia computazionale per l'analisi e la generazione delle parole italiane</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Pirrelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Battista</surname>
          </string-name>
          ,
          <article-title>The paradigmatic dimension of stem allomorphy in Italian verb inflection</article-title>
          ,
          <source>Italian Journal of Linguistics</source>
          <volume>12</volume>
          (
          <year>2000</year>
          )
          <fpage>307</fpage>
          -
          <lpage>380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Sciolette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Giovannetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchi</surname>
          </string-name>
          ,
          <article-title>LexicO: an Italian Computational Lexicon derived from Parole-Simple-Clips</article-title>
          ,
          <source>Umanistica Digitale</source>
          <volume>7</volume>
          (
          <year>2023</year>
          )
          <fpage>169</fpage>
          -
          <lpage>193</lpage>
          . URL: https://umanisticadigitale.unibo.it/article/view/15176. doi:10.6092/issn.2532-8816/15176.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          AA.VV.,
          <source>PAROLE-SIMPLE-CLIPS</source>
          ,
          <year>2016</year>
          . URL: http://hdl.handle.net/20.500.11752/ILC-88.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pustejovsky</surname>
          </string-name>
          ,
          <source>The Generative Lexicon</source>
          , The MIT Press,
          <year>1995</year>
          . URL: https://direct.mit.edu/books/book/4726/The-Generative-Lexicon. doi:10.7551/mitpress/3225.001.0001.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>