
Linking CompL-it to the LiITA Knowledge Base

Eleonora Litta

eleonoramaria.litta@unicatt.it 1

Marco Passarotti

marco.passarotti@unicatt.it 1

Giovanni Moretti

Paolo Brasolin

Francesco Mambrini

Valerio Basile

valerio.basile@unito.it 2

Andrea Di Fabio

Eliana Di Palma

eliana.dipalma@unito.it 2

Emiliano Giovannetti

emiliano.giovannetti@ilc.cnr.it 0

Simone Marchi

simone.marchi@ilc.cnr.it 0

Andrea Bellandi

Flavia Sciolette

0 Cnr-Istituto di Linguistica Computazionale "A. Zampolli", Via G. Moruzzi 1, 56124 Pisa, Italia
1 Università Cattolica del Sacro Cuore, Largo Gemelli 1, 20123 Milano, Italia
2 Università di Torino, Via Verdi 8, 10124 Torino, Italia

2025

This paper presents the integration of CompL-it, a Linked Open Data (LOD) computational lexicon for contemporary Italian, into LiITA (Linking Italian), a Knowledge Base (KB) designed for linguistic interoperability. CompL-it contains over 101k lexical entries enriched with detailed morphological and semantic information, derived from multiple authoritative sources and modelled using the OntoLex-Lemon vocabulary. The linking process involved aligning lexical entries with lemmas in LiITA's Lemma Bank (LB), addressing both exact and ambiguous matches through systematic and semantically informed strategies. Moreover, 12,739 new lemmas were added to the LiITA LB. This integration enhances the expressiveness and interoperability of LiITA, enabling complex SPARQL queries that exploit the semantic network encoded in CompL-it. Examples are provided to demonstrate the advantages of querying interlinked resources.

Linked Open Data Italian language resources

1. Introduction

[...] architecture, by following the LOD principles. In LiLa, lemmas act as pivots between textual data (composed of tokenised texts) and lexical metadata (compiled by lexical entries). Lemmas are collected in a Lemma Bank (LB) to serve as the nexus for integrating distributed linguistic resources and enabling seamless connections across heterogeneous datasets [4]. This architecture has not only proven effective in unifying Latin resources, but has also demonstrated its adaptability to other languages. Building upon the LiLa framework, the LiITA (Linking Italian) Knowledge Base has been conceived as a Knowledge Base for Italian linguistic resources [5]. LiITA inherits the lemma-centric design, constructing an LB for Italian. This LB, initially comprising over 113,000 entries extracted from the Nuovo De Mauro dictionary,6 is meticulously curated to support interoperability, particularly in the context of divergent lemmatisation standards. By modelling each lemma using the OntoLex-Lemon vocabulary and a shared ontology derived from LiLa, LiITA ensures that lexical entries and their associated textual occurrences can be connected across otherwise incompatible datasets. Its architecture not only allows for the integration of existing datasets but also accommodates the dynamic evolution of linguistic knowledge as new resources become available in the KB, in an ever-growing fashion.

As part of its ongoing development, LiITA is currently in the process of interlinking via its LB several key lexical and textual resources. These include the Vocabolario della Lingua Parmigiana glossary, a bilingual lexicon having Italian entries and the corresponding translations in Parmigiano,7 and CompL-it [6], a computational lexicon for Italian already published as Linked Open Data. This paper describes the process of linking the computational lexicon CompL-it to LiITA and is structured as follows: Section 2 contains a short description of the LiITA architecture; Section 3 contains a description of the CompL-it resource and of how it is modelled in RDF; Section 4 describes the process of linking to the LiITA KB and how the LiITA LB has been enriched by the addition of new lemmas from CompL-it; Section 5 contains examples of the advantages given by the linking of the CompL-it resource to LiITA, including an example of a SPARQL query performed on the current KB; Section 6 draws conclusions and outlines future perspectives and developments.

2. LiITA - Architecture

In the LiITA LB, lemmas are represented using a dedicated ontology,8 inherited from LiLa, which was specifically developed to capture the morphological and linguistic characteristics of Latin. This ontology encodes features such as Part-of-Speech (PoS), gender, and inflectional properties, drawing on the OLiA annotation framework [7, 151–155] to ensure consistency and formal interoperability.

The ontology also defines the essential Classes and Properties required for modelling lemmatisation. Among these is the Property lila:hasLemma,9 which associates lemmas with the tokens they annotate within a corpus. Within the OntoLex-Lemon model [8], lexical forms can have one or more graphical variants, captured using the Property ontolex:writtenRep (http://www.w3.org/ns/lemon/ontolex#writtenRep), as well as phonetic realisations, specified by the Property ontolex:phoneticRep (http://www.w3.org/ns/lemon/ontolex#phoneticRep). The Property ontolex:canonicalForm (http://www.w3.org/ns/lemon/ontolex#canonicalForm) identifies the standard or representative form within an inflectional paradigm.

The LiITA LB is composed of such canonical forms, which are represented as instances of the Class lila:Lemma,10 a subclass of ontolex:Form within the OntoLex-Lemon ontology. Moreover, the class lila:Hypolemma, a subclass of lila:Lemma, is used to represent citation forms that belong to a word’s regular inflectional paradigm but receive a different PoS tag than the lemma. This is the case of participles such as amato ‘loved’, adjective, which is part of the inflectional paradigm of amare, ‘to love’, verb.

With respect to morphological annotation, each lemma in the LB is assigned a Part-of-Speech label using the Property lila:hasPos,11 in accordance with the UPOS (Universal POS) tag set [9].

The LiITA LB is not made of lexical entries because it does not function as an autonomous lexical resource. Rather, it constitutes a curated repository of canonical forms that (i) is intended to grow progressively as new sources, including those containing previously unrecorded lemmas, are integrated, and (ii) serves as a foundation for both text lemmatisation and the indexing of lexical entries within distributed resources published as LOD.
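As an informal illustration of the lemma-centred design described in this section, the following Python sketch mimics a Lemma Bank in memory. It is not the LiITA implementation, which is expressed in RDF: the class names, the field names (such as hypolemma_of) and the numeric identifiers are assumptions introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lemma:
    """A canonical form in the Lemma Bank (loosely mirrors lila:Lemma)."""
    lemma_id: int                        # hypothetical identifier
    written_reps: list[str]              # cf. ontolex:writtenRep
    upos: str                            # cf. lila:hasPos (UPOS tag)
    hypolemma_of: Optional[int] = None   # set only for hypolemmas


@dataclass
class LemmaBank:
    lemmas: dict[int, Lemma] = field(default_factory=dict)

    def add(self, lemma: Lemma) -> None:
        self.lemmas[lemma.lemma_id] = lemma

    def hypolemmas(self, lemma_id: int) -> list[Lemma]:
        """Return the hypolemmas attached to the given lemma."""
        return [lem for lem in self.lemmas.values()
                if lem.hypolemma_of == lemma_id]


bank = LemmaBank()
bank.add(Lemma(1, ["amare"], "VERB"))
# The participle 'amato' has its own PoS (ADJ) but belongs to the paradigm of 'amare'.
bank.add(Lemma(2, ["amato"], "ADJ", hypolemma_of=1))
print([h.written_reps[0] for h in bank.hypolemmas(1)])
```

A resource that lemmatises the participle as amato and one that lemmatises it as amare can both point into such a bank, since the two forms remain connected.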

However, linguistic resources often adopt heterogeneous tag sets, standards, and annotation schemes, particularly with respect to lemmatisation.

6 https://dizionario.internazionale.it/
7 https://github.com/LiITA-LOD/LocalVarieties/tree/main/Parmigiano
8 http://lila-erc.eu/ontologies/lila/
9 http://lila-erc.eu/ontologies/lila/hasLemma
10 http://lila-erc.eu/ontologies/lila/Lemma
11 http://lila-erc.eu/ontologies/lila/hasPOS

To accommodate this variation in lemmatisation approaches found across linguistic resources, the LiITA LB defines two specialised Properties. The first is the symmetric Property lila:lemmaVariant,12 which links different forms within the same inflectional paradigm that may be used as lemmas, while maintaining their associated PoS. A common case involves pluralia tantum, which can appear as either singular or plural lemmas. For example, both the plural occhiali and the singular occhiale (‘glasses/optical instrument’) are represented as distinct lila:Lemma instances, connected via the lila:lemmaVariant Property.

In contrast, the Property lila:hasHypolemma,13 along with its inverse relation lila:isHypolemma,14 is used to relate a lila:Lemma to a lila:Hypolemma.

By means of this modelling framework, the LB provides a coherent structure capable of accommodating divergent lemmatisation practices. For example, some resources lemmatise participles under their participial form, while others prefer the base verbal form. Thanks to this flexible architecture, such differences can be reconciled, thereby promoting interoperability across corpora and lexical resources employing distinct lemmatisation conventions.

3. CompL-it

CompL-it is a computational lexicon for contemporary Italian, modelled according to the already cited OntoLex-Lemon model, the de facto standard for lexical resources, and compliant with the principles of LOD. This resource was created by merging three different sources of data: M-GLF (MAGIC-Generated Lemmatized Forms), a list of lemmatised forms with morphological information generated by the MAGIC tool, a morphological analyser [10] [11]; a set of Italian language treebanks available through the UD repository (Italian Stanford Dependency Treebank, ISDT15; Venice Italian Treebank, VIT16; ParallelTut, ParTut17; ParlaMint-It18); and the computational lexicon LexicO [12], which constitutes the base of the entire resource from the point of view of the model.

LexicO represents the revised version of another important resource in the framework of Italian Lexicography, Parole-Simple-Clips [13], with which it shares the same model based on the theory of Generative Lexicon by James Pustejovsky [14], with four different layers of linguistic information (morphological, semantic, syntactic and phonological). The lemmas of the resources have been converted into Lexical Entries of the OntoLex-Lemon model and the forms into Lexical Forms; regarding the PoS and the morphological traits (e.g. gender, number), each of the three resources had a different vocabulary for describing them. Therefore, they were mapped and converted according to the LexInfo vocabulary, the main linguistic ontology for the OntoLex-Lemon model.

The strength of CompL-it, however, is the semantic layer, partly converted from LexicO; it is worth noting that the senses in CompL-it (derived from LexicO, since there are no senses in either M-GLF or the treebanks) are richly described through a vocabulary consisting of 137 relations, divided into eight classes. Where possible, relations have been mapped to LexInfo19; otherwise, custom object properties were created. The conversion of the data thus prepared, coming from the three sources, into OntoLex-Lemon was performed by an algorithm in two steps: i) conversion of the linguistic information according to the formalisation described in the core ontolex module of the model; ii) serialisation of the data into Turtle. The obtained lexicon was then loaded into Ontotext GraphDB20, a semantic repository compliant with RDF and SPARQL21.

The following is an example of an RDF OntoLex-Lemon representation of a CompL-it lexical entry in Turtle format.

    :coniglio_entry a ontolex:Word;
        lexinfo:partOfSpeech lexinfo:noun;
        ontolex:canonicalForm :coniglio_lemma;
        ontolex:otherForm :coniglio_form_1;
        ontolex:sense :coniglio_sense_1, :coniglio_sense_2, :coniglio_sense_3 .

    :coniglio_lemma a ontolex:Form;
        lexinfo:gender lexinfo:masculine;
        lexinfo:number lexinfo:singular;
        ontolex:writtenRep "coniglio"@it, "rabbit"@en .

    :coniglio_form_1 a ontolex:Form;
        lexinfo:gender lexinfo:masculine;
        lexinfo:number lexinfo:plural;
        ontolex:writtenRep "conigli"@it, "rabbits"@en .

    :coniglio_sense_1 a ontolex:LexicalSense;
        skos:definition "mammifero della famiglia dei Leporidi, con pelame di vario colore, lunghe orecchie, occhi grandi e sporgenti e grossi incisivi"@it,
            "Mammal of the Leporidae family, with variously colored fur, long ears, large, protruding eyes and large incisors"@en;
        lexinfo:hyponym :mammifero_sense;
        simple:polysemyAnimalFood :coniglio_sense_3 .

    :coniglio_sense_2 a ontolex:LexicalSense;
        skos:definition "persona timida e molto paurosa"@it,
            "shy and very fearful person"@en;
        lexinfo:hyponym :persona_sense;
        simple:metaphor :coniglio_sense_1 .

    :coniglio_sense_3 a ontolex:LexicalSense;
        skos:definition "carne dell’omonimo animale"@it,
            "meat of the animal"@en .

In this example, the lexical entry coniglio (rabbit) is linked to two word forms: one designated as the canonical form (lemma), and the other corresponding to the plural form conigli (rabbits). Both forms are annotated with the appropriate morphological features.

The lexical entry is also connected, via the ontolex:sense property, which links lexical entries to their semantic interpretations, to three lexical senses, each of which includes a definition expressed in natural language.

Furthermore, the first two senses are semantically enriched through relations that connect them to other lexical senses in the resource. For instance, coniglio_sense_1 (the animal) is modelled as a hyponym of mammifero_sense (mammal).

CompL-it contains 101,795 lexical entries (comprising a total of 791,541 word forms), classified with 36 PoS categories and described with morphological traits; from a semantic standpoint, CompL-it describes 55,713 word senses connected to each other through 137 types of semantic relations, totalling 86,577 instances.

Table 1 shows a distribution of the 10 most numerous types of semantic relation instances:

Table 1: Distribution of the 10 most numerous types of semantic relation instances

    Semantic relation       # instances
    hyponym                 43,069
    approximateSynonym       5,666
    usedFor                  3,291
    partMeronym              3,159
    partHolonym              3,159
    createdBy                2,857
    ObjectOfTheActivity      1,366
    memberMeronym            1,318
    ResultingState           1,063
    memberHolonym              979
    other                    [...]
    total                   86,577

12 http://lila-erc.eu/ontologies/lila/lemmaVariant
13 http://lila-erc.eu/ontologies/lila/hasHypolemma
14 http://lila-erc.eu/ontologies/lila/isHypolemma
15 https://github.com/UniversalDependencies/UD_Italian-ISDT
16 https://github.com/UniversalDependencies/UD_Italian-VIT
17 https://github.com/UniversalDependencies/UD_Italian-ParTUT
18 https://github.com/UniversalDependencies/UD_Italian-ParlaMint

4. Linking

Linking a lexical resource to the LiITA LB entails establishing a relationship between the lexical entries of the resource and the lemmas in the LB. Typically, this process begins with modelling the resource as a LOD resource, followed by creating the connections between the resource’s entries and the LB lemmas.

Modelling the link between CompL-it and LiITA was, however, relatively straightforward. One of the main advantages of integrating a resource that already adheres to LOD standards is that each CompL-it entry, already represented as an ontolex:Word, a subclass of ontolex:LexicalEntry, can be directly linked to LiITA via the ontolex:canonicalForm relation.

The linking process between CompL-it and LiITA necessarily begins with a mapping between the different PoS tags used in CompL-it, which are described using LexInfo, and the UPOS tagset used in LiITA. Table 2 shows the PoS mapping between the two tagsets applied to the data before matching CompL-it entries with LiITA lemmas.

Subsequently, a match between CompL-it lexical entries and lemmas in LiITA was performed on the lemma-PoS pair. Out of over 101k lexical entries in CompL-it, the matching process yielded the following results:

• 1:1 match: 83,340 lexical entries (an exact match between a CompL-it lexical entry and a LiITA lemma + PoS combination)
• 1:N match: 4,219 lexical entries (more than one potential lemma-PoS pair in LiITA corresponding to a single CompL-it lexical entry)
• 1:0 match: 14,314 lexical entries (no corresponding lemma-PoS pair found in LiITA)
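The matching step can be pictured with a minimal Python sketch. It is an illustration under stated assumptions rather than the procedure actually used: the lemma identifiers and toy entries are invented, the function names are hypothetical, and only a handful of LexInfo-to-UPOS correspondences from Table 2 are included.

```python
from collections import defaultdict

# Small subset of the LexInfo -> UPOS mapping (Table 2 of the paper).
LEXINFO_TO_UPOS = {
    "noun": "NOUN",
    "commonNoun": "NOUN",
    "verb": "VERB",
    "adjective": "ADJ",
}


def index_lemma_bank(lemmas):
    """Index LB lemmas by (written form, UPOS) -> list of lemma ids."""
    index = defaultdict(list)
    for lemma_id, form, upos in lemmas:
        index[(form, upos)].append(lemma_id)
    return index


def classify(entries, index):
    """Sort lexical entries into 1:1, 1:N and 1:0 buckets."""
    buckets = {"1:1": [], "1:N": [], "1:0": []}
    for form, lexinfo_pos in entries:
        upos = LEXINFO_TO_UPOS[lexinfo_pos]       # PoS mapping comes first
        candidates = index.get((form, upos), [])  # match on the lemma-PoS pair
        if len(candidates) == 1:
            key = "1:1"
        elif candidates:
            key = "1:N"
        else:
            key = "1:0"
        buckets[key].append((form, lexinfo_pos, candidates))
    return buckets


# Toy Lemma Bank with invented ids: two homographic nouns 'pesca'.
lemma_bank = [(10, "merenda", "NOUN"), (20, "pesca", "NOUN"),
              (21, "pesca", "NOUN"), (30, "amare", "VERB")]
entries = [("merenda", "noun"), ("pesca", "noun"), ("pantaloni", "noun")]

buckets = classify(entries, index_lemma_bank(lemma_bank))
print({kind: [e[0] for e in found] for kind, found in buckets.items()})
```

Keeping the candidate ids alongside each entry makes the later treatment of 1:N and 1:0 cases a separate, inspectable step.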

The linking is operationalised using the ontolex:canonicalForm relation, which connects a CompL-it lexical entry to a corresponding lemma in LiITA. For example: http://lexica/mylexicon#MUSmerendaNOUN ontolex:canonicalForm http://liita.it/data/id/lemma/1010136 (merenda)
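A link of this kind can be serialised with a few lines of Python. The sketch below is only indicative: the helper names are hypothetical, the output is written as N-Triples for simplicity, and the tie-break (lowest numeric id among several candidates) is one reading of the criterion adopted for ambiguous matches in this first linking pass.

```python
ONTOLEX_CANONICAL_FORM = "http://www.w3.org/ns/lemon/ontolex#canonicalForm"
LIITA_LEMMA_BASE = "http://liita.it/data/id/lemma/"


def pick_lemma(candidate_ids):
    """Return the lowest lemma id, or None when there is no candidate.

    Assumption: 'first in the LB (by id)' is read as the numerically lowest id.
    """
    return min(candidate_ids) if candidate_ids else None


def link_triple(entry_iri, lemma_id):
    """Build one N-Triples line linking a lexical entry to a LiITA lemma."""
    return (f"<{entry_iri}> <{ONTOLEX_CANONICAL_FORM}> "
            f"<{LIITA_LEMMA_BASE}{lemma_id}> .")


# 1:1 case, using the IRIs of the merenda example.
entry = "http://lexica/mylexicon#MUSmerendaNOUN"
triple = link_triple(entry, pick_lemma([1010136]))
print(triple)

# 1:N case with invented candidate ids: the lowest id is selected.
chosen = pick_lemma([205, 117])
print(chosen)
```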

Table 2: PoS mapping between the LexInfo tags used in CompL-it and the UPOS tags used in LiITA

    LexInfo (CompL-it)          UPOS (LiITA)
    adjective                   ADJ
    adposition                  ADP
    adverb                      ADV
    article                     DET
    auxiliary                   VERB
    cardinalNumeral             NUM
    commonNoun                  NOUN
    conjunction                 SCONJ-ADV
    coordinatingConjunction     CCONJ
    definiteArticle             DET
    demonstrativeDeterminer     DET
    demonstrativePronoun        PRON
    determiner                  DET
    exclamativeDeterminer       DET
    exclamativePronoun          PRON
    fusedPreposition            ADP
    indefiniteArticle           DET
    indefiniteDeterminer        DET
    indefinitePronoun           PRON
    interjection                INTJ
    interrogativeAdverb         ADV
    interrogativeDeterminer     DET
    interrogativePronoun        PRON
    noun                        NOUN
    numeral                     NUM
    numeralDeterminer           DET
    numeralPronoun              PRON
    particle                    PART
    personalPronoun             PRON
    possessiveAdjective         ADJ
    possessiveDeterminer        DET
    possessivePronoun           PRON
    pronoun                     PRON
    relativeDeterminer          DET
    relativePronoun             PRON
    subordinatingConjunction    SCONJ
    verb                        VERB

Disambiguation of 1:N matches posed a significant challenge. At the time of this initial linking effort, CompL-it was the first external resource to be linked to the LiITA LB, meaning that no additional semantic cues, such as sense distinctions or contextual usage, were yet available in the lemma database. As a result, each lemma in LiITA was limited to grammatical information such as PoS, gender, or conjugation and reflexivity (for verbs). Although, as noted in Section 1, the lemmas were extracted from the Nuovo De Mauro Dictionary, no sense-level metadata was incorporated from the dictionary.

In the absence of semantic information, we adopted a pragmatic yet arbitrary strategy for disambiguation: where multiple LiITA lemmas shared the same form and PoS, we selected the lemma that appears first in the LB (by id). While this approach lacks empirical grounding, it provided a consistent criterion for initiating the alignment process.

In cases involving a 1:0 match, the correspondence with the string may be either complete (for instance, in the case of a previously unseen word) or partial, as when inflected forms of lemmas already present in the LB are encountered. The strategy for inclusion varies according to the characteristics of the lexical resource being linked.

The CompL-it resource contains a substantial number of words in plural form. Entries such as pantaloni (“trousers”), mutande (“underpants”), braccia, and ottavi (which refers to the “round of 16” in a tournament setting) have been added to the LB. In such cases the new lemma has been linked to its singular variant in the LB with the Property lila:lemmaVariant, as described in Section 2.

A few additional noteworthy inclusion strategies from the CompL-it resource that have been adopted are outlined below:

• Truncated word forms, such as quest’, nessun’, and verun, have been added as written representations of existing lemmas.
• Adjectives and determiners occurring in feminine or plural forms have been systematically linked to their corresponding singular masculine lemmas in LiITA.
• Adverbial forms that appear to be derived from adjectives, pronouns, or determiners (e.g., quante, prese) have been included in the resource as hypolemmas of their corresponding base entries. This modelling choice ensures compatibility with texts in which such adverbial forms are lemmatised under their base categories (namely, adjectives, pronouns, or determiners), thereby promoting consistency across heterogeneous lemmatisation practices.
• Composite pronouns, such as glieli, glielo, gliene, and others, have also been included in the LB, following the same rationale outlined above. This ensures alignment with sources in which these forms are treated as distinct lemmas (as opposed to being split, e.g. glielo into gli + lo).
• Orthographic errors (e.g., perchè, with grave accent on the final e, instead of the correct perché) have been linked to the appropriate lemma, although their incorrect spellings have not been recorded as alternative written representations.

5. Querying CompL-it in LiITA

One of the key advantages of storing data in RDF is the ability to formulate federated SPARQL queries that retrieve information from datasets distributed across multiple endpoints. Examples of SPARQL queries performed on the LiITA Knowledge Base are continuously added to https://www.liita.it/?page_id=158. The integration of CompL-it into the LiITA Knowledge Base enables the exploitation of its rich semantic network and facilitates interoperability with other linked linguistic resources. For instance, it becomes possible to retrieve Italian lexical entries linked to CompL-it whose definitions begin with uccello (bird) and to display their corresponding translations in the Parmigiano Glossary, another resource linked to LiITA.22

It is interesting to explore the added value that CompL-it contributes through its dense network of semantic relations. For instance, one of the example queries provided on the LiITA website retrieves lexical entries associated with colour by filtering definitions that begin with the string colore (“colour”). While this method yields relevant results, a more semantically informed strategy involves querying for all hyponyms of the specific sense of the lemma colore defined as "qualità dei corpi per cui essi riflettono in vario modo la luce" (“property of bodies by which they reflect light in various ways”). Below is the SPARQL query text retrieving all the hyponyms of colore.

    PREFIX lime: <http://www.w3.org/ns/lemon/lime#>
    PREFIX vartrans: <http://www.w3.org/ns/lemon/vartrans#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX onto: <http://www.ontotext.com/>
    PREFIX lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#>
    PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?senseHyponym
           (GROUP_CONCAT(str(?_definition); SEPARATOR=" ; esempio: ") AS ?definition)
           ?liitaLemma ?parmigianoLemma ?wr
    WHERE {
      SERVICE <https://klab.ilc.cnr.it/graphdb-complit/> {
        ?word a ontolex:Word ;
              lexinfo:partOfSpeech [ rdfs:label ?pos ] ;
              ontolex:sense ?sense ;
              ontolex:canonicalForm [ ontolex:writtenRep ?lemma ] .
        ?sense lexinfo:hypernym ?senseHyponym .
        OPTIONAL { ?senseHyponym skos:definition ?_definition } .
        FILTER(str(?pos) = "noun") .
        FILTER(str(?lemma) = "colore") .
        ?wordHyponym ontolex:sense ?senseHyponym .
      }
      ?wordHyponym ontolex:canonicalForm ?liitaLemma .
      ?leItaLexiconPar ontolex:canonicalForm ?liitaLemma ;
                       ^lime:entry <http://liita.it/data/LexicalReources/DialettoParmigiano/Lexicon> .
      ?leItaLexiconPar vartrans:translatableAs ?leParLexiconPar .
      ?leParLexiconPar ontolex:canonicalForm ?parmigianoLemma .
      ?parmigianoLemma ontolex:writtenRep ?wr
    }
    GROUP BY ?senseHyponym ?liitaLemma ?parmigianoLemma ?wr
    ORDER BY ASC(?wr)

The query interrogates the CompL-it repository hosted in GraphDB to extract lexical entries classified as nouns, whose written representation is colore and which are associated with a sense that has at least one hyponym. Additionally, it retrieves all the available definitions of such hyponyms. Subsequently, the query accesses the local LiITA graph to extract the Italian written representation of each hyponym, identify the corresponding lexical entry, verify its inclusion in the Parmigiano lexicon, and retrieve its translation along with the written representation in dialect. The final output includes the hyponymic senses, their definitions (if available), the Italian canonical forms, their written representations, and the corresponding lemma in the Parmigiano resource. A selection of the results is shown in Table 3, including the written representations of the Italian and corresponding Parmigiano lemmas.

Table 3: Selection of results, with the written representations of the Italian and corresponding Parmigiano lemmas

    Italian       Parmigiano
    argento       argént
    azzurro       azúr
    grigio        bergnôl
    grigio        biz
    grigio        bizón
    blu           blò
    cenere        bornìza
    bronzo        brónz
    prugna        brùggna
    caramella     caraméla
    carminio      carmzén
    carota        caròtla
    crema         crèmma
    cremisi       crèmmez
    ferro         fér
    giallo        gialdètt
    giallo        gialdón
    giallo        giäld
    grigio        griz
    limone        limón
    muschio       musc’
    miele         méla
    nocciola      nisôla
    paglia        paja
    tabacco       pisighén
    piombo        piómb
    mattone       quaderlètt
    mattone       quaderlón
    mattone       quadrél
    rame          ram
    pisello       reviót
    rosso         ròss
    ruggine       rùzzna
    topo          sorghén
    topo          sorgón
    ciliegia      sréza
    sabbia        sàbia
    cenere        sèndra
    topo          sòrrogh
    tabacco       tabach
    topo          topén
    verde         verdzén
    verde         verdén
    verdone       verdón
    violetto      violètt
    ciliegia      vìssola
    giallo        zaldón
    oro           òr

22 https://liita.it/data/id/DialettoParmigiano/lemma/LemmaBank.html

This sense-centred approach results in approximately thirty additional lexical entries, as many of the corresponding definitions do not explicitly include the word colore, but are nonetheless semantically linked through hyponymy. This example highlights the potential of leveraging CompL-it’s semantic network to formulate richer and more accurate queries.
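The difference between the two retrieval strategies can be seen on a toy example. The Python sketch below is purely illustrative: apart from the definition of colore quoted above, the senses, definitions and identifiers are invented, and a dictionary stands in for the RDF graph.

```python
# Toy sense inventory: sense id -> (lemma, definition, hypernym sense id or None).
# Only the definition of 'colore' comes from the paper; the rest is invented.
SENSES = {
    "colore_1": ("colore",
                 "qualità dei corpi per cui essi riflettono in vario modo la luce",
                 None),
    "rosso_1": ("rosso", "colore del sangue", "colore_1"),
    "argento_1": ("argento", "tonalità di grigio lucente", "colore_1"),
    "topo_1": ("topo", "piccolo roditore", None),
}


def by_definition_prefix(prefix):
    """String-based strategy: keep senses whose definition starts with `prefix`."""
    return sorted(lemma for lemma, definition, _ in SENSES.values()
                  if definition.startswith(prefix))


def by_hyponymy(hypernym_id):
    """Sense-centred strategy: keep senses recorded as hyponyms of `hypernym_id`."""
    return sorted(lemma for lemma, _, hypernym in SENSES.values()
                  if hypernym == hypernym_id)


prefix_hits = by_definition_prefix("colore")
hyponym_hits = by_hyponymy("colore_1")
print(prefix_hits)
print(hyponym_hits)
```

In this toy inventory argento is missed by the string filter because its definition does not start with colore, but it is recovered through the hyponymy relation.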

6. Conclusions

The integration of CompL-it into the LiITA Knowledge Base marks a significant milestone in the development of interoperable linguistic resources for Italian. By linking over 100,000 lexical entries, many of which include rich semantic annotations, to LiITA’s LB, this initiative enhances the interoperability and expressiveness of both resources. The linking process also prompted the creation of new lemma variants, refinement of linking strategies, and the accommodation of plural forms and multiword expressions, thereby contributing to the ongoing enrichment of the LB. This work demonstrates the feasibility and advantages of integrating heterogeneous linguistic resources using Linked Open Data principles and shared ontologies. The ability to execute cross-resource SPARQL queries further exemplifies the practical benefits of semantic interoperability.

One of the next crucial steps will be the integration of Italian textual corpora into LiITA. This will allow not only for the validation of lemma-token alignment but also for exploring contextual usage patterns of lexical entries. Moreover, it will allow the semantic richness of CompL-it to be exploited through the design and testing of more complex SPARQL queries. Lastly, one of the key challenges in achieving impact within the linguistic community, or more broadly the humanities fields that engage with data, will be to evaluate and explore text-to-SPARQL systems using Large Language Models (LLMs). This can be done through Retrieval-Augmented Generation (RAG), where a set of SPARQL queries over the LiITA KB is provided, and various few-shot prompts are tested to equip the LLM with knowledge about the Classes and Properties used in the KB.

Acknowledgments

This contribution is funded by the European Union - Next Generation EU, Mission 4 Component 1 CUP J53D2301727OOO1. The PRIN 2022 PNRR project LiITA: Interlinking Linguistic Resources for Italian via Linked Data is carried out jointly by the Università Cattolica del Sacro Cuore, Milano and the Università di Torino.

Declaration on Generative AI

During the preparation of this work, the author(s) used ChatGPT (OpenAI) and Gemini (Google) in order to: Paraphrase and reword, Improve writing style, and Grammar and spelling check. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.

[1] Roventini, Marinelli, Bertagna, ItalWordNet v.2, 2016. URL: http://hdl.handle.net/20.500.11752/ILC-62, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa.

[2] R. R. Favretti, Tamburini, C. De Santis, Coris/codis: A corpus of written Italian based on a defined and a dynamic model, A rainbow of corpora: Corpus linguistics and the languages of the world (2002) 27-38.

[3] Chiarcos, POWLA: Modeling linguistic corpora in OWL/DL, in: E. Simperl, P. Cimiano, A. Polleres, O. Corcho, V. Presutti (Eds.), The Semantic Web: Research and Applications. ESWC 2012, volume 7295 of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2012, pp. 225-239. doi: 10.1007/978-3-642-30284-8_22.

[4] Passarotti, Mambrini, Franzini, F. M. Cecchini, E. Litta, G. Moretti, Ruffolo, Sprugnoli, Interlinking through lemmas. The lexical collection of the LiLa Knowledge Base of linguistic resources for Latin, Studi e Saggi Linguistici 58 (2020) 177-212.

[5] E. M. G. Litta Modignani Picozzi, M. C. Passarotti, Brasolin, G. Moretti, Mambrini, Basile, A. D. Fabio, Bosco, The Lemma Bank of the LiITA Knowledge Base of Interoperable Resources for Italian, ITA, 2024. URL: https://publicatt.unicatt.it/handle/10807/299843, accepted: 2024-12-04T14:12:09Z.

[6] Sciolette, Bellandi, Giovannetti, S. Marchi, CompL-it: a Computational Lexicon of Italian, AIDAinformazioni 42 (2024) 119-148. URL: https://doi.org/10.57574/596545646. doi: 10.57574/596545646.

[7] Cimiano, Chiarcos, J. P. McCrae, Gracia, Linguistic Linked Data: Representation, Generation and Applications, Springer, Cham, 2020. URL: https://www.springer.com/gp/book/9783030302245. doi: 10.1007/978-3-030-30225-2.

[8] J. P. McCrae, Gil, Gràcia, Buitelaar, Cimiano, The OntoLex-Lemon Model: Development and Applications, 2017. URL: https://www.semanticscholar.org/paper/The-OntoLex-Lemon-Model%3A-Development-and-McCrae-Gil/3ab2877e3cf9d8f7bad3a4fb9a03602010e00691.

[9] Petrov, D. Das, R. McDonald, A Universal Part-of-Speech Tagset, in: N. Calzolari (Conference Chair), Choukri, Declerck, M. U. Doğan, Maegaard, Mariani, Moreno, Odijk, S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), European Language Resources Association (ELRA), Istanbul, Turkey, 2012, pp. 2089-2096. URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf.

[10] Battista, Pirrelli, Una piattaforma di morfologia computazionale per l'analisi e la generazione delle parole italiane, Technical Report, 1999.

[11] Pirrelli, Battista, The paradigmatic dimension of stem allomorphy in Italian verb inflection, Italian Journal of Linguistics 12 (2000) 307-380.

[12] Sciolette, Giovannetti, S. Marchi, LexicO: an Italian Computational Lexicon derived from Parole-Simple-Clips, Umanistica Digitale 7 (2023) 169-193. URL: https://umanisticadigitale.unibo.it/article/view/15176. doi: 10.6092/issn.2532-8816/15176.

[13] AA.VV., PAROLE-SIMPLE-CLIPS, 2016. URL: http://hdl.handle.net/20.500.11752/ILC-88.

[14] Pustejovsky, The Generative Lexicon, The MIT Press, 1995. URL: https://direct.mit.edu/books/book/4726/The-Generative-Lexicon. doi: 10.7551/mitpress/3225.001.0001.