<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Meaning of Beatus: Disambiguating Latin with Contemporary AI Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eleonora Ghizzota</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierpaolo Basile</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucia Siciliani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Bari Aldo Moro</institution>
          ,
          <addr-line>via Edoardo Orabona 4, 70125, Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>The objective of this work is to assess the performance of Large Language Models (LLMs) on the task of Word Sense Disambiguation (WSD) for Latin. We evaluate state-of-the-art LLMs, including GPT-4o-mini and LLaMA variants, in both zero-shot and fine-tuned settings, using a dataset derived from the SemEval-2020 Latin Lexical Semantic Change task. Our study aims to determine whether instruction tuning and task-specific fine-tuning can significantly improve the models' ability to disambiguate Latin word senses. Results show that while LLMs demonstrate a non-trivial baseline ability in zero-shot settings, fine-tuning, particularly instruction-based fine-tuning, provides improvements in accuracy and F1 scores. These findings highlight the potential of LLMs when applied to low-resourced historical languages.</p>
      </abstract>
      <kwd-group>
<kwd>Lexical Semantics</kwd>
        <kwd>Word Sense Disambiguation</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Latin</kwd>
        <kwd>Low-resource languages</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivations</title>
<p>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24-26, 2025, Cagliari, Italy.
* Corresponding author.
† These authors contributed equally.
e.ghizzota@phd.uniba.it (E. Ghizzota); pierpaolo.basile@uniba.it (P. Basile); lucia.siciliani@uniba.it (L. Siciliani); giovanni.semeraro@uniba.it (G. Semeraro)</p>
      <p>ORCID: 0000-0002-0751-3891 (E. Ghizzota); 0000-0002-0545-1105 (P. Basile); 0000-0001-7116-9338 (L. Siciliani); 0000-0002-9421-8566 (G. Semeraro)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>[...] revolutionised the landscape by providing vast corpora and extensive knowledge graphs extracted from online sources, thereby amplifying the capabilities of both supervised and knowledge-based methods. The introduction of transformer-based architectures [22] marked a significant turning point. These models use dense vector representations to capture semantic meaning in context, resulting in further advancements in disambiguation techniques. A significant development in this domain is the rise of Large Language Models (LLMs), which are built upon the Transformer architecture and trained on extensive text corpora. LLMs exhibit proficiency in a myriad of tasks in zero-shot or few-shot contexts, ruling out the necessity of task-specific training data. This implies an inherent capacity for semantic understanding within these models. Nonetheless, LLMs can also be fine-tuned on particular tasks by utilising tailored training data, enhancing their performance in specific applications.</p>
      <p>Considering these premises, the intent of this work is to assess how state-of-the-art LLMs perform on under-represented languages like Latin through the lens of a long-standing NLP task like WSD. In particular, our investigation has two objectives. First, we want to test the models' out-of-the-box ability to disambiguate Latin senses in a zero-shot setting. In this way, we aim to first establish how well the models' inherent multilingual knowledge performs in accurate sense prediction. Next, we also perform task-specific fine-tuning, which enables us to adapt both standard and instruction versions of LLMs. The aim is to gauge the gain obtained with this additional training step.</p>
      <p>The paper is structured as follows: Section 2 provides an overview of works related to solving the WSD task with LLMs; Section 3 introduces the corpus of choice for this study, while Section 4 illustrates the methodology. Section 5 describes the experimental setting and discusses the results and the limitations of the proposed strategy, while Section 6 summarises the takeaway messages of this paper and suggests some future works.</p>
      <sec id="sec-1-0">
        <title>2. Related Work</title>
        <sec id="sec-1-0-1">
          <title>2.1. Latin Word Sense Disambiguation</title>
          <p>Currently, solving the WSD task for Latin using language models remains an unexplored strategy, with very few works investigating this line of research in recent years. The idea of using WSD for measuring the ability of language models to deal with Latin is supported by the work proposed by [17], in which Latin BERT is tested on the sense disambiguation task.</p>
          <p>Latin BERT is a contextual language model tailored for Latin, trained on a corpus of 642.7 million words drawn from diverse sources ranging from the Classical period to the 21st century (the model is available at https://www.github.com/dbamman/latin-bert). It achieves state-of-the-art performance in Latin part-of-speech tagging across all Universal Dependency datasets. To capture the full range of linguistic variation, the model was trained on multiple corpora, including the Corpus Thomisticum, the Internet Archive, the Latin Library, Patrologia Latina, Perseus, and the Latin Wikipedia. Latin BERT uses Latin-specific sentence and word tokenizers from the Classical Language Toolkit, resulting in a vocabulary of 32,895 subword units. To assess Latin BERT performance on the WSD task, the authors reformulated it into a binary classification task and created an ad hoc dataset of Latin sense examples extracted from the Lewis and Short Latin Dictionary [23]. In order to be selected, headwords must have at least two distinct senses – typographically denoted by “I.” and “II.” – supported by at least 10 sentences each, each longer than five words. For the task, only the two major senses of a headword were retained; the final dataset consists of 8,354 examples for 201 dictionary headwords. For each headword, an instance of Latin BERT was fine-tuned on 80% of the examples. The number of training instances per headword ranges from 16 (8 per sense) to 192 (96 per sense); 59% of headwords have 24 or fewer training examples. Latin BERT achieves 75.4% accuracy, compared to the 67.3% of a bidirectional LSTM with static word embeddings. These results show that, even with few training examples, Latin BERT was able to disambiguate senses.</p>
          <p>A few years later, [24] fine-tuned Latin BERT on a portion of the sense representations in the Thesaurus Linguae Latinae (TLL). The TLL is the first comprehensive dictionary of ancient Latin usage up to 600 AD, offering a comprehensive, documented overview of every Latin word’s history, including meanings and constructions, etymology, inflexion peculiarities, spelling, and prosody, as well as comments from ancient sources on the word itself. The ongoing TLL project began in 1894 and has been regularly updated since; currently, it contains lemmata from a to resurgēsco, and it is estimated to contain approximately 56,000 entries. Inspired by the WSD dataset created by Bamman and Burns for Latin BERT, the authors requested data for the same lemmata from the TLL, obtaining 25,227 quotes for 40 lemmata. The new dataset leads to a performance gain, with the Mean Macro F1 increasing from .695 to .794.</p>
          <p>Although both [17] and [24] achieved promising results, Latin is still an under-represented language for which very few annotated resources are available when compared to English. [25] proposes a language pivoting framework for Latin. Language pivoting, borrowed from Machine Translation [26], consists of propagating annotations from high-resource languages to lower-resource ones. Starting from the 40 lemmata manually annotated for <xref ref-type="bibr" rid="ref9">SemEval-2020</xref> [13], the authors extract an aligned Latin-English dataset in which these lemmata occur. To this day, the dataset of SemEval-2020 Task 1 is the only benchmark for Latin manually annotated by Latin experts. These lemmata were then mapped to Latin WordNet (http://latinwordnet.exeter.ac.uk/) and Princeton WordNet [27], allowing for annotation propagation from English to Latin. The final result is a dataset of 3,886 annotated sentences for training and experimentation.</p>
        </sec>
        <sec id="sec-1-1">
          <title>2.2. LLMs and Word Sense Disambiguation</title>
          <p>Over the years, LLMs have consistently demonstrated their ability to perform various tasks in a zero- or few-shot setting with minimal or no specific training data, suggesting an intrinsic capability of LLMs to grasp the semantics behind language [28, 29]. [30] demonstrates that BERT-like models are capable of effectively differentiating between various word senses, even when only a few examples are available for each. Their analysis further reveals that although language models can perform nearly perfectly on coarse-grained noun disambiguation in ideal settings where training data and resources are abundant, such conditions are rare in practical scenarios, presenting ongoing challenges. Along the lines of BERT-like approaches, [31] examines multiple WSD methods, including those that use language models to extract contextual embeddings as input features and as a foundation for training supervised models on sense-annotated data. [32] assesses language models’ WSD capabilities through three behavioural experiments designed to evaluate children’s ability to disambiguate word senses. The study offers a compelling comparison between how children understand semantics and how it is encoded in transformer-based models. The authors identify a bias in the models toward the most frequent sense and observe a negative correlation between the size of the training data and model performance.</p>
          <p>[33] evaluated the WSD accuracy of LLMs on eight datasets via a multiple-choice question format, and [34] extended the analysis by gauging LLM performance on single-choice questions and examining how different model sizes affect disambiguation accuracy. Similarly, [35] creates a benchmark specific to the Italian language with the aim of evaluating LLMs’ abilities in selecting the correct meaning of a word and in generating the definition of a word in a sentence. Finally, [36] analyses the WSD capabilities of only open LLMs, experimenting with different parameter configurations on several languages: English, Spanish, French, Italian and German. The authors extend the existing XL-WSD benchmark [37] to include two additional subtasks: (i) given a word occurrence within a sentence, the LLM must generate the appropriate definition; and (ii) given a word occurrence and a list of predefined meanings, the LLM must identify the correct one. Moreover, they use the training data of XL-WSD to fine-tune an open LLM based on LLaMA 3.1-8B. The results indicate that while LLMs perform well in zero-shot settings, they still fall short of surpassing current state-of-the-art methods. Larger models achieve the strongest results, whereas medium-sized models tend to underperform. Notably, however, a fine-tuned model with a medium parameter size outperforms all others, including existing state-of-the-art approaches.</p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>3. Dataset</title>
        <sec id="sec-1-2-1">
          <title>3.1. Resource</title>
          <p>The dataset of choice is the Latin annotated dataset for the Unsupervised Lexical Semantic Change Detection (LSCD) shared task of <xref ref-type="bibr" rid="ref9">SemEval-2020</xref> [13]. This dataset is a fragment of LatinISE [5], a 13-million-word diachronic, annotated Latin corpus (available at https://lindat.mf.cuni.cz/repository/xmlui/handle/11234/1-2506). The primary source of LatinISE is the Latin portion of the IntraText digital library (http://www.intratext.com). To semi-automatically annotate this corpus, NLP tools that were state-of-the-art in 2013 – PROIEL (https://www.hf.uio.no/ifikk/english/research/projects/proiel/), QuickLatin (http://www.quicklatin.com/), and TreeTagger (https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/) – were used. Hence, LatinISE provides morphological annotations like part-of-speech tags and lemma for each word.</p>
          <p>Back in 2020, for the SemEval-2020 Unsupervised Lexical Semantic Change task, two time-specific sub-corpora were extracted from LatinISE [13, 6]: the first covers the period from the 2nd century BC to 0 (1.7M tokens), the second the period from 0 to the 21st century AD (9.4M tokens).</p>
          <p>As concerns target words, they are either (i) words that changed their meaning(s) between the two periods; or (ii) stable words that did not change their meaning during that time. The choice of the set of lexemes for the annotation was based on an initial process of lexical selection and pre-annotation, carried out by a team member [6]. A list of target words comprising those whose meaning has been attested to have changed between the pre-Christian and Christian era [38, 39, 40, 23] was selected. The pre-annotation trial verified whether the corpus showed evidence of both the late antiquity senses and the previous senses, and whether the late antiquity senses appeared in the later texts only and the classical senses in the earlier texts, although they may also have occurred in later texts. Conversely, stable words were chosen since they are not known for having undergone lexical semantic change associated with the period of late antiquity. The final list comprises 40 target words, of which 23 are stable, while 17 have undergone changes in meaning in relation to Christianity.</p>
          <p>For each target word, its primary sense definitions were taken from the Latin portion of the Logeion Online Dictionary (https://logeion.uchicago.edu/), which includes Lewis and Short’s Latin-English Lexicon [23], Lewis’s Elementary Latin Dictionary [41], and Du Fresne Du Cange’s Glossarium mediae et infimae latinitatis [42]. Depending on the cases, the sense inventory was simplified, or the definitions were shortened, while maintaining the principal distinction between senses. Finally, for each target word 60 sample passages were extracted, 30 from each sub-corpus, for a total of 2,398 passages.</p>
          <p>The lack of native Latin speakers adds a further layer of complexity to the sense annotation process. Ten annotators with a high-level knowledge of Latin were recruited, ranging from undergraduate students to senior researchers. Annotators – only one per target word – scored the relatedness between a usage and a sense definition according to the Diachronic Usage Relatedness (DURel) framework [43], specially designed for lexical semantic change annotations. The DURel framework consists of a 4-point scale for quantifying the relatedness of a word usage and a sense, or score 0 if the annotator cannot decide:</p>
          <p>• 0 - Cannot decide
• 1 - Unrelated
• 2 - Distantly related
• 3 - Closely related
• 4 - Identical</p>
          <p>Table 1 shows an example of the usage annotation for the target word beatus. The senses presented to the annotators were: (a) “blessed”, (b) “rich”, (c) “fortunate”, (d) “happy” and (e) “rewarded”. Let’s focus on the sense “blessed”, which only emerged later with the advent of Christianity. Notice how it scores 1 for the first usage, dated 46 BC, while it scores 4 for the second usage, dated circa 1100 AD.</p>
          <p>The target word virtus was chosen for calculating the inter-annotator agreement between four annotators: the average pairwise agreement computed as the Spearman correlation coefficient was 0.69, comparable with inter-annotator agreement for modern languages, e.g., English 0.69, Swedish 0.57 and German 0.59 [43]. See [6] for the detailed process behind the creation and annotation of the dataset.</p>
        </sec>
        <sec id="sec-1-3">
          <title>3.2. Data preparation</title>
          <p>The DURel annotation statistics are summarised in Table 2. We take full advantage of the annotations in the dataset by creating a separate prompt for each of the judgments assigned to each of the proposed senses for a single sentence. For example, if the annotator marked virtute as “4 - Identical” for the sense “manliness, courage, virtue, strength” and “1 - Unrelated” for the sense “virtue, personified as a deity”, two separate prompts are created, each structured as shown in Listings 1 and 2.</p>
          <p>Listing 1: Prompt generated by each sense annotation for the regression task.
Instruction: Given the target word ‘‘virtute’’
and the sentence in input where the word
is enclosed by the [TARGET] tag, and the
following meaning ‘‘virtue, personified as
a deity’’, assign a score between 0 and
4. The score meaning is the following:
0: Cannot decide
1: Unrelated
2: Distantly Related
3: Closely Related
4: Identical
Answer just with the score.
Input: &lt;left context&gt; [TARGET] virtute [TARGET] &lt;right context&gt;</p>
          <p>This process yields a total of 8,989 prompts for the regression task.</p>
          <p>As for the binary classification task, the DURel 1-to-4 scale was binary encoded as follows:
• pairs of sense and sentence with scores equal to or above 3 were labelled as yes;
• pairs of sense and sentence with scores equal to or below 2 were labelled as no.</p>
          <p>Listing 2: Prompt generated by each sense annotation for the binary classification task.
Instruction: Given the target word ‘‘virtute’’
and the sentence in input where the word
is enclosed by the [TARGET] tag, and the
following meaning ‘‘virtue, personified as
a deity’’, assign a label "yes" or "no".
The label meaning is the following:
"yes": The sense for the target word occurrence is correct
"no": The sense for the target word occurrence is not correct
Answer just with the label.
Input: &lt;left context&gt; [TARGET] virtute [TARGET] &lt;right context&gt;</p>
          <p>Pairs of sense and sentence were split in a stratified manner, based on the scores assigned to each sense. This stratification process, 70% training and 30% testing, outputs a training set of 6,299 sentences and a testing set of 2,690. Due to the absence of annotations, sentences of the lemma oportet were excluded from the dataset.</p>
          <p>Pairs of sense and sentence with score 0 were not considered in this experiment; thus, with respect to the score distribution in Table 2, the training set for the binary classification task consists of 6,255 instances instead of 6,299, and the testing set has 2,675 examples instead of 2,690, yielding a total of 8,930 prompts. This binary encoded test set comprises 956 instances of class yes and 1,719 of class no, resulting in a very imbalanced dataset in which class yes represents only 35.73% of the examples.</p>
          <p>The idea behind this work is to leverage this dataset to build a benchmark for the evaluation of LLMs in disambiguating Latin words, as described in the following section.</p>
        </sec>
      </sec>
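The score-to-label encoding described in Section 3.2 can be sketched in a few lines. This is a minimal illustration, and the `(sense, sentence, score)` tuple layout is an assumption for the example, not the format of the released dataset:

```python
# Minimal sketch of the DURel score -> yes/no encoding of Section 3.2.
# Assumption: each annotation is a (sense, sentence, score) tuple; the
# released dataset may use a different record layout.
def encode_binary(annotations):
    """Scores >= 3 -> "yes"; scores 1-2 -> "no"; score 0 (cannot decide) dropped."""
    encoded = []
    for sense, sentence, score in annotations:
        if score == 0:
            continue  # "Cannot decide" pairs are excluded from the binary task
        encoded.append((sense, sentence, "yes" if score >= 3 else "no"))
    return encoded

# Illustrative annotations, echoing the beatus example of Table 1
example = [
    ("blessed", "beatus usage, ca. 1100 AD", 4),  # Identical     -> yes
    ("blessed", "beatus usage, 46 BC", 1),        # Unrelated     -> no
    ("rich", "beatus usage, 46 BC", 0),           # Cannot decide -> dropped
]
```

Applying `encode_binary(example)` keeps two of the three pairs, mirroring how the paper's training and testing sets shrink from 6,299/2,690 to 6,255/2,675 once score-0 pairs are removed.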
    </sec>
    <sec id="sec-2">
      <title>4. Methodology</title>
      <p>As stated in the introduction, one of the aims of this paper is to assess whether fine-tuning LLMs can improve their performance on an under-represented language, compared to a zero-shot setting. To do so, we exploit the prompt dataset created from LatinISE, described in Section 3. Tables 3 and 4 introduce the LLMs of choice and summarise their characteristics.</p>
      <p>• LLaMA-3 instruction-tuned. We use publicly available checkpoints of Meta’s LLaMA 3.3-70B and 3.1-8B variants with instruction tuning, accessed via the TogetherAI API (https://www.together.ai/) and the Unsloth API (https://unsloth.ai/), respectively;
• GPT-4o-mini. Accessed via the Microsoft Azure API, this model is used without any task-specific training. Prompting is designed to simulate realistic WSD instructions.</p>
      <p>For zero-shot WSD, we directly use the prompt test set, unseen during fine-tuning (see Section 4.2). After a preliminary prompt engineering step, we use the prompt in Listing 3, which is the same as the one used for fine-tuning.</p>
      <sec id="sec-2-1">
        <title>4.2. Fine-tuning</title>
        <p>Using the training split of the dataset, we fine-tune the open-weight LLaMA-3.1-8B model. Given the computational constraints associated with full fine-tuning of large models, we adopt a parameter-efficient fine-tuning (PEFT) approach based on Low-Rank Adaptation (LoRA). LoRA [44] introduces trainable, low-rank matrices into each transformer layer to adapt the model to a downstream task. Instead of updating all model parameters, LoRA freezes the pre-trained weights and injects a low-rank decomposition into the linear projections of the self-attention and/or feed-forward layers. This strategy significantly reduces the number of trainable parameters and memory usage, allowing efficient fine-tuning even on consumer-grade GPUs. We use the implementation provided by the Unsloth library, which enables us to reduce the required memory and accelerate the training process. During the training, we format the instruction data using the prompt reported in Listing 3 by relying on the chat template specific to the LLaMA models.</p>
        <p>Listing 3: Prompt used for the fine-tuning.
System: &lt;Instruction&gt;
User: &lt;Input&gt;
Assistant: &lt;Output&gt;</p>
        <p>During training, we use the following LoRA parameters: rank = 32, alpha = 64, a learning rate of 2e-4, and a batch size of 32. We train all models for five epochs on the whole training dataset. The training was performed using a single NVIDIA RTX A6000 GPU with 48GB of memory.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Evaluation</title>
      <p>As mentioned in Section 1, our study has two objectives. First, we want to test the models’ ability to disambiguate Latin senses in a zero-shot setting. In this way, we aim to first establish how well the models’ inherent multilingual knowledge performs in accurate sense prediction. Next, we perform task-specific fine-tuning, which enables us to adapt both standard and instruction versions of LLMs. The objective is to quantify the gain obtained through this additional training step.</p>
      <p>It is worth noticing that the dataset of choice was initially devised for the Unsupervised LSCD task [13], not for WSD; therefore, comparing the results of the shared task with the results of this work is not feasible. GPT-4o-mini and LLaMA-3.3-70B-instruct-turbo act as zero-shot baselines for this experiment, to assess the capabilities of models not specially devised or fine-tuned for the Latin WSD task.</p>
      <p>It is crucial to note that the dataset is highly imbalanced: many instances are annotated with 1, since each word occurrence is generally assigned a single meaning, and consequently all other meanings receive the lowest score. All the metrics are therefore computed with the dataset imbalance in mind. Balanced Accuracy is defined as the average recall obtained on each class. Weighted Precision, Recall and F1 compute the metric for each label and take the average weighted by support. Finally, Macro F1 and Micro F1 are variants of F1: the former does not take label imbalance into account, but computes the metric for each label and takes the unweighted mean; the latter computes the metric globally by counting the total true positives, false negatives and false positives. Details about the DURel annotation statistics are reported in Table 2.</p>
      <p>We release the following resources, available on GitHub: i) the source code; ii) the instruction fine-tuning and testing data; iii) links to the fine-tuned models on HuggingFace and the outputs of all evaluated models.</p>
    </sec>
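The LoRA update described in Section 4.2 can be made concrete with a dependency-free toy example. The matrices below are illustrative stand-ins (in the experiments the adapters are handled by the Unsloth/PEFT tooling); only the scaling rule `alpha / r` and the frozen-plus-low-rank structure follow the standard LoRA formulation:

```python
# Toy illustration of a LoRA forward pass: y = (W + (alpha/r) * B @ A) @ x.
# W stays frozen; only the small matrices A (r x d) and B (d x r) are trained.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_forward(W, A, B, x, alpha=64, r=32):
    scale = alpha / r                       # LoRA scaling factor alpha / rank
    BA = matmul(B, A)                       # low-rank update, same shape as W
    W_eff = [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
             for i in range(len(W))]
    return matmul(W_eff, [[v] for v in x])  # column-vector output

# Toy frozen layer and a rank-1 adapter (hypothetical values)
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]
B_zero = [[0.0], [0.0]]   # standard LoRA init: B = 0, so the adapter starts inert
```

With `B` initialised to zero, the adapted layer behaves exactly like the frozen one, which is why LoRA training can start from the pre-trained model's behaviour and only gradually learn the task-specific correction.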
    <sec id="sec-4">
      <title>5.1. Regression task</title>
      <p>Table 5 illustrates the results of the WSD task. Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) show that the fine-tuned model is better at predicting the annotation score. To give a complete overview of the results, we also provide classification metrics. Although GPT-4o-mini shows a higher precision, LLaMA-3.1-8B-instruct-ft outperforms every other model. It is interesting to note the large difference in performance between LLaMA-3.3-70B-instruct-turbo and LLaMA-3.1-8B-instruct-ft. These results prove that fine-tuning a medium-sized LLM on a single GPU can overcome a model of the same family with about nine times the number of parameters.</p>
      <p>To better understand the behaviour of each model, we report the confusion matrix of each model in Appendix B. The matrices of GPT-4o-mini (Figure 1) and LLaMA-3.3-70B (Figure 2) show that the models often confuse label 1 with other labels. It is interesting to note that GPT-4o-mini confuses label 1 with label 4 in 508 cases. This behaviour is even more evident in LLaMA-3.1-8B-instruct (Figure 3), where 913 instances labelled as 1 are confused with label 3 and 579 with label 4.</p>
      <p>The fine-tuned model LLaMA-3.1-8B-instruct-ft (Figure 4) is the best at recognising label 1. This behaviour is expected, since the model tends to overfit on the more frequent class.</p>
    </sec>
    <sec id="sec-4a">
      <title>5.2. Binary Classification task</title>
      <p>Results of the WSD task framed as a binary classification task are reported in Table 6, as well as the confusion matrix of each model in Appendix B. Our proposed fine-tuned model LLaMA-3.1-8B-instruct-ft shows a strong performance boost with respect to LLaMA-3.1-8B-instruct and LLaMA-3.3-70B-instruct-turbo. On the other hand, GPT-4o-mini performance is in line with LLaMA-3.1-8B-instruct-ft, and even surpasses it in Precision and Accuracy. In general, our LLaMA-3.1-8B-instruct-ft outperforms the baseline models. Figure 8 shows that LLaMA-3.1-8B-instruct-ft performs best on class no, while GPT-4o-mini predicts class yes better.</p>
    </sec>
    <sec id="sec-4b">
      <title>6. Conclusions and Future Works</title>
      <p>This study explores the ability of Large Language Models (LLMs) to address Word Sense Disambiguation (WSD) in Latin, a historically rich yet computationally low-resourced language. The first contribution of our work is the release of a dataset for evaluating the WSD abilities of LLMs in Latin. This dataset is created by leveraging an existing manually annotated dataset. Then, using the new dataset and through both zero-shot and fine-tuned evaluations, we observed that while general-purpose LLMs exhibit a promising baseline ability to handle Latin WSD, significant improvements are achieved through task-specific fine-tuning. The fine-tuned LLaMA-3.1-8B-instruct model outperformed larger and more resource-intensive models in accuracy and F1 scores, underscoring the impact of targeted instruction tuning, even on medium-sized architectures. Nevertheless, challenges remain. The dataset’s inherent class imbalance, with a predominance of “unrelated” sense labels, likely influenced the models’ predictions and underscores the need for more balanced and semantically diverse training data.</p>
      <p>Future work will focus on three main directions: i) expanding the annotated dataset to include more lemmata and a broader variety of senses; ii) evaluating model performance on additional semantic tasks, such as definition generation and contextual paraphrasing in Latin; iii) exploring multilingual and cross-lingual transfer learning strategies, leveraging annotations from related Romance languages to further boost Latin model capabilities.</p>
    </sec>
    <sec id="sec-4c">
      <title>Acknowledgments</title>
      <p>We acknowledge the support of the PNRR project FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI (CUP H97G22000210007) under the NRRP MUR program funded by the NextGenerationEU.</p>
    </sec>
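The metrics used throughout Section 5 (Balanced Accuracy as average per-class recall, Macro F1 as the unweighted mean of per-label F1, Micro F1 from global counts) can be reproduced with a few lines of pure Python. This is an illustrative sketch, not the evaluation code released with the paper:

```python
# Illustrative implementations of the metrics defined in Section 5.
from collections import Counter

def per_class_counts(gold, pred):
    """Per-label true positives, false positives, false negatives."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    return labels, tp, fp, fn

def balanced_accuracy(gold, pred):
    """Average recall over the classes that occur in the gold labels."""
    labels, tp, fp, fn = per_class_counts(gold, pred)
    recalls = [tp[l] / (tp[l] + fn[l]) for l in labels if tp[l] + fn[l] > 0]
    return sum(recalls) / len(recalls)

def macro_micro_f1(gold, pred):
    labels, tp, fp, fn = per_class_counts(gold, pred)
    f1s = []
    for l in labels:
        p = tp[l] / (tp[l] + fp[l]) if tp[l] + fp[l] else 0.0
        r = tp[l] / (tp[l] + fn[l]) if tp[l] + fn[l] else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    macro = sum(f1s) / len(f1s)  # unweighted mean: ignores label imbalance
    # Micro F1 counts tp/fp/fn globally across all labels
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * TP / (2 * TP + FP + FN) if TP else 0.0
    return macro, micro
```

On an imbalanced toy split such as `gold = ["no", "no", "no", "yes"]` the macro score penalises errors on the rare `yes` class much more than the micro score does, which is exactly why the paper reports both.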
    <sec id="sec-5">
      <title>A. Translation</title>
<p>Cicero’s Tusculanae Disputationes
la: [...] Dico enim constanter grauiter sapienter
fortiter. Haec etiam in eculeum coiciuntur, quo uita
non adspirat beata. - Quid igitur? solane beata
uita, quaeso, relinquitur extra ostium limenque
carceris, cum constantia grauitas fortitudo
sapientia reliquaeque uirtutes rapiantur ad tortorem
nullumque recusent nec supplicium nec dolorem?
[...]
en: For I say constantly, gravely, wisely, and strongly.</p>
      <p>These things are also cast into the rack, to which
life does not aspire for happiness. - What then?
Is a blessed life alone, I pray you, left outside the
door and threshold of the prison, when constancy,
gravity, fortitude, wisdom and the other virtues
are snatched away to the torturer and refuse
neither punishment nor pain?
Robertus Grosseteste’s De libero arbitrio
la: [...] Ex quo fit, ut de nihilo creauerit omnia.”
Eadem itaque ratione solus facit omnia, nulla
adiutus natura. Horum autem obiectorum solutio
haberi potest ut uidetur ex uerbis beati Bernardi
sic dicentis: “Ipsa gratia Liberum arbitrium
excitat, cum seminat cogitatum. Sanat, cum mutat
affectum; roborat, ut perducat ad actum; seruat,
ne sentiat defectum.” [...]
en: From which it comes about that He created all
things out of nothing.” Therefore, by the same
reasoning, He alone creates all things, without
any help from nature. But the solution to these
objections can be found, as can be seen from the
words of Blessed Bernard, who says thus: “Grace
itself awakens Free will when it sows thought. It
heals when it changes affection; it strengthens,
so that it may lead to action; it preserves, so that
it may not feel a deficiency.”</p>
    </sec>
    <sec id="sec-6">
      <title>B. Confusion Matrices</title>
      <p>Declaration on Generative AI
During the preparation of this work, the author(s) used ChatGPT (OpenAI) and Grammarly to paraphrase and reword, improve writing style, and check grammar and spelling. After using these tools/services, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>Linguistic resources and NLP tools for Latin</article-title>, in: LDK (Posters), <year>2019</year>, pp. <fpage>6</fpage>-<lpage>11</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name><given-names>M.</given-names> <surname>Straka</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Straková</surname></string-name>, <article-title>Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe</article-title>, in: <source>Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</source>, <year>2017</year>, pp. <fpage>88</fpage>-<lpage>99</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name><given-names>M.</given-names> <surname>Passarotti</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Mambrini</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Franzini</surname></string-name>, <string-name><given-names>F. M.</given-names> <surname>Cecchini</surname></string-name>, et al., <article-title>Interlinking through lemmas: the lexical collection of the LiLa knowledge base of linguistic resources for Latin</article-title>, <source>Studi e Saggi Linguistici</source> <volume>58</volume> (<year>2020</year>) <fpage>177</fpage>-<lpage>212</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><given-names>W.</given-names> <surname>Stroh</surname></string-name>, <article-title>Latein ist tot, es lebe Latein!: kleine Geschichte einer grossen Sprache</article-title>, <source>List Taschenbuch</source>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><given-names>M.</given-names> <surname>Passarotti</surname></string-name>,
          <string-name>
            <given-names>E.
            <surname>Litta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Cecchini</surname>
          </string-name>
          , M. Pellegrini, buch,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Moretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rufolo</surname>
          </string-name>
          , G. Pedonese, The lila knowl- [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Leonhardt</surname>
          </string-name>
          ,
          <article-title>Latin: Story of a world language</article-title>
          , Har-
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>edge base of interoperable linguistic resources for</article-title>
          vard University Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>latin. architecture and current state (</article-title>
          <year>2022</year>
          ). [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hengchen</surname>
          </string-name>
          , H. Du[4]
          <string-name>
            <given-names>B.</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cassotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          , D. Di Pierro, bossarsky, N. Tahmasebi, Semeval-2020 task 1: Un-
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <article-title>Using graph databases for historical lan- supervised lexical semantic change detection</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <article-title>guage data: Challenges and opportunities (</article-title>
          <year>2023</year>
          ). arXiv:
          <year>2007</year>
          .
          <volume>11464</volume>
          . [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kilgarrif</surname>
          </string-name>
          , Tools for historical [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Passarotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Cecchini</surname>
          </string-name>
          , M. Pel-
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <article-title>corpus research, and a corpus of latin, New methods legrini, Overview of the EvaLatin 2020 evaluation</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>in historical corpus linguistics 1</source>
          (
          <year>2013</year>
          )
          <fpage>247</fpage>
          -
          <lpage>257</lpage>
          . campaign, in: R. Sprugnoli, M. Passarotti (Eds.), [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kondakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burman</surname>
          </string-name>
          ,
          <source>Proceedings of LT4HALA 2020 - 1st Workshop on</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <article-title>framework for latin diachronic lexical semantics, tion (ELRA), Marseille</article-title>
          , France,
          <year>2020</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>Journal of Latin Linguistics</source>
          <volume>21</volume>
          (
          <year>2022</year>
          )
          <fpage>47</fpage>
          -
          <lpage>105</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .lt4hala-
          <fpage>1</fpage>
          .16/. [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Minozzi</surname>
          </string-name>
          , Latin wordnet, una rete di conoscenza [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Passarotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Cecchini</surname>
          </string-name>
          , M. Fan-
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>semantica per il latino e alcune ipotesi di utilizzo toli, G. Moretti, Overview of the EvaLatin 2022</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          (
          <year>2017</year>
          )
          <fpage>123</fpage>
          -
          <lpage>134</lpage>
          .
          <source>on Language Technologies for Historical and An</source>
          [8]
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , P. J.
          <string-name>
            <surname>Burns</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Stewart</surname>
          </string-name>
          , T. Cook, cient Languages,
          <source>European Language Resources</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Besnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J. B.</given-names>
            <surname>Mattingly</surname>
          </string-name>
          , The Classical Lan- Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>guage Toolkit</surname>
          </string-name>
          :
          <article-title>An NLP framework for pre-modern URL: https://aclanthology</article-title>
          .org/
          <year>2022</year>
          .lt4hala-
          <fpage>1</fpage>
          .29/.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          languages,
          <source>in: Proceedings of the 59th Annual</source>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Iurescia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Passarotti</surname>
          </string-name>
          , Overview
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>Meeting of the Association for Computational Lin- of the evalatin 2024 evaluation campaign</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>guistics and the 11th International Joint Confer- Proceedings of the Third Workshop on Language</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Demonstrations</surname>
          </string-name>
          ,
          <article-title>Association for Computational (LT4HALA)@ LREC-COLING-</article-title>
          <year>2024</year>
          ,
          <year>2024</year>
          , pp.
          <fpage>190</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Linguistics</surname>
          </string-name>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>29</lpage>
          . URL: https:// 197.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          aclanthology.org/
          <year>2021</year>
          .acl-demo.3. doi:
          <volume>10</volume>
          .18653/ [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <article-title>Latin bert: A contextual lan-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          v1/
          <year>2021</year>
          .
          <article-title>acl-demo.3. guage model for classical philology</article-title>
          , arXiv preprint [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Straka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hajic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Straková</surname>
          </string-name>
          , Udpipe: trainable arXiv:
          <year>2009</year>
          .
          <volume>10053</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <article-title>pipeline for processing conll-u files performing tok-</article-title>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <source>Conference on Language Resources</source>
          and Evaluation arXiv:
          <year>1810</year>
          .
          <volume>04805</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <source>(LREC'16)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>4290</fpage>
          -
          <lpage>4297</lpage>
          . [19]
          <string-name>
            <given-names>W.</given-names>
            <surname>Weaver</surname>
          </string-name>
          , Translation, in: Proceedings of the
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <source>conference on mechanical translation</source>
          ,
          <source>1952. Science Society</source>
          , volume
          <volume>45</volume>
          ,
          <year>2023</year>
          . [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <article-title>Word sense disambiguation: A survey</article-title>
          , [33]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kibria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dipta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Adnan</surname>
          </string-name>
          , On functional compe-
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <article-title>ACM computing surveys (CSUR) 41 (</article-title>
          <year>2009</year>
          )
          <fpage>1</fpage>
          -
          <lpage>69</lpage>
          .
          <article-title>tence of llms for linguistic disambiguation</article-title>
          , in: Pro[21]
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Gale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Church</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yarowsky</surname>
          </string-name>
          ,
          <article-title>A method ceedings of the 28th Conference on Computational</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <article-title>for disambiguating word senses in a large corpus</article-title>
          ,
          <source>Natural Language Learning</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <source>Computers and the Humanities</source>
          <volume>26</volume>
          (
          <year>1992</year>
          )
          <fpage>415</fpage>
          -
          <lpage>439</lpage>
          . [34]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Yae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Skelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Ranly</surname>
          </string-name>
          , P. M. LaCasse, [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <article-title>Leveraging large language models for word sense</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <article-title>tention is all you need</article-title>
          ,
          <source>Advances in neural infor- tions 37</source>
          (
          <year>2025</year>
          )
          <fpage>4093</fpage>
          -
          <lpage>4110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <source>mation processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ). [35]
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Musacchio</surname>
          </string-name>
          , L. Siciliani, Ita-sense[23]
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Short</surname>
          </string-name>
          ,
          <article-title>A latin dictionary. clarendon, evaluate llms' ability for italian word sense disam-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          1879.
          <article-title>biguation: A calamita challenge</article-title>
          , in: Proceedings [24]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lendvai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wick</surname>
          </string-name>
          ,
          <article-title>Finetuning latin bert for word of the 10th Italian Conference on Computational</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <article-title>sense disambiguation on the thesaurus linguae lati- Linguistics (CLiC-it</article-title>
          <year>2024</year>
          ), Pisa, Italy,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          nae, in: Proceedings of the Workshop on Cognitive [36]
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Siciliani</surname>
          </string-name>
          , E. Musacchio, G. Semeraro,
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <source>Aspects of the Lexicon</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>41</lpage>
          .
          <article-title>Exploring the word sense disambiguation capa</article-title>
          [25]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ghinassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tedeschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Marongiu</surname>
          </string-name>
          , R. Navigli,
          <article-title>bilities of large language models</article-title>
          , arXiv preprint
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <source>Language pivoting from parallel arXiv:2503.08662</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <article-title>corpora for word sense disambiguation of historical</article-title>
          [37]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pasini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raganato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <article-title>Xl-wsd: An extra-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <article-title>the 2024 Joint International Conference on Compu- word sense disambiguation</article-title>
          , in: Proceedings of
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <article-title>ation (LREC-COLING</article-title>
          <year>2024</year>
          ),
          <year>2024</year>
          , pp.
          <fpage>10073</fpage>
          -
          <lpage>10084</lpage>
          . ume 35,
          <year>2021</year>
          , pp.
          <fpage>13648</fpage>
          -
          <lpage>13656</lpage>
          . [26]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , Pivot language approach for [38]
          <string-name>
            <given-names>J.</given-names>
            <surname>Clackson</surname>
          </string-name>
          ,
          <article-title>A companion to the Latin language,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <article-title>phrase-based statistical machine translation</article-title>
          , in: John Wiley &amp; Sons,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaenen</surname>
          </string-name>
          , A. van den Bosch (Eds.), Proceedings of [39]
          <string-name>
            <given-names>J.</given-names>
            <surname>Clackson</surname>
          </string-name>
          , G. Horrocks, The Blackwell history of
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <article-title>the 45th Annual Meeting of the Association of Com- the Latin language</article-title>
          , John Wiley &amp; Sons,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <string-name>
            <surname>putational Linguistics</surname>
            , Association for Computa- [40]
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Glare</surname>
          </string-name>
          , Oxford Latin Dictionary, number
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <source>tional Linguistics</source>
          , Prague, Czech Republic,
          <year>2007</year>
          , pp.
          <source>Num. 1-4 in Oxford Latin Dictionary</source>
          , Clarendon
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          856-
          <fpage>863</fpage>
          . URL: https://aclanthology.org/P07-1108/. Press,
          <year>1982</year>
          . URL: https://books.google.it/books?id= [27]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          ,
          <source>WordNet: An electronic lexical H7HhzAEACAAJ.</source>
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          database, MIT press,
          <year>1998</year>
          . [41]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lewis</surname>
          </string-name>
          <string-name>
            <surname>Charlton</surname>
          </string-name>
          ,
          <article-title>An elementary latin dictionary</article-title>
          , [28]
          <string-name>
            <given-names>H.</given-names>
            <surname>Naveed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. U.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saqib</surname>
          </string-name>
          , S. An- New York, Cincinnati, and Chicago. American Book
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <string-name>
            <surname>war</surname>
            , M. Usman,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Akhtar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Barnes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Mian</surname>
            ,
            <given-names>A Company</given-names>
          </string-name>
          (
          <year>1890</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <article-title>comprehensive overview of large language models</article-title>
          , [42]
          <string-name>
            <given-names>C. d. F.</given-names>
            <surname>Du</surname>
          </string-name>
          <string-name>
            <surname>Cange</surname>
          </string-name>
          , Glossarium mediae et infimae
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          <string-name>
            <given-names>ACM</given-names>
            <surname>Trans. Intell</surname>
          </string-name>
          . Syst. Technol. (
          <year>2025</year>
          ).
          <source>URL: https: latinitatis: AZ</source>
          , volume
          <volume>7</volume>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Favre</surname>
          </string-name>
          ,
          <year>1886</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          //doi.org/10.1145/3744746. doi:
          <volume>10</volume>
          .1145/3744746, [43]
          <string-name>
            <given-names>D.</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          , S. Schulte im Walde, S. Eckmann,
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          <string-name><given-names>D.</given-names> <surname>Schlechtweg</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Schulte im Walde</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Eckmann</surname></string-name>,
          <article-title>Diachronic usage relatedness (DURel): A framework for the annotation of lexical semantic change</article-title>,
          in:
          <source>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)</source>,
          Association for Computational Linguistics, New Orleans, Louisiana,
          <year>2018</year>, pp.
          <fpage>169</fpage>-<lpage>174</lpage>.
          URL: https://aclanthology.org/N18-2027/. doi:10.18653/v1/N18-2027.
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [29]
          <string-name><given-names>L.</given-names> <surname>Qin</surname></string-name>,
          <string-name><given-names>Q.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Feng</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>,
          <article-title>Large language models meet NLP: A survey</article-title>,
          <source>arXiv preprint arXiv:2405.12819</source>
          (<year>2024</year>).
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [30]
          <string-name><given-names>D.</given-names> <surname>Loureiro</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Rezaee</surname></string-name>,
          <string-name><given-names>M. T.</given-names> <surname>Pilehvar</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Camacho-Collados</surname></string-name>,
          <article-title>Analysis and evaluation of language models for word sense disambiguation</article-title>,
          <source>Computational Linguistics</source>
          <volume>47</volume>
          (<year>2021</year>)
          <fpage>387</fpage>-<lpage>443</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [31]
          <string-name><given-names>M.</given-names> <surname>Bevilacqua</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Pasini</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Raganato</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Navigli</surname></string-name>,
          <article-title>Recent trends in word sense disambiguation: A survey</article-title>,
          in:
          <source>International Joint Conference on Artificial Intelligence</source>,
          Inc,
          <year>2021</year>, pp.
          <fpage>4330</fpage>-<lpage>4338</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [44]
          <string-name><given-names>E. J.</given-names> <surname>Hu</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Shen</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Wallis</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Allen-Zhu</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>,
          <article-title>LoRA: Low-rank adaptation of large language models</article-title>,
          <source>ICLR</source>
          <volume>1</volume>
          (<year>2022</year>)
          3.
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          [32]
          <string-name><given-names>F.</given-names> <surname>Cabiddu</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Nikolaus</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Fourtassi</surname></string-name>,
          <article-title>Comparing …</article-title>,
          in:
          <source>Proceedings of the Annual Meeting of the Cognitive Science Society</source>.
        </mixed-citation>
      </ref>
      <!-- Figure 4: LLaMA-3.1-8B-instruct-ft confusion matrix (regression task). -->
    </ref-list>
  </back>
</article>