<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Tough Hoe to Row: Instruction Fine-Tuning LLaMA 3.2 for Multilingual Idiom Processing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Debora Ciminari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Barrón-Cedeño</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università di Bologna</institution>
          ,
          <addr-line>Corso della Repubblica, 136, 47121, Forlì</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Idiomatic expressions (IEs) are a core part of language but exhibit considerable complexity and heterogeneity, posing significant challenges to natural language processing (NLP). Effective automatic idiom processing could enhance our understanding of language and could benefit downstream tasks such as machine translation. However, previous research fails to adopt a comprehensive approach and struggles to consider languages different from English and the rich variety of idiom types. We thus aim to develop a version of LLaMA 3.2 that is instruction fine-tuned on data in three languages - English, Italian, and Portuguese - and covering a wide range of IE types. Specifically, we build on already annotated corpora to create our instruction-formatted dataset, and we employ instruction fine-tuning on two tasks - sentence disambiguation and idiom identification. We then investigate the effectiveness of this approach and assess the impact of the instruction language on the model's performance. We release a multilingual instruction-formatted dataset for automatic idiom processing. Additionally, we show that fine-tuning might help the model disambiguate between literal and idiomatic sentences, while gains in idiom identification are limited and require further investigation. The F1-measure also suggests that the choice of the instruction language significantly affects the results.</p>
      </abstract>
      <kwd-group>
        <kwd>idiomatic expressions</kwd>
        <kwd>multilinguality</kwd>
        <kwd>sentence disambiguation</kwd>
        <kwd>idiom identification</kwd>
        <kwd>instruction fine-tuning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>Such complexity makes it challenging to deal with</title>
        <p>IEs in the field of natural language processing (NLP).</p>
        <p>
          Idiomatic expressions (IEs) are a prominent component Given the pervasive presence of IEs in language,
efecof language and constitute a broad and heterogenous tive idiom processing is needed to gain a deeper and
category. The canonical definition describes IEs as ex- more comprehensive understanding of language.
Idiompressions whose meaning cannot be derived from the aware NLP can benefit downstream tasks, such as text
meanings of their subparts [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ]. The typical example summarisation, sentiment analysis, question answering,
is to kick the bucket, whose meaning ‘to die’ can- and machine translation [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ].
not be inferred from ‘kick’, ‘the’, or ‘bucket’. However, Most NLP applications focus on English, leaving
mulsome cases do not fit this definition. For instance, the tilingual idiom processing largely unexplored. Recent
meaning of to pull the strings (‘to use influence studies adopt encoder-based models [
          <xref ref-type="bibr" rid="ref5 ref7">7, 8, 5</xref>
          ], while
studor connections’) does bear a sort of (metaphorical) rela- ies on decoder-based ones remain relatively sparse.
Antion to its components. Another category of IEs can be other issue related to previous research is the models’
identified, i.e. potentially idiomatic expressions or PIEs [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], lack of a robust generalisation and its poor performance
which are expressions that can have a literal or an id- on unseen idioms [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ].
iomatic meaning, depending on the context. That is the To fill these gaps, we develop an instruction fine-tuned
case of the first idiom presented as example, to kick version of LLaMA 3.2 1B in three languages, English,
Italthe bucket, which can also take a literal meaning, as ian, and Portuguese, and on two tasks, sentence
disamin She got frustrated and kicked the bucket biguation and idiom identification:
of paint across the garage.
        </p>
        <p>
          In light of this diversity, the traditional definition has Task 1: Sentence Disambiguation. Framed as a
bibeen challenged in favour of a more complex, multi- nary text classification task, it aims at
discrimifaceted view that emphasises the heterogeneous nature of nating idiomatic from literal sentences.
idiomaticity, conceived of as a continuum where
expressions can be placed depending on multiple factors [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>Task 2: Idiom Identification. Framed as a span
labelling task, the model must identify the sequence
of characters that correspond to an IE.
specific span constituting the IE. Figure 1 shows some 2. Related Work
examples of idiomatic and literal sentences.</p>
        <p>Given this interdependence, our data is designed to The need to develop ad hoc techniques for the automatic
address both tasks simultaneously. For instance, in the processing of idioms is widely acknowledged to acquire
ifrst example, the model’s answer is expected to be kiss a better understanding of language [12, 13, 14]. Multiple
of death, showing that the model correctly identified natural language understanding (NLU) tasks face
chalthe sentence as idiomatic and proceeded to detect the lenges related to IEs, despite the use of state-of-the-art
span where the idiom occurs. (SOTA) solutions. Among these tasks are sentiment
anal</p>
      <p>Starting from annotated corpora, we design our instruction-formatted data, comprising an instruction (the task description), the input (the sentence), and the expected output [9]. Additionally, our dataset is multilingual in that it comprises inputs in all three languages. What differs is the instruction language, for which three subsets are created. We then fine-tune LLaMA 3.2 1B on a subset of our corpus and carry out evaluation based on the F1-measure. We thus examine the effectiveness of instruction fine-tuning. Besides, we investigate the impact of the instruction language in scenarios where the instruction language and the input language are the same and scenarios where they differ. To date, such an impact remains largely unexplored.1</p>
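      <p>For illustration, a single instruction-formatted record can be pictured as follows. This is a minimal sketch: the field names (input, instruction, output) mirror the components described above, and the values are taken from the examples in Table 1; the released dataset is the authoritative format.</p>
      <preformat>
# A minimal sketch of one instruction-formatted record (hypothetical field
# names; see github.com/TinfFoil/MultIdiomLlama for the released format).
record = {
    # The input sentence, here in English.
    "input": "Although the encounter was bathed in sunshine, "
             "the match failed to reach boiling point.",
    # The task description; three language-specific subsets exist (en/it/pt).
    "instruction": "Can you spot the idiomatic expressions lurking "
                   "within this sentence? They are:",
    # The expected model answer: the idiomatic span, or a 'none' marker.
    "output": "boiling point",
}
      </preformat>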
      <p>1 The dataset and the implementation are both available at https://github.com/TinfFoil/MultIdiomLlama</p>
    </sec>
    <sec id="sec-related">
      <title>2. Related Work</title>
      <p>The need to develop ad hoc techniques for the automatic processing of idioms is widely acknowledged to acquire a better understanding of language [12, 13, 14]. Multiple natural language understanding (NLU) tasks face challenges related to IEs, despite the use of state-of-the-art (SOTA) solutions. Among these tasks are sentiment analysis [15], paraphrase generation [16], natural language inference [17], dialog models [18], and machine translation [19, 20].</p>
      <p>Recent approaches employ encoder-based models, like BERT [21], and leverage their contextual language embeddings. Studies have found that this type of model struggles with non-compositionality and has difficulty in disambiguating between literal and idiomatic meanings [<xref ref-type="bibr" rid="ref7">22, 7, 23</xref>]. Yu and Ettinger [<xref ref-type="bibr" rid="ref7">7</xref>] explore the ability of encoder-based models to handle semantic compositionality. In particular, they use five models, such as BERT and some of its variants, to examine to what extent these can represent words in isolation and in phrases. They reach the conclusion that these models grasp the meaning of individual words but struggle to capture composed meaning. Zeng and Bhat [8], instead, propose the iDentifier of Idiomatic expressions via Semantic Compatibility (DISC) to perform extraction and identification of PIEs. Their framework leverages BERT to harness both the semantic and the syntactic properties of PIEs, and extracts and identifies all the expressions from a corpus. Results show that their model is able to outperform SOTA baselines, even in zero-shot settings, but it exhibits poor cross-domain performance. In addition, while including a notable array of idiom types, it focuses on English data only.</p>
      <p>Some approaches take steps to include multilinguality. Tayyar Madabushi et al. [<xref ref-type="bibr" rid="ref5">5</xref>] release AStitchInLanguageModels, a dataset in English and Portuguese, and expand it with Galician data for the SemEval-2022 Task 2 [24]. Working on the idiomaticity detection task, they employ models like BERT and XLNet [25] and conclude that models do not benefit from the inclusion of the context and that the zero-shot setting still produces poor results. This corpus represents the first significant attempt to include multilinguality in automatic idiom processing and provides baselines for languages other than English. The dataset is, however, limited in that it only contains noun compounds, thus lacking diversity and failing to incorporate other types, such as verb and prepositional phrases. Another attempt at multilingual idiom processing is Tedeschi et al. [<xref ref-type="bibr" rid="ref6">6</xref>]’s ID10M. They develop a framework of systems and training and validation data for the idiom identification task in 10 languages. Their findings confirm the distinction between zero-shot and few-shot performance.</p>
      <p>Sentsova et al. [26] release the Multilingual Corpus of Potentially Idiomatic Expressions (MultiCoPIE) in Russian, Italian, and Catalan, which includes additional linguistic features, such as semantic compositionality, head part-of-speech, and English equivalents. By fine-tuning XLM-RoBERTa, they explore cross-lingual transfer, which might benefit lower-resourced languages. Moreover, the inclusion of idioms having an English equivalent in the training set has proved helpful in disambiguating between literal and idiomatic usages.</p>
      <p>Encoder-decoder models have also been used for the development of idiom-aware systems. Zeng and Bhat [27] opt for the BART [28] sequence-to-sequence (seq2seq) model. Their Generation of Idiom Embedding with Adapter (GIEA) model exhibits an improved ability at representing idiomaticity, but it is limited to English and does not show an enhanced generalisation capability.</p>
      <p>Other studies have examined the performance of large language models (LLMs) [29, 11], finding that they fail to handle idiomaticity and that they tend to be outperformed by other transformer-based models.</p>
      <p>Previous work thus falls short of capturing the complexity associated with IEs on multiple levels. On the one hand, studies have mostly focused on English, leaving other languages aside. On the other hand, they have failed to cover a wide enough variety of idiom types. Furthermore, studies agree on the limited ability of different models to handle and process unseen idioms.</p>
    </sec>
    <sec id="sec-data">
      <title>3. Instruction Data Creation</title>
      <p>Source Datasets. We start from three datasets to build our instruction-formatted data in English, Italian, and Portuguese: AStitchInLanguageModels [<xref ref-type="bibr" rid="ref5">5</xref>], ID10M [<xref ref-type="bibr" rid="ref6">6</xref>], and MultiCoPIE [26].</p>
      <p>AStitchInLanguageModels is a dataset of idiomatic multi-word expression (MWE) usage in English and Portuguese. It comprises examples containing PIEs in the form of noun compounds, annotated according to two different schemes. In the first one, sentences are labelled as having an idiomatic or a literal meaning. The second one is more fine-grained in that it provides a paraphrase of the MWE’s meaning and labels each example into one of five categories: literal, idiomatic, non-idiomatic, proper noun, or meta usage. We use data labelled with the first annotation scheme for the zero-shot scenario, with no overlap of PIEs between the training and the test sets.</p>
      <p>ID10M is a framework that introduces a multilingual Transformer-based architecture for sentence disambiguation and idiom identification and provides annotated datasets in multiple languages. It includes gold-standard data in English, German, Italian, and Spanish, and silver-standard data automatically annotated in 10 languages: Chinese, Dutch, English, French, German, Italian, Japanese, Polish, Portuguese, and Spanish. A list of MWEs is compiled from the Wiktionary,2 and sentences containing MWEs are collected from WikiMatrix [30],3 a multilingual corpus in 83 languages with parallel sentences retrieved from Wikipedia. The gold-standard data are curated by native professional annotators, while the silver-standard data are annotated based on the Wiktionary entry of the MWEs: when the MWE is marked as idiomatic, all occurrences of the MWE are labelled as idiomatic, and vice versa. Since these annotations do not necessarily reflect the actual MWE usage in context, Tedeschi et al. develop a dual-encoder architecture to refine the silver-standard data. They also incorporate a BIO tagging scheme [31] to identify the tokens belonging to the MWE, where B indicates the first token of a span, I signals the intermediate token(s), and O designates the tokens out of any span.</p>
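      <p>To make the scheme concrete, the sketch below shows how a sentence containing the PIE kick the bucket would be tagged under BIO; the token-tag pairs are illustrative, not drawn from the ID10M release.</p>
      <preformat>
# BIO tagging of an idiomatic span (illustrative example).
# B = first token of the span, I = inside the span, O = outside any span.
tagged = [
    ("She", "O"), ("kicked", "B"), ("the", "I"), ("bucket", "I"),
    ("last", "O"), ("year", "O"), (".", "O"),
]
      </preformat>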
      <p>MultiCoPIE is a dataset annotated for sentence disambiguation and idiom identification in Russian, Italian, and Catalan. To build this dataset, a list of PIEs is compiled for each language from online resources, such as the Dizionario italiano De Mauro,4 the Russian Wiktionary,5 and the Diccionari català-anglès/anglès-català de locucions i frases fetes.6 PIEs with varying characteristics are included, specifically, PIEs with different parts of speech as heads. For example, appeso a un filo (‘hung by a thread’) has the adjective appeso (‘hung’) as head, while con l’acqua alla gola (literally ‘with water up to the throat’, meaning ‘to be in serious difficulty’) is headed by the preposition con (‘with’). The dataset also covers PIEs with diverse degrees of semantic compositionality. PIEs with a higher level of compositionality comprise at least one cue to the meaning of the expression. An example is ammazzare il tempo (‘to kill time’), where the word tempo (‘time’) helps interpreting the expression as ‘to spend time trying not to get bored’. On the other hand, essere al settimo cielo (‘to be on cloud nine’) is more opaque since it does not comprise any hints about the meaning ‘to be at the peak of happiness’. After selecting the PIEs, sentences are automatically extracted from the Open Super-large Crawled Aggregated coRpus (OSCAR)7 [32], a multilingual corpus generated from Common Crawl,8 and refined through manual selection. The two surrounding sentences are included to provide context. Opening and closing tags are also employed to locate the lexicalised components of PIEs. The tags are used to identify all PIEs present in the target sentence and the preceding and following sentences.</p>
      <p>Table 1. Examples from the instruction dataset with the output produced given an instruction and an input in different language combinations. English glosses are given in parentheses; ellipses mark text that could not be recovered from the extraction.</p>
      <preformat>
Input (en):  Although the encounter was bathed in sunshine, the match
             failed to reach boiling point.
Instr. (en): Can you spot the idiomatic expressions lurking within this
             sentence? They are:
Output:      boiling point

Input (pt):  Nos últimos anos, muitas universidades têm mostrado …
             manobras aéreas.
             (In recent years, many universities have demonstrated …
             aerial manoeuvres.)
Instr. (it): Un’analisi della frase rivela la presenza delle seguenti
             costruzioni idiomatiche:
             (An analysis of the sentence reveals the presence of the
             following idiomatic constructions:)
Output:      Nessuna. (None.)

Input (en):  After the day I had today, I feel like I could walk on water.
Instr. (pt): A frase contém as seguintes expressões idiomáticas:
             (The sentence contains the following idiomatic expressions:)
Output:      walk on water
      </preformat>
      <p>Creation of the Instruction Templates. To create a dataset of instruction-formatted instances, we design instructions in English, Italian, and Portuguese. We first translate a seed instruction written in English into Italian and Portuguese using LLaMA 3.2 3B9 via ollama.10 With the same model, we generate three paraphrased versions of the instructions. We design the prompts to produce different writing styles and perspectives, ensuring a varied dataset and a high linguistic diversity. These instructions are then organised in empty templates. The starting point to construct such templates is the work by Taori et al. [33], who fine-tune LLaMA 7B on instruction-formatted demonstrations. They design a template in English to create the instruction-formatted examples and carry out the fine-tuning. We translate their template into Italian and Portuguese. The ‘prompt no input’ option is discarded since all our samples include an input sentence. Finally, we change the structure of the template. While the Alpaca template11 organises the instruction in “Instruction”, “Input”, and “Response”, we modify the order so that the input is presented first, followed by the instruction and the response, since this order better fits the language modeling underlying LLMs. It meshes well with the left-to-right autoregressive nature of LLaMA: as shown in Table 1, the instruction leaves an empty slot at the end, where the model’s response is expected. Finally, the ‘input’ and ‘output’ keys are left empty to be filled in the following step. A sketch of this reordered template is shown below.</p>
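      <p>The following is a minimal sketch of such a reordered template; the exact wording and markers of the released templates may differ, so the strings here are assumptions based on the description above.</p>
      <preformat>
# Hypothetical reordered Alpaca-style template: input first, then the
# instruction, then an empty response slot the model is expected to fill.
TEMPLATE = (
    "### Input:\n{input}\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_prompt(input_sentence: str, instruction: str) -> str:
    """Render one training/inference prompt from a record."""
    return TEMPLATE.format(input=input_sentence, instruction=instruction)
      </preformat>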
      <p>Creation of the Final Dataset. We then proceed with the creation of the final dataset. We extract IEs and examples from the aforementioned datasets. For English and Portuguese, we use ID10M and AStitchInLanguageModels, while, for Italian, we employ ID10M and MultiCoPIE. The processing of AStitchInLanguageModels mainly focuses on extracting the actual MWEs present in the sentences, since it includes the dictionary form. For ID10M, we process the data by reconstructing full sentences and identifying idiomatic spans. We then create a training and test split combining data from both ID10M and AStitchInLanguageModels, while ensuring that no PIEs in the test set overlap with those in the training set; a sketch of such a PIE-disjoint split follows.</p>
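      <p>A minimal sketch of a PIE-disjoint split; grouping records by the idiom’s dictionary form is an assumption about how the constraint can be enforced, not a description of the released code.</p>
      <preformat>
import random

def pie_disjoint_split(records, test_ratio=0.2, seed=13):
    """Split records so that no PIE occurs in both train and test.

    Each record is a dict with a 'pie' key holding the idiom's
    dictionary form (hypothetical schema).
    """
    pies = sorted({r["pie"] for r in records})
    random.Random(seed).shuffle(pies)
    test_pies = set(pies[: int(len(pies) * test_ratio)])
    train = [r for r in records if r["pie"] not in test_pies]
    test = [r for r in records if r["pie"] in test_pies]
    return train, test
      </preformat>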
      <p>2 https://pypi.org/project/wiktextract/
3 https://github.com/facebookresearch/LASER/tree/main/tasks/WikiMatrix
4 https://dizionario.internazionale.it/
5 https://ru.wiktionary.org/wiki/
6 https://visca.com/apac/dites/
7 https://huggingface.co/oscar-corpus
8 https://commoncrawl.org/
9 https://huggingface.co/meta-llama/Llama-3.2-3B
10 https://ollama.com/</p>
      </sec>
      <sec id="sec-1-3">
        <title>The label assignment can be represented as follows:</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Experimental Settings</title>
      <sec id="sec-2-1">
        <title>Evaluation Framework. To account for both tasks,</title>
        <p>we propose a two-fold evaluation methodology, which
allows for a comprehensive understanding of the model’s
ability to handle both the classification and the
identification challenges.</p>
        <p>We design an evaluation framework to assess the model’s performance on the sentence disambiguation and idiom identification tasks across various language combinations. For Task 1 we develop a labelling mechanism that considers multiple linguistic markers. Such markers are used for both ground truths and predictions to determine the label (0 or 1) to assign to each example. These keywords are language-specific and are:
• Portuguese: ‘nenhuma’, ‘não’, ‘ausente’;
• Italian: ‘nessuna’, ‘non’;
• English: ‘none’, ‘no idiom’, ‘not contain’, ‘not’.
The label assignment can be represented as follows, where $y$ is an answer (ground truth or prediction) and $K$ is the language-specific keyword set:
\[ \text{label}(y) = \begin{cases} 0, \text{ if } \exists\, k \in K : k \in y \\ 1, \text{ otherwise} \end{cases} \tag{1} \]</p>
        <p>For Task 2, precision and recall are computed at the character level over the predicted and gold idiom spans:
\[ P = \frac{1}{|D|} \sum_{x \in D} \; \sum_{s \in S_x,\, g \in G_x} \frac{|s \cap g|}{|s|} \tag{2} \]
\[ R = \frac{1}{|D|} \sum_{x \in D} \; \sum_{s \in S_x,\, g \in G_x} \frac{|s \cap g|}{|g|} \tag{3} \]
where $s$ is a predicted span, $g$ is a ground-truth span, $S_x$ is the set of predicted spans, $G_x$ is the set of gold-standard spans, $x$ is a sample, and $D$ represents the whole dataset. The F1-measure is then computed as the harmonic mean of precision and recall.</p>
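        <p>A minimal sketch of this two-fold evaluation, assuming plain-string model outputs and character-offset spans; function and variable names are ours, not those of the released implementation.</p>
        <preformat>
def task1_label(answer: str, keywords: list[str]) -> int:
    """Assign 0 (literal / no idiom) if a language-specific marker
    occurs in the answer, else 1 (idiomatic), as in Eq. (1)."""
    text = answer.lower()
    return 0 if any(k in text for k in keywords) else 1

def task2_f1(pred_spans, gold_spans, n_samples):
    """Character-level span P/R/F1 over a dataset, as in Eqs. (2)-(3).

    pred_spans / gold_spans: per-sample lists of (start, end) offsets.
    """
    def overlap(a, b):
        return max(0, min(a[1], b[1]) - max(a[0], b[0]))

    p = r = 0.0
    for preds, golds in zip(pred_spans, gold_spans):
        for s in preds:
            for g in golds:
                p += overlap(s, g) / (s[1] - s[0])
                r += overlap(s, g) / (g[1] - g[0])
    p, r = p / n_samples, r / n_samples
    return 2 * p * r / (p + r) if p + r else 0.0
        </preformat>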
      </sec>
      <sec id="sec-2-2">
        <title>Settings. The instruction fine-tuning is implemented</title>
        <p>on a subset of our dataset. This subset comprises 18,397
samples and retains the balance of the instruction dataset.</p>
        <p>To optimise the fine-tuning, QLoRA [ 35] is also employed
to reduce computational cost and memory usage.</p>
        <p>For the instruction fine-tuning, a set of default hyperparameters is configured to fine-tune the LLaMA 3.2 1B model on the sentence disambiguation and idiom identification tasks. The model is trained with a batch size of 32 across 2 epochs, using a cutoff length of 128 tokens for input sequences. For parameter-efficient fine-tuning, LoRA [<xref ref-type="bibr" rid="ref8">36</xref>] is employed with a rank (r) of 8, an alpha of 16, and a dropout rate of 0.05, specifically targeting the query and key projection matrices. LoRA makes it possible to update only 851,968 out of more than 1 billion parameters. The optimisation process uses 4-bit quantization in NF4 format to reduce memory requirements. The learning process is managed with a learning rate of 3e-4, a weight decay of 0.01, and a warmup ratio of 0.1, using the Paged AdamW 32-bit optimizer and a cosine learning rate schedule with restarts. Gradient accumulation is set to 2 steps with a maximum gradient norm of 1.0, and gradient checkpointing is enabled to optimise memory usage. The training uses mixed-precision computation (FP16) and employs early stopping. A configuration equivalent to these settings is sketched below.</p>
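        <p>The settings above map onto the Hugging Face transformers/peft/bitsandbytes stack roughly as follows. This is a minimal sketch under the assumption that the effective batch size of 32 is obtained as 16 per device times 2 accumulation steps; it is not the released training script.</p>
        <preformat>
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization (QLoRA) to cut memory usage.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", quantization_config=bnb
)
model = prepare_model_for_kbit_training(model)

# LoRA on the query/key projections: r=8, alpha=16, dropout=0.05.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj"], task_type="CAUSAL_LM",
))

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=2,
    per_device_train_batch_size=16,   # assumption: 16 x 2 accumulation = 32
    gradient_accumulation_steps=2,
    learning_rate=3e-4,
    weight_decay=0.01,
    warmup_ratio=0.1,
    max_grad_norm=1.0,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine_with_restarts",
    gradient_checkpointing=True,
    fp16=True,
)
        </preformat>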
      </sec>
    </sec>
    <sec id="sec-results">
      <title>5. Results and Discussion</title>
      <p>Sentence Disambiguation Task. Table 3 shows the F1 scores for Task 1, averaged over 3 runs, for all combinations of instruction and input language, before and after the fine-tuning. When comparing our model against the baseline model without fine-tuning, we can see that the best results are achieved after the instruction fine-tuning: the performance gains more than 2 points across all combinations, with the Portuguese monolingual pair increasing by almost 3 points. These findings suggest that the approach we adopted consistently enhances the model’s performance, regardless of the instruction-input language combination. Turning to the impact of the instruction language, the baseline results indicate that English inputs tend to prefer English instructions. On the other hand, there seems to exist some sort of interplay between Italian and Portuguese: a slight improvement is produced when Italian data are associated with Portuguese instructions and vice versa. Conversely, our results show that the model yields better F1 scores when prompted with instructions in English across all language combinations. This suggests that the fine-tuning leads the model to prefer instructions written in English when disambiguating between literal and idiomatic sentences.</p>
      <p>Idiom Identification Task. Table 4 shows the F1 scores for Task 2, averaged over 3 runs, before and after instruction fine-tuning. We can see that, in general, the model exhibits poor performance and struggles to identify the idiom contained in the input sentence. In the idiom identification task, the improvements produced by the instruction fine-tuning are mostly lower or non-existent. The English inputs tend to benefit more from this approach, gaining 2 points with almost all languages. Conversely, the model seems to struggle on Italian data, and, when associated with Italian and English instructions, it suffers from the fine-tuning, losing 1 point. When dealing with Portuguese sentences, instead, the model produces slightly improved results. Instruction fine-tuning, therefore, does not significantly and consistently help the model in identifying idioms. However, we should consider that Task 2 is much more challenging in that it consists in the identification of the idiom contained in a given sentence, at the character level. As for the instruction language, unlike in Task 1, the instruction fine-tuning does not lead the model to favour English. Instead, Portuguese instructions seem to better help the model in detecting the idiom.</p>
      <p>Interactions between Instruction and Input Language. The results reported above provide insights into the interactions between instruction and input language. For Task 1, English instructions seem to aid the model in distinguishing between idiomatic and literal sentences. Sentence disambiguation represents a simpler task that requires a global understanding of the input sentence. English, on which the model is mostly pre-trained [<xref ref-type="bibr" rid="ref9">37</xref>], might better allow LLaMA 3.2 to comprehend the task to carry out. Idiom identification, instead, is a much more complicated task requiring the model to have a deeper and more precise comprehension, not only at the sentence level, but also at the phrase level. This entails a finer knowledge of the input language as well.</p>
      <p>Besides, when the instruction and the input language differ, the model is prompted in one language and asked to answer in another, which creates an additional layer of complexity. Different types of interactions between instruction and input language thus emerge, and future research is needed to investigate such interactions based on the languages involved and the task under study.</p>
    </sec>
    <sec id="sec-3">
      <title>6. Conclusions</title>
      <p>In this paper, we developed a version of LLaMA 3.2 1B fine-tuned on two tasks: sentence disambiguation and idiom identification. We adopted a multilingual approach in that we considered three languages, English, Italian, and Portuguese, and we employed instruction fine-tuning. To carry out the fine-tuning, we first constructed a multilingual dataset consisting of instruction-formatted data designed for idiomatic expressions (IEs). We examined the two tasks in a multilingual setting involving the abovementioned languages, which were used as both instruction and input languages, covering all possible combinations. This fine-tuning provided some valuable insights.</p>
      <p>For the sentence disambiguation task, our instruction-based approach yielded better F1 scores, compared to the baseline results, which suggests that it aids the model in distinguishing between idiomatic and literal meanings. Nevertheless, after the fine-tuning, the models seemed to favour English instructions across all input languages. This might indicate that we can achieve satisfactory results prompting models with English instructions [10], and that we can limit instruction engineering to only one language [38]. On the other hand, this can be disadvantageous for other languages, potentially reducing model performance and usability in multilingual contexts.</p>
      <p>For the idiom identification task, the model struggled to correctly identify the idiom included in the sentence, both before and after the fine-tuning. Our instruction-based approach did not necessarily lead to a significantly improved performance, and, in some cases, it produced lower F1 scores. Unlike Task 1, Task 2 represents a far more challenging task consisting in detecting IEs at the character level, which might explain such a poor performance. Besides, the model did not exhibit a consistent preference for one language and produced mixed results.</p>
      <p>Instruction fine-tuning might be beneficial for Task 1
but not necessarily for Task 2, and the instruction
language plays a crucial role in the model’s performance.</p>
      <p>However, further research is needed. From a methodological perspective, we used a relatively small model, and experiments with larger ones can be conducted. Other LLMs beyond LLaMA could be fine-tuned as well, not only to assess their performance but also to compare encoder-based and encoder-decoder models on the same IE-related tasks. We did not implement hyperparameter tuning and limited the fine-tuning to a small subset. Future research could explore optimised hyperparameters to improve performance, as well as use a larger dataset. Our study was also limited to three languages, and the scope could be expanded to others, even from different families, to gain a deeper understanding of cross-linguistic interactions. Finally, a promising direction would be the creation of datasets annotating idiomaticity on a continuum rather than as a binary distinction, aligning with more recent linguistic theories.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fraser</surname>
          </string-name>
          ,
          <article-title>Idioms within a Transformational Grammar, Foundations of Language 6 (</article-title>
          <year>1970</year>
          )
          <fpage>22</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Chomsky</surname>
          </string-name>
          , Rules and Representations,
          <source>Behavioral and Brain Sciences</source>
          <volume>3</volume>
          (
          <year>1980</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . doi:
          <volume>10</volume>
          . 1017/s0140525x00001515.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Haagsma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Nissim, MAGPIE: A large corpus of potentially idiomatic expressions</article-title>
          , in: N.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Béchet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Blache</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Choukri</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Cieri</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Declerck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Goggi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Isahara</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Mazo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Odijk</surname>
          </string-name>
          , S. Piperidis (Eds.),
          <source>Proceedings of the Twelfth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>279</fpage>
          -
          <lpage>287</lpage>
          . URL: https:// aclanthology.org/
          <year>2020</year>
          .lrec-
          <volume>1</volume>
          .35/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wulf</surname>
          </string-name>
          , Rethinking Idiomaticity:
          <article-title>A Usage-based Approach</article-title>
          , Research in Corpus and Discourse, Continuum, London and New York,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tayyar Madabushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gow-Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scarton</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Villavicencio, AStitchInLanguageModels: Dataset and
          <article-title>Methods for the Exploration of Idiomaticity in Pre-Trained Language Models</article-title>
          , in: M.
          <article-title>-</article-title>
          <string-name>
            <surname>F. Moens</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Specia</surname>
          </string-name>
          , S. W.-t. Yih (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2021</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>3464</fpage>
          -
          <lpage>3477</lpage>
          . URL: https://aclanthology. org/
          <year>2021</year>
          .findings-emnlp.
          <volume>294</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2021</year>
          .findings-emnlp.
          <volume>294</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tedeschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Martelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <source>ID10M: Idiom Identification in 10 Languages</source>
          , in: M.
          <string-name>
            <surname>Carpuat</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-C. de Marnefe</surname>
            ,
            <given-names>I. V.</given-names>
          </string-name>
          <string-name>
            <surname>Meza Ruiz</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: NAACL</source>
          <year>2022</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>2715</fpage>
          -
          <lpage>2726</lpage>
          . URL: https://aclanthology.org/
          <year>2022</year>
          .findings-naacl.
          <volume>208</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2022</year>
          . findings-naacl.
          <volume>208</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ettinger</surname>
          </string-name>
          , Assessing Phrasal Representation and Composition in Transformers, in: LLMs,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.14314. A.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Goyal</surname>
          </string-name>
          , et al.,
          <source>The Llama 3 herd of models, arXiv:2305.14314</source>
          .
          <string-name>
            <surname>arXiv</surname>
          </string-name>
          (Cornell University) (
          <year>2024</year>
          ). doi:
          <volume>10</volume>
          .48550/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <year>arxiv</year>
          .
          <volume>2407</volume>
          .21783.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          , W. Chen, LoRA: Low-Rank [38]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          , Adaptation of Large Language Models, CoRR
          <string-name>
            <given-names>Y.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          , abs/2106.09685 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/ Y. Chen,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <volume>2106</volume>
          .09685. arXiv:
          <volume>2106</volume>
          .09685.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          , P. Liu,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Wen</surname>
          </string-name>
          , A survey of large
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jauhri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            . Al- language
            <surname>models</surname>
          </string-name>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/ Dahle,
          <string-name>
            <given-names>A.</given-names>
            <surname>Letman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mathur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schelten</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Yang,
          <volume>2303</volume>
          .18223. arXiv:
          <volume>2303</volume>
          .
          <fpage>18223</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>