<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>M. Heredia);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Detection in Spanish</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maite Heredia</string-name>
          <email>maite.heredia@ehu.eus</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeremy Barnes</string-name>
          <email>jeremy.barnes@ehu.eus</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aitor Soroa</string-name>
          <email>asoroa@ehu.eus</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HiTZ Center - Ixa, University of the Basque Country UPV/EHU</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>We present our submission to the ADoBo 2025 Shared Task, part of the IberLEF shared evaluation campaign. The task focuses on detecting anglicisms in Spanish newswire texts. Our approach leverages the instruction-tuned language modelLlama 3.3 70B to identify spans containing anglicisms. To address certain shortcomings observed in the model's behavior, we experiment with zero- and few-shot strategies and explore the integration of additional model-based modules. However, the best performing system on the test set is a 5-shot model without auxiliary modules. We conclude with an analysis of the strengths and limitations of using large language models for anglicism detection.</p>
      </abstract>
      <kwd-group>
        <kwd>linguistic borrowing</kwd>
        <kwd>loanwords</kwd>
        <kwd>anglicisms</kwd>
        <kwd>loanword detection</kwd>
        <kwd>LLM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>CEUR
Workshop
ISSN1613-0073
42
(2.1%) 214
(10.8%)
(21430.1%)
Categories
0 Anglicisms
1 Anglicism
&gt;1 Anglicisms
1 Anglicism
2 Anglicisms
1728
(87.1%)</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The task of borrowing detection has been previously studied in several languages, e.g., German9][ and
Norwegian [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Regarding borrowing detection in Spanish, the previous edition of this shared task,
held in 2021, marked an important step toward borrowing detection in Spanish newswire texts. That
edition focused particularly on borrowing detection with an emphasis on anglicisms1[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In addition,
there is other research that deals with the study of borrowings and anglicisms in Spanish from diferent
perspectives [12].
      </p>
      <p>Beyond monolingual studies, there has also been significant work on multilingual borrowing and
anglicism detection. Recent approaches include those by Nath et al[1.3] and Miller and List[14], which
address the problem from a cross-linguistic perspective, highlighting the importance of generalized
models for lexical borrowing detection.</p>
      <p>
        Loanword detection is also relevant to sociolinguistic research. It provides valuable insights into
language contact phenomena, lexical change, and linguistic influence1[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description</title>
      <p>The proposed task focuses on the detection of unassimilated anglicisms in Spanish newswire texts. In the
annotation guidelines, linguistic borrowings are defined as “the incorporation of single lexical units from
one language (the donor language) into another language (the recipient language) usually accompanied by
morphological and phonological modification to conform with the patterns of the recipient language” . In
particular, for this edition of the shared task, the focus is onunassimilated anglicisms in Spanish, i.e.,
words from English origin that have not been assimilated ortographically nor morphologically into
Spanish.</p>
      <p>The task is framed as sequence labelling, and the output must contain the span(s) of text detected
for each instance, instead of a classic BIO approach where there must be one tag per token, although
both formats are easily compatible. There is no training set provided as part of the shared task, only a
development and a test set, which was blind for the duration of the shared task. Nonetheless, participants
are allowed and encouraged to use any previously available datasets, such as the COALAS data1s6e]t [
which was provided as part of the previous edition of the shared task.</p>
      <sec id="sec-3-1">
        <title>3.1. Development and Test Data</title>
        <p>The development and test sets that have been provided as part of the shared task contain 1984 and 1836
total instances, respectively. Each instance is composed of a sentence and up to 5 annotated anglicisms,
in a CSV format.</p>
        <p>Figure 1 shows the distribution of anglicisms per sentence in both splits. As shown in the figure,
the distribution of anglicisms difers significantly between the two sets. In the development set, the
majority of instances (87.1%) do not contain any anglicisms. Among the remaining instances, most of
them contain only 1 anglicism (10.8%), and only a small proportion (2.1%) have 2, 3 or 5 anglicisms. In
contrast, all instances of the test set contain at least one anglicism, and 13.1% instances contain two. The
development set includes a total of 304 anglicisms, comprising 269 unique forms. The test set includes
2,076 anglicisms across 373 unique ones. Only eight anglicisms are shared between the development
and test sets.</p>
        <p>Evaluation is computed using standard metrics: precision, recall and F1-score as the harmonic mean
of both. For scoring purposes, diferences in casing and the presence or absence of quotation marks are
ignored.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. System Description</title>
      <p>In this section we present the diferent models and configurations that we fine-tune or prompt for the
task of anglicism detection. When our experiments involve fine-tuning, we use the COALAS dataset
[16] train split as training data. We test all models and settings with the development set provided for
the shared task.</p>
      <sec id="sec-4-1">
        <title>4.1. Encoder-Only Models</title>
        <p>Given that the task can be formulated as a sequence labeling problem, it is appropriate to test the
capabilities of encoder-only architectures as baselines. For this purpose, we have fine-tuned 5
encoderonly models: ModernBERT [17] and BETO [18], which are monolingual models in English and Spanish
respectively; IXAmBERT [19], a multilingual model that focuses on Spanish, English and Basque; and
XLM-RoBERTa large 2[0] and mdeBERTa v3 [21], massively multilingual state-of-the-art encoders. All
models have been fine-tuned for five epochs with a default learning rate of  = 2−5 , and a batch size
of 32.</p>
        <p>These models are not part of our final submission, because they were not the focus of our
experimentation. We have not performed an exhaustive hyperparameter tuning, nor are their results on the
development set on par with those of the decoder-only models. Nonetheless, it can still be insightful to
observe the performance of smaller models that are much faster to deploy and require less computational
resources.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Decoder-Only Models</title>
        <p>
          We leverage the modelLlama 3.3 70B [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] using constrained decoding [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for prompting, implemented
using the vLLM library for LLM inference [22]. We constrain the output to follow a JSON structure,
where the model can only fill the fields “text”, “start” and “end”. For instance, given the input sentence:
Receta para preparar una carrot cake vegan friendly, the output should look like this:
{"anglicisms": [{"text": "carrot cake", "start": 26, "end": 36}, {"text": "vegan
friendly", "start": 38, "end": 51}]}
{"NER": [{"text": "Google", "start": 26, "end": 36}, {"text": "Microsoft", "start": 38,
"end": 51}]}
        </p>
        <p>As per parameter- and prompt-tuning, the following variables have been tested against each other in
diferent settings:
• Prompt: We have tried diferent levels of informativeness for the prompt, from a naive simple
approach where the model is only asked to retrieve unassimilated anglicisms to the final prompt
shown in Figure 2, which features a summary of the guidelines used to annotate the corpus and
thus allows for a more accurate and task-appropriate detection.</p>
        <p>Actúa como un experto lingüista especializado en detección de préstamos lingüísticos. Tu tarea es analizar un
fragmento de texto en español y etiquetar todos los anglicismos no asimilados, según siguientes reglas: Solo marcas
préstamos recientes del inglés que no hayan sido adaptados ortográfica ni morfológicamente al español (por ejemplo:
smartphone, influencer, look, reality show, hype). Ignora los préstamos ya adaptados como tuit, líder, fútbol, espoiler,
incluso si provienen del inglés. Excluye nombres propios, marcas, lugares, instituciones, fechas, eventos, hashtags,
acrónimos y citas literales. Incluye expresiones multi-palabra si son préstamos completos como reality show, total
look o tech bro. Si la palabra aparece en el Diccionario de la lengua española (DLE) sin comillas ni cursiva y con ese
significado, no debe etiquetarse. No etiquetes calcos, traducciones literales ni palabras derivadas de raíces inglesas
pero que siguen patrones del español como hacktivista, randomizar o shakespeariano. Etiqueta pseudoanglicismos
como footing, balconing.</p>
        <p>(a) Prompt in Spanish
(b) Prompt in English
Act as an expert linguist specializing in loanword detection. Your task is to analyze a fragment of text in Spanish
and tag all unassimilated anglicisms, according to the following rules: Only tag recent English loanwords that have
not been orthographically or morphologically adapted to Spanish (for example: smartphone, influencer, look, reality
show, hype). Ignore already adapted loanwords such as tuit, líder, fútbol, espoiler, even if they come from English.
Exclude proper nouns, brands, places, institutions, dates, events, hashtags, acronyms, and literal quotes. Include
multi-word expressions if they are complete loanwords such as reality show, total look, or tech bro. If the word
appears in the Diccionario de la Lengua Española (DLE) without quotation marks or italics and with that meaning, it
should not be tagged. Don’t tag calques, literal translations, or words derived from English roots that follow Spanish
patterns, such as hacktivist, randomize, or Shakespearean. Tag pseudo-Anglicisms like footing and balconing.</p>
        <p>• Examples: A zero-shot and 5-shot approach have been tested. The 5 examples have been manually
selected to be representative of some common errors of the model on the zero-shot setting
(detecting named entities, slogans or acronyms). Although they do not avoid these errors completely,
the 5-shot strategy obtains better results.
• Language: We have tried to prompt the model in both English and Spanish, with the latter
obtaining better results.
• Temperature: We have run the inference with temperature values of0, 0.5 and 1. The value that
yields the best results is0.5.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Detection Module</title>
          <p>The initial results show a reasonably high recall but a very low precision, indicating that the model
is generating a large number of false positives. A manual inspection of its outputs confirms this
trend: the model frequently overgenerates, attempting to identify at least one anglicism per sentence.
This behavior appears to stem from the model’s limited abstention capabilities7[], which prevents it
from refraining from making a prediction when uncertain. As a result, the model’s performance is
significantly impacted, especially given the distribution of the development set described in Section
3.1. In many cases, it even misclassifies clearly Spanish words in sentences that are entirely in Spanish.
Although we experimented with prompt-based strategies to mitigate this behaviour, they led to only
marginal improvements.</p>
          <p>To address this issue, we introduce a previous module to the inference step that performs a preliminary
binary classification to determine whether a sentence contains any anglicisms. For this task, we
finetuned several discriminative models on the COALAS training set, adapting the original labels to a binary
format (0 for no anglicisms, 1 for presence of anglicisms). Among the models evaluated, which are
the same in Section 4.1) mDeBERTa achieved the best performance, with an F1-score of 0.99 on the
development set.
ModernBERT</p>
          <p>Beto</p>
          <p>IXAmBERT
XLM-RoBERTa
mDeBERTa</p>
          <p>We integrate this binary classifier into our pipeline by first filtering sentences based on its predictions.
Only those classified as containing anglicisms are passed to the LLM for fine-grained identification.
This two-stage approach significantly improves precision (cf. Section 5.1) and also reduces inference
time on the development set.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. NER Module</title>
          <p>Similarly to the previous strategy, we experimented with a pipeline that begins by detecting Named
Entities, as we observed that the model frequently misclassifies them as anglicisms—even when explicitly
instructed not to. Initially, we used a NER model8[] to identify and exclude Named Entities from the list
of potential anglicisms. However, we ultimately opted to prompt the model to identify Named Entities
directly as part of the anglicism detection task, as this approach yielded more accurate results.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Final Results</title>
      <p>In this section, we report the results for the development and test sets provided for the shared task.
We evaluate the encoder-only models and 4 diferent settings of the decoder-only models using the
development set. Based on the results of the experiments, we submit 3 runs in total: (1) a few-shot
decoder-only model, (2) a few-shot model with a detection module and (3) a few-shot model prompted
to also detect named entities, and report the results of these 3 runs on the test set. Likely due to a
diference in distribution between both splits, the performance of the models and the model ranking
change drastically from one set to the other. For this reason, we first report the development set results,
as they have guided some decisions taken for the experiments, and the results of the final submission
on the test set.</p>
      <sec id="sec-5-1">
        <title>5.1. Development Set</title>
        <p>The results of the discriminative models on the development set can be seen in Table1. The top-3
models are the multilingual models, and both monolingual models perform notably worse, suggesting
that having knowledge of both Spanish and English is essential to be able to detect anglicisms in Spanish.
The model that performs best for all metrics is mDeBERTa, even if it is not as large in size as
XLMRoBERTa, suggesting the importance of the pre-training architecture of the models for downstream-task
performance.</p>
        <p>The results of the diferent experiments performed with the Llama 3.3 70B model on the development
set are presented in Table2. These results highlight the importance of the detection module when there
is a high proportion of sentences that do not contain any anglicisms. This module avoids over-detection,
which is reflected in the precision. Enriching the prompt with 5-shot and NER both improve the results
of the models, suggesting that prompt-tuning has a notable impact in the performance of the model.</p>
        <p>The results of the best encoder-only model are not on par of those of the best decoder-only based
pipeline, but they are more balanced than those of a 5-shot model with no additional modules, as well
as faster to deploy.
Zero Shot</p>
        <p>5-shot
5-shot + Detection
5-shot + NER</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Test Set</title>
        <p>We have submitted a total of three systems for the task, whose results on the test set can be seen in
Table3. The best performing system is the 5-shot prompted model, without any of the modules. In
both cases, adding the modules greatly decreases the recall.</p>
        <p>It is clear that there is a large diference in performance between the development and test sets, which
we hypothesize is due to the diferent distribution of both sets, which is likely why the recall is much
lower with modules aimed at improving precision in the development set.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Error Analysis</title>
        <p>The best-performing configuration on the test set—a 5-shot model without additional modules—achieves
an F1-score of 93.33, with precision and recall at comparable levels. This indicates a balanced rate of
false positives (non-anglicisms incorrectly identified as anglicisms) and false negatives (anglicisms that
go undetected).</p>
        <p>The test set includes multiple instances of the same anglicisms presented with variations in casing
and quotation marks, likely to assess whether models rely on these formatting cues for detection. A
manual analysis of the system’s errors reveals that its misclassifications are consistent across diferent
formats, suggesting that it does not rely on superficial format-based heuristics. Instead, the errors
appear to stem from a conceptual misinterpretation of what constitutes a borrowing. Common mistakes
include mislabeling named entities resembling English expressions (e.g.B, ig Little Lies or Prision Break)
and incorrectly handling composition of anglicism phrases such alsook and total black, that are treated
as multiple anglicisms due to their syntactic integration into Spanish. These are often identified as a
single span by the model but are annotated as separate spans in the gold standard, negatively impacting
both precision and recall.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion &amp; Discussion</title>
      <p>In this paper, we report our experiments and submissions for the second edition of the ADoBo Shared
Task, as part of the IberLEF 2025 evaluation campaign. The task at hand consists of unassimilated
anglicism detection in Spanish newswire texts. We have based our contributions on the exploit of
LLMs’ capabilities and implicit knowledge aided with smaller models to make its results more robust.
Although our best performing approach has consisted on 5-shot prompting, where the only tuning has
been performed on the prompt for it to be as informative and rigorous as possible, it is still likely that
the other approaches and findings that we have presented, namely, the use of smaller encoder-only
models as a pre-classification step, can be useful for other corpora with diferent distributions, as proven
with the evaluation performed on the development set. What is more, a few-shot prompted model has
the advantage of avoiding overfitting on a training set or learning artifacts for classification, such as
casing or quotation marks, as we show in the error analysis, which is sure to improve the results in
unseen distributions.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work is supported by the European Union under Horizon Europe (Project LUMINOUS, grant
number 101135724) and by the Basque Government (IXA excellence research group IT1570-22). Maite
Heredia is supported by the UPV/EHU PIF23/218 predoctoral grant.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly in order to: Grammar and spelling
check. After using these tool, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.
[11] E. Álvarez Mellado, Extracting English lexical borrowings from Spanish newswire, in: A. Ettinger,
E. Pavlick, B. Prickett (Eds.), Proceedings of the Society for Computation in Linguistics 2021,
Association for Computational Linguistics, Online, 2021, pp. 384–386. URLh:ttps://aclanthology.
org/2021.scil-1.40/.
[12] E. Alvarez-Mellado, C. Lignos, Borrowing or codeswitching? annotating for finer-grained
distinctions in language mixing, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck,
S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, S. Piperidis (Eds.), Proceedings of
the Thirteenth Language Resources and Evaluation Conference, European Language Resources
Association, Marseille, France, 2022, pp. 3195–3201. URLh:ttps://aclanthology.org/2022.lrec-1.342./
[13] A. Nath, S. Mahdipour Saravani, I. Khebour, S. Mannan, Z. Li, N. Krishnaswamy, A generalized
method for automated multilingual loanword detection, in: N. Calzolari, C.-R. Huang, H. Kim,
J. Pustejovsky, L. Wanner, K.-S. Choi, P.-M. Ryu, H.-H. Chen, L. Donatelli, H. Ji, S. Kurohashi,
P. Paggio, N. Xue, S. Kim, Y. Hahm, Z. He, T. K. Lee, E. Santus, F. Bond, S.-H. Na (Eds.), Proceedings
of the 29th International Conference on Computational Linguistics, International Committee
on Computational Linguistics, Gyeongju, Republic of Korea, 2022, pp. 4996–5013. URL:https:
//aclanthology.org/2022.coling-1.442./
[14] J. E. Miller, J.-M. List, Detecting lexical borrowings from dominant languages in multilingual
wordlists, in: A. Vlachos, I. Augenstein (Eds.), Proceedings of the 17th Conference of the European
Chapter of the Association for Computational Linguistics, Association for Computational
Linguistics, Dubrovnik, Croatia, 2023, pp. 2599–2605. URL:https://aclanthology.org/2023.eacl-main.190./
doi:10.18653/v1/2023.eacl-main.190.
[15] I. Stewart, D. Yang, J. Eisenstein, Tuiteamos o pongamos un tuit? investigating the social constraints
of loanword integration in Spanish social media, in: A. Ettinger, E. Pavlick, B. Prickett (Eds.),
Proceedings of the Society for Computation in Linguistics 2021, Association for Computational
Linguistics, Online, 2021, pp. 286–297. URL:https://aclanthology.org/2021.scil-1.26./
[16] E. Álvarez-Mellado, C. Lignos, Detecting unassimilated borrowings in Spanish: An annotated
corpus and approaches to modeling, in: S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings
of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long
Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 3868–3888. URL:
https://aclanthology.org/2022.acl-long.26.8d/oi:10.18653/v1/2022.acl-long.268.
[17] B. Warner, A. Chafin, B. Clavié, O. Weller, O. Hallström, S. Taghadouini, A. Gallagher, R. Biswas,
F. Ladhak, T. Aarsen, N. Cooper, G. Adams, J. Howard, I. Poli, Smarter, better, faster, longer: A
modern bidirectional encoder for fast, memory eficient, and long context finetuning and inference,
2024. URL: https://arxiv.org/abs/2412.13663. arXiv:2412.13663.
[18] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained bert model and
evaluation data, in: PML4DC at ICLR 2020, 2020.
[19] A. Otegi, A. Agirre, J. A. Campos, A. Soroa, E. Agirre, Conversational question answering in low
resource scenarios: A dataset and case study for basque, in: Proceedings of The 12th Language
Resources and Evaluation Conference, 2020, pp. 436–442.
[20] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott,
L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR
abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.
[21] P. He, J. Gao, W. Chen, Debertav3: Improving deberta using electra-style pre-training with
gradient-disentangled embedding sharing, 2021.arXiv:2111.09543.
[22] W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. E. Gonzalez, H. Zhang, I. Stoica, Eficient
memory management for large language model serving with pagedattention, in: Proceedings of
the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Poplack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sankof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>The social correlates and linguistic processes of lexical borrowing and assimilation</article-title>
          ,
          <source>Linguistics</source>
          <volume>26</volume>
          (
          <year>1988</year>
          )
          <fpage>47</fpage>
          -
          <lpage>104</lpage>
          . URL:https://doi.org/10.1515/ling.
          <year>1988</year>
          .
          <volume>26</volume>
          .1.47. doi:doi:10.1515/ling.
          <year>1988</year>
          .
          <volume>26</volume>
          .1.47.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Furiassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pulcini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. R.</given-names>
            <surname>González</surname>
          </string-name>
          (Eds.),
          <source>The Anglicization of European Lexis</source>
          , John Benjamins Publishing Company, Netherlands,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Álvarez-Mellado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Porta-Zamorano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lignos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , Overview of ADoBo at IberLEF 2025:
          <article-title>Automatic Detection of Anglicisms in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>75</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Á</surname>
          </string-name>
          .
          <string-name>
            <surname>González-Barba</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          <string-name>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS</article-title>
          . org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jauhri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Dahle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Letman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mathur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schelten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          , et al.,
          <source>The llama 3 herd of models, arXiv preprint arXiv:2407.21783</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Willard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <article-title>Eficient guided generation for large language models</article-title>
          ,
          <year>2023</year>
          . URL:https: //arxiv.org/abs/2307.09702. arXiv:
          <volume>2307</volume>
          .
          <fpage>09702</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Madhusudhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Madhusudhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hashemi</surname>
          </string-name>
          ,
          <article-title>Do LLMs know when to NOT answer? investigating abstention abilities of large language models</article-title>
          , in: O.
          <string-name>
            <surname>Rambow</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Wanner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Apidianaki</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Al-Khalifa</surname>
            ,
            <given-names>B. D.</given-names>
          </string-name>
          <string-name>
            <surname>Eugenio</surname>
          </string-name>
          , S. Schockaert (Eds.),
          <source>Proceedings of the 31st International Conference on Computational Linguistics</source>
          , Association for Computational Linguistics, Abu Dhabi,
          <string-name>
            <surname>UAE</surname>
          </string-name>
          ,
          <year>2025</year>
          , pp.
          <fpage>9329</fpage>
          -
          <lpage>9345</lpage>
          . URL: https://aclanthology.org/
          <year>2025</year>
          .coling-main.
          <volume>62</volume>
          <fpage>7</fpage>
          ./
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>U.</given-names>
            <surname>Zaratiana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tomeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Holat</surname>
          </string-name>
          , T. Charnois,
          <article-title>GLiNER: Generalist model for named entity recognition using bidirectional transformer</article-title>
          , in: K. Duh,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , S. Bethard (Eds.),
          <source>Proceedings of the</source>
          <year>2024</year>
          <article-title>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Association for Computational Linguistics</article-title>
          , Mexico City, Mexico,
          <year>2024</year>
          , pp.
          <fpage>5364</fpage>
          -
          <lpage>5376</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .
          <article-title>naacl-long</article-title>
          .
          <volume>30</volume>
          .0/ doi:10.18653/v1/
          <year>2024</year>
          .
          <article-title>naacl-long</article-title>
          .
          <volume>300</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Leidig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schlippe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schultz</surname>
          </string-name>
          ,
          <article-title>Automatic detection of anglicisms for the pronunciation dictionary generation: A case study on our german it corpus</article-title>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <article-title>Semi-automatic approaches to anglicism detection in norwegian corpus data</article-title>
          , in: C.
          <string-name>
            <surname>Furiassi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Pulcini</surname>
            ,
            <given-names>F. R.</given-names>
          </string-name>
          <string-name>
            <surname>González</surname>
          </string-name>
          (Eds.),
          <source>The Anglicization of European Lexis</source>
          , John Benjamins,
          <year>2012</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>