<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Low-Resource Spanish Clinical Encoders: Architectures, Adaptation, and Evaluation under Computational Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guillem G. Subies</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The rise of large-scale language models (LLMs) has revolutionized Natural Language Processing (NLP), but their high computational demands have created significant barriers to entry for researchers operating under limited hardware budgets. This disparity is especially pronounced in specialized domains such as clinical NLP, where language, data, and resource limitations intersect. In this work, we propose a resource-efficient methodology for designing, training, and evaluating compact Spanish clinical encoder models that leverage recent architectural advances, parameter-efficient fine-tuning strategies, and domain-specific adaptation. We present a multi-stage approach that prioritizes reproducibility, open-source compatibility, and computational efficiency. Our models are developed using ClinText-SP, the largest available corpus of Spanish clinical texts, and evaluated against both general-purpose LLMs and specialized encoders across key clinical NLP tasks. The aim is to show that with careful design and judicious use of compute, low-resource encoder models can match or exceed the performance of larger systems, thereby enabling equitable access to domain-specific NLP technologies in under-resourced settings. This thesis contributes both practical tools and critical insights to the evolving field of low-resource clinical NLP in Spanish.</p>
      </abstract>
      <kwd-group>
        <kwd>Clinical NLP</kwd>
        <kwd>Spanish Language Models</kwd>
        <kwd>Low-Resource Learning</kwd>
        <kwd>Encoder Architectures</kwd>
        <kwd>Domain Adaptation</kwd>
        <kwd>Parameter-Efficient Fine-Tuning</kwd>
        <kwd>Open Science</kwd>
        <kwd>Benchmarking</kwd>
        <kwd>Transformer Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid advancement of Natural Language Processing (NLP) over the past decade has been driven
by two synergistic developments. First, the introduction of the Transformer architecture by Vaswani
et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and its encoder-only and decoder-only variants, such as BERT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and GPT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], revolutionized sequence
modeling by enabling scalable attention mechanisms. Second, the exponential growth in
computational power—largely enabled by NVIDIA’s CUDA platform [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]—provided the hardware foundation for
training ever-larger models. Together, these innovations precipitated a paradigm shift in AI, yielding
breakthroughs not only in NLP but also in related domains such as speech recognition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and computer
vision [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
      </p>
      <p>
        Despite these achievements, the growing reliance on large, resource-intensive generative models has
marginalized the development and study of more compact encoder-only architectures. Training and
fine-tuning giant models require high-end GPUs and extensive budgets, putting them out of reach for
many research institutions [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This resource gap is particularly problematic for specialized domains,
where data is scarce and domain-specific performance is critical. In this work, we propose to bridge this
gap by adapting recent architectural and methodological advances from large language models (LLMs)
to the design of efficient, task-specific encoder models. Specifically, the thesis will focus on Spanish
clinical encoder models.
      </p>
      <p>
        Clinical NLP plays a vital role in improving patient care and facilitating medical research by extracting
structured information from unstructured clinical texts. Prior studies have demonstrated that
domain-adapted models, such as ClinicalBERT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], significantly outperform general-purpose counterparts on
clinical information extraction and decision support tasks. Moreover, addressing underrepresented
languages like Spanish is crucial to ensuring equitable access to NLP technologies, as highlighted by the
scarcity of high-quality Spanish clinical corpora and models [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. By focusing on Spanish clinical
encoder models, our research aims to reduce language and resource barriers in clinical NLP applications.
      </p>
      <p>In this thesis, we propose a methodology for developing low-resource Spanish clinical
encoders by leveraging transfer learning from multilingual and general-domain LLMs, employing
parameter-efficient fine-tuning techniques, and designing targeted pretraining objectives. We will
evaluate our models on a suite of clinical NLP benchmarks. Furthermore, we identify key challenges
and open issues for discussion at the Doctoral Symposium, such as the trade-offs between model size,
efficiency, and domain specificity.</p>
      <p>The remainder of this paper is organized as follows: Section 2 reviews related work in domain-specific
and low-resource NLP; Section 3 outlines our thesis proposal and its main hypothesis; Section 4 presents
the experimental methodology that will be followed; and Section 5 highlights the open issues and concludes
the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and related work</title>
      <p>
        Our preliminary survey [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] highlighted a marked performance gap between general-domain Spanish
and multilingual encoder models and their counterparts trained on clinical Spanish texts. State-of-the-art general
Spanish models (RigoBERTa [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], BETO [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and MarIA [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]) and multilingual encoders like DeBERTa
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and XLM-RoBERTa [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] consistently outperform Spanish clinical models such as Galén [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and bsc-bio-ehr [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. We attribute this disparity to the scarcity of in-domain data and the challenges of
training from scratch under low-resource conditions. A similar, and equally counterintuitive, effect can
also be observed in decoder LLMs [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>To address these issues, we gathered ClinText-SP (https://huggingface.co/datasets/IIC/ClinText-SP) and
adopted RigoBERTa as our base to perform domain adaptation, resulting in RigoBERTa Clinical
(https://huggingface.co/IIC/RigoBERTa-Clinical) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. ClinText-SP is the largest publicly available
Spanish clinical corpus, aggregating texts from medical journals, shared-task annotations, and
supplementary sources (e.g., Wikipedia, medical textbooks). It comprises 35,996 documents (average length:
~700 tokens) and totals 25.62 million tokens. The corpus balances long, structured clinical case reports
with shorter, schematic ones, making it suitable for a range of clinical NLP tasks.</p>
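      <p>As an illustration of how the corpus can be accessed, the following is a minimal sketch that loads
ClinText-SP from the Hugging Face Hub with the datasets library; the split and column names are
assumptions, and the dataset card should be consulted for the actual schema.</p>
      <preformat>
from datasets import load_dataset

# Load ClinText-SP from the Hugging Face Hub (dataset id from the link above).
# The split name and document fields are assumptions; check the dataset card.
corpus = load_dataset("IIC/ClinText-SP", split="train")

print(corpus)     # number of documents and available columns
print(corpus[0])  # one clinical document as a Python dict
      </preformat>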
      <p>Beyond corpus curation and domain-adaptive pretraining, recent architectural and methodological
innovations offer further avenues to enhance clinical encoder models. FlashAttention [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] enables exact, memory-efficient computation of scaled dot-product attention, making it particularly suitable for
long-context processing. ModernBERT [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] introduces optimizations that improve training stability
and performance over traditional BERT-style encoders. GLiNER [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] presents a generalist named entity
recognizer that performs well across domains without domain-specific tuning. Parameter-efficient
fine-tuning approaches [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], such as adapter layers, prefix tuning, and in particular Low-Rank Adaptation
(LoRA) [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], have demonstrated significant performance gains while drastically reducing the number
of trainable parameters. In addition, model compression techniques like knowledge distillation and
quantization [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] enable the deployment of lightweight models that retain competitive performance,
which is especially important in clinical environments with constrained computational resources.
Post-training quantization to 8-bit representations maintains accuracy with minimal overhead [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], whereas quantization-aware training at ternary or binary precision, as in BitNet [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], achieves competitive results by training low-precision weights from scratch. Finally, Gemma Encoder [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] shows how to systematically convert a decoder-only model into an encoder through architectural tweaks and
re-training regimes, which could yield extremely powerful encoder models without the hurdle of
pre-training them.</p>
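      <p>To make the parameter-efficiency argument concrete, the following is a hedged sketch of LoRA
fine-tuning applied to an encoder checkpoint with the Hugging Face peft library; the model id, label
count, and target module names are illustrative assumptions rather than the configuration used in this
thesis (the attention projection names in particular depend on the backbone architecture).</p>
      <preformat>
from transformers import AutoModelForTokenClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative assumptions: checkpoint id, 5 NER labels, BERT-style projection names.
model_id = "IIC/RigoBERTa-Clinical"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=5)

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,       # token-level task such as clinical NER
    r=8,                                # low-rank dimension of the LoRA updates
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # adjust to the backbone's projection names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()      # typically a small fraction of all parameters
      </preformat>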
      <p>These developments will guide future extensions of our Spanish clinical encoder suite, balancing
efficiency, domain specificity, and real-world applicability.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposal and Main Hypotheses</title>
      <p>Building on the observations outlined in Section 2, the central hypothesis of this thesis is:
H1 Compact, task-specific Spanish clinical encoder models—carefully engineered with recent architectural
and tuning advances—can match or exceed the performance of large, general-purpose LLMs on
supervised clinical NLP tasks, while greatly reducing computational cost and environmental impact.</p>
      <sec id="sec-3-1">
        <title>This overarching hypothesis decomposes into four supporting hypotheses:</title>
        <p>H2 Efficiency Hypothesis. For supervised, domain-specific tasks, lightweight encoder-only models
with parameter-efficient fine-tuning deliver comparable performance to full fine-tuning of large
LLMs, at a fraction of the compute and energy requirements.</p>
        <p>H3 Domain Adaptation Hypothesis. Integrating decoder architectural innovations into
encoder-only pretraining yields significantly improved representations for clinical Spanish text,
narrowing the gap with models trained on massive corpora.</p>
        <p>H4 Generative versus Discriminative Hypothesis. While generative LLMs excel at open-ended
text generation, specialized encoder models can outperform them on focused extraction and
classification tasks in the clinical domain.</p>
        <p>H5 Language Equity Hypothesis. Targeted domain adaptation and low-resource strategies can
mitigate the disparity between Spanish clinical models and their English counterparts, providing
high-quality tools for Spanish-speaking clinical NLP communities.</p>
      </sec>
      <sec id="sec-3-2">
        <title>To validate these hypotheses, the thesis will pursue the following objectives:</title>
        <p>O1 Model Development. Design and implement a family of Spanish clinical encoder models that
incorporate (i) domain-adaptive pretraining on ClinText-SP, (ii) memory-efficient attention and
encoder optimizations, and (iii) parameter-efficient fine-tuning or compression techniques.</p>
        <p>O2 Rigorous Evaluation. Benchmark the proposed models against state-of-the-art Spanish and
multilingual clinical encoders, as well as general-purpose LLMs, across standard clinical NLP
tasks (e.g., named entity recognition, relation extraction, document classification).</p>
        <p>O3 Practical Validation. Demonstrate real-world utility by integrating the best-performing model
into at least two clinical use cases to quantify improvements in accuracy, latency, and resource
consumption compared to LLM-based solutions.</p>
        <p>O4 Reproducibility and Open Science. Release all code, trained checkpoints, and evaluation scripts
under an open-source license to foster transparency, community adoption, and further research
in low-resource clinical NLP.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology and Proposed Experiments</title>
      <p>Developing high-performance Spanish clinical encoder models under strict hardware constraints
demands a rigorous, systematic methodology. In our case, “GPU poverty” is not merely a figure
of speech but a lived reality: training large models on limited memory and compute forces careful
budgeting of every GPU-hour and careful selection of experiments that yield maximal insight for
minimal cost.</p>
      <p>Concretely, the computational resources available for this thesis are limited to a small shared pool
consisting of one NVIDIA A100 80GB GPU and two NVIDIA RTX 3090 GPUs. The total number of
GPU-hours remains uncertain and will vary over time, but individual experiments must be designed
to complete within a few hours each. This restricts the size and complexity of the models we can
realistically train and evaluate. While 8B-parameter models are at the upper limit of what we can
handle, our goal is to obtain strong performance from models with fewer than 1B parameters—an
achievable target given that many high-quality encoder models tend to be smaller than their generative
counterparts.</p>
      <p>Balancing ambition with feasibility, our methodology emphasizes open, efficient, and well-validated
techniques at each step, focusing on the maximum return per unit of computation.</p>
      <sec id="sec-4-1">
        <title>4.1. Challenge of Limited Computing Resources</title>
        <p>Life on a GPU-poor budget entails long queued jobs, frequent checkpoint pruning, and constant
trade-offs between batch size, sequence length, and model complexity. With our computational budget, we
cannot indiscriminately pretrain or fine-tune dozens of large variants. Instead, we must:
• Prioritize efficiency: Favor models and techniques explicitly designed for memory-efficient
attention or parameter-efficient tuning.
• Exploit transfer learning: Leverage strong multilingual or general-domain checkpoints (e.g.,
RigoBERTa) as a starting point, reducing the need for full pretraining.
• Optimize hyperparameters conservatively: Use small-scale pilot runs to identify promising
configurations before scaling up (see the sketch below).</p>
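        <p>A minimal sketch of such a pilot run follows; the fine_tune and evaluate helpers are placeholders
for our actual training and evaluation code, and the learning-rate grid is only an example.</p>
        <preformat>
# Hedged sketch: a tiny learning-rate sweep on a data subsample before spending full GPU-hours.
def pilot_sweep(train_subset, dev_set, learning_rates=(1e-5, 3e-5, 5e-5)):
    results = {}
    for lr in learning_rates:
        model = fine_tune(train_subset, learning_rate=lr, epochs=1)  # placeholder trainer
        results[lr] = evaluate(model, dev_set)                       # placeholder dev metric
    # Return the most promising configuration to scale up.
    return max(results, key=results.get)
        </preformat>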
        <p>Acknowledging this constraint not only shapes our experimental choices but also reflects the
real-world conditions of many academic and clinical research labs.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Candidate Feature Selection Pipeline</title>
        <p>To distinguish genuine advancements from “shiny object syndrome,” we will implement a multi-stage
filtering process:
1. Literature and code review: Survey recent encoder and encoder–decoder innovations (see
Section 2), cataloging techniques that claim state-of-the-art gains.
2. Open-source viability check: Verify that each candidate is available under a suitable open
license and has an actively maintained implementation (e.g., GitHub repo with recent commits
and community adoption).
3. Hardware footprint assessment: Estimate memory and compute requirements, discarding any
feature whose resource demand exceeds our hardware budget (see the estimate sketched below).
4. Empirical sanity check: For borderline cases, inspect small-scale reproducibility reports or
replicate a quick experiment on a toy dataset to confirm baseline efficacy.</p>
        <p>This pipeline ensures we only invest scarce GPU cycles in approaches that are both credible and
implementable.</p>
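        <p>A hedged back-of-the-envelope estimate of this kind is shown below; the byte counts and the
activation overhead factor are coarse assumptions used only to rule out obviously infeasible candidates,
not to predict exact memory usage.</p>
        <preformat>
# Hedged estimate of training memory for full fine-tuning:
# weights + gradients + optimizer states, inflated by a rough activation factor.
def training_memory_gb(n_params, bytes_per_param=2, grad_copies=1, optimizer_states=2,
                       activation_overhead=1.5):
    static = n_params * bytes_per_param * (1 + grad_copies + optimizer_states)
    return static * activation_overhead / 1024**3

print(round(training_memory_gb(355e6), 1))  # an encoder of a few hundred million parameters
print(round(training_memory_gb(8e9), 1))    # an 8B model clearly exceeds a 24 GB RTX 3090
        </preformat>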
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Ablation Study and Incremental Validation</title>
        <p>Once a shortlist of viable features is assembled, we perform targeted ablation experiments similar to
the ones performed for the training of RigoBERTa Clinical [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]:
• Controlled pilot runs: Incorporate each individual feature into the RigoBERTa Clinical baseline
and fine-tune on a representative clinical task.
• Performance versus cost analysis: Record not only the improvement in primary metrics (e.g., F1
score) but also the additional GPU-hours and memory usage.</p>
        <p>Features that yield meaningful performance gains will be retained; those that fail to clear this bar are
discarded.</p>
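        <p>The bookkeeping for this performance-versus-cost analysis can be as simple as the sketch below; the
feature names, helper functions, and the single-GPU wall-clock proxy for GPU-hours are assumptions made
for illustration.</p>
        <preformat>
import json
import time

# Hedged sketch: log the primary metric and compute cost of one ablation run.
def run_ablation(feature_name, train_fn, eval_fn, log_path="ablation_log.jsonl"):
    start = time.time()
    model = train_fn()                        # baseline + one candidate feature
    f1 = eval_fn(model)                       # primary metric on a representative task
    gpu_hours = (time.time() - start) / 3600  # wall-clock on one GPU as a rough proxy
    record = {"feature": feature_name, "f1": f1, "gpu_hours": round(gpu_hours, 2)}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
        </preformat>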
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Final Model Assembly and Benchmarking</title>
        <p>
          With n validated improvements in hand, we assemble the final Spanish clinical encoder:
1. Integrated training regimen: Combine all selected architectural tweaks, attention optimizations,
and tuning strategies into a unified pretraining and fine-tuning pipeline.
2. Comprehensive evaluation suite: Measure performance across multiple clinical tasks and
datasets. The benchmarking suite from our survey [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] will be used.
3. Comparative analysis: Benchmark against (i) best-in-class Spanish/multilingual encoders (e.g.,
BETO, XLM-RoBERTa), and (ii) a closed, general-purpose LLM via API calls (e.g., GPT-style
model), tracking accuracy, latency, and cost per query.
4. Resource and environmental reporting: Document total GPU-hours, peak memory usage,
and estimated carbon footprint savings relative to the LLM baseline (see the sketch below).
        </p>
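        <p>For the resource and environmental reporting step, a minimal sketch is given below, assuming the
codecarbon package for emissions estimation; run_training_pipeline is a placeholder for our integrated
pretraining and fine-tuning pipeline.</p>
        <preformat>
from codecarbon import EmissionsTracker

# Hedged sketch: wrap the (placeholder) training pipeline with an emissions tracker.
tracker = EmissionsTracker(project_name="spanish-clinical-encoder")
tracker.start()
try:
    run_training_pipeline()        # placeholder for the actual training run
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for the tracked code region
print(f"Estimated emissions: {emissions_kg:.3f} kg CO2-eq")
        </preformat>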
        <p>This methodological framework not only tests our central hypotheses under realistic constraints but
also produces a reproducible, openly licensed research artifact that can directly inform both academic
and industrial clinical NLP deployments.</p>
        <p>Open Science Commitment. Aligned with the principles of transparency and reproducibility, we will
publish all code, data splits, model checkpoints, and evaluation scripts under permissive open-source
licenses. This fully documented release is intended to (i) enable fair comparisons and rapid adoption
in Spanish clinical NLP, (ii) support institutions with limited resources, and (iii) foster collaborative
improvements by the broader research community.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>This thesis raises several broader issues that merit reflection, both within the scope of our work and in
the wider research landscape.</p>
      <p>First and foremost is the growing disparity in research accessibility. The current trajectory of NLP
favors massive, resource-intensive models maintained by a handful of well-funded organizations. These
models often operate as black-box APIs and are prohibitively expensive to replicate or fine-tune in
resource-constrained environments. This trend poses a serious challenge to the ideals of Open Science
and equitable technological access. Our work is, in part, a reaction to this imbalance—an attempt to
demonstrate that it is still possible to perform meaningful, high-quality NLP research with modest
resources, provided that methodology and tooling are approached critically and creatively.</p>
      <p>Another area of ongoing concern is the gap between academic NLP and real-world clinical needs.
While we believe Spanish clinical encoders have the potential to contribute significantly to healthcare
settings, forging meaningful collaborations with hospitals and healthcare providers remains difficult.
Reaching clinicians, understanding their specific challenges with unstructured data, and aligning our
tools with their workflows are non-trivial efforts. This points to the need for more interdisciplinary
bridges between NLP research and the healthcare sector—particularly in Spanish-speaking countries,
where resource gaps are even more pronounced.</p>
      <p>We are also acutely aware that our focus on encoder-only architectures goes somewhat against
current trends, which are heavily biased toward decoder-based generative models. These models
dominate headlines and benchmarks, but they often come at immense computational and financial cost
to operate. In contrast, our belief is that encoder models remain highly competitive for many structured
clinical tasks, offering a far more sustainable alternative when used effectively.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported in part by the Instituto de Ingeniería del Conocimiento and Grant
PID2023-148577OB-C21 (Human-Centered AI: User-Driven Adapted Language Models) by MICIU/AEI/
10.13039/501100011033 and by FEDER/UE.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used IIC/RigoChat-7b-v2 in order to: Grammar and
spelling check, Paraphrase and reword. After using these tool(s)/service(s), the author(s) reviewed
and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          , Attention is all you need,
          <year>2023</year>
          . URL: https://arxiv.org/abs/1706.03762. arXiv:1706.03762.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1810.04805. arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Improving language understanding by generative pre-training</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Nickolls</surname>
          </string-name>
          , I. Buck,
          <string-name>
            <given-names>M.</given-names>
            <surname>Garland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Skadron</surname>
          </string-name>
          ,
          <article-title>Scalable parallel programming with cuda: Is cuda the parallel programming model that application developers have been waiting for?, Queue 6 (</article-title>
          <year>2008</year>
          )
          <fpage>40</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          , T. Xu,
          <string-name>
            <given-names>G.</given-names>
            <surname>Brockman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>McLeavey</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sutskever</surname>
          </string-name>
          ,
          <article-title>Robust speech recognition via large-scale weak supervision</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2212.04356. arXiv:2212.04356.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pavlov</surname>
          </string-name>
          , G. Goh,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sutskever</surname>
          </string-name>
          ,
          <article-title>Zero-shot text-to-image generation</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2102.12092. arXiv:2102.12092.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rombach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Esser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ommer</surname>
          </string-name>
          ,
          <article-title>High-resolution image synthesis with latent diffusion models</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2112.10752. arXiv:2112.10752.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agerri</surname>
          </string-name>
          , E. Agirre,
          <article-title>Lessons learned from the evaluation of spanish language models</article-title>
          ,
          <source>arXiv preprint arXiv:2212.08390</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Altosaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ranganath</surname>
          </string-name>
          ,
          <source>Clinicalbert: Modeling clinical notes and predicting hospital readmission</source>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/1904.05342. arXiv:1904.05342.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Báez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Villena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rojas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Durán</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Dunstan,</surname>
          </string-name>
          <article-title>The Chilean waiting list corpus: a new resource for clinical named entity recognition in Spanish</article-title>
          , in: A.
          <string-name>
            <surname>Rumshisky</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Bethard</surname>
          </string-name>
          , T. Naumann (Eds.),
          <source>Proceedings of the 3rd Clinical Natural Language Processing Workshop</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>291</fpage>
          -
          <lpage>300</lpage>
          . URL: https://aclanthology.org/2020.clinicalnlp-1.32/. doi:10.18653/v1/2020.clinicalnlp-1.32.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>García</surname>
          </string-name>
          <string-name>
            <surname>Subies</surname>
          </string-name>
          , Á. Barbero Jiménez,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martínez Fernández</surname>
          </string-name>
          ,
          <article-title>A comparative analysis of spanish clinical encoder-based models on ner and classification tasks</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>31</volume>
          (
          <year>2024</year>
          )
          <fpage>2137</fpage>
          -
          <lpage>2146</lpage>
          . URL: https://doi.org/10.1093/jamia/ocae054. doi:10.1093/jamia/ocae054.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Serrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Subies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Zamorano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Samy</surname>
          </string-name>
          ,
          <string-name>
            <surname>D. B. Sanchez</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          <string-name>
            <surname>Sandoval</surname>
            ,
            <given-names>M. G.</given-names>
          </string-name>
          <string-name>
            <surname>Nieto</surname>
            ,
            <given-names>A. B.</given-names>
          </string-name>
          <string-name>
            <surname>Jimenez</surname>
          </string-name>
          ,
          <article-title>Rigoberta: A state-of-the-art language model for spanish</article-title>
          ,
          <year>2022</year>
          . arXiv:2205.10233.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cañete</surname>
          </string-name>
          , G. Chaperon,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <article-title>Spanish pre-trained bert model and evaluation data</article-title>
          ,
          <source>in: PML4DC at ICLR</source>
          <year>2020</year>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Fandiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Estapé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pàmies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Palao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Ocampo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Carrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Oller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Penagos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <article-title>Maria: Spanish language models</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>68</volume>
          (
          <year>2022</year>
          ). URL: https://upcommons.upc.edu/handle/2117/367156#.YyMTB4X9A-0. doi:10.26342/2022-68-3.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          , W. Chen,
          <article-title>Debertav3: Improving deberta using electra-style pre-training with gradientdisentangled embedding sharing</article-title>
          ,
          <year>2021</year>
          . arXiv:2111.09543.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          , E. Grave,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Online, 2020, pp. 8440–8451. URL: https://www.aclweb.org/anthology/2020.acl-main.747. doi:10.18653/v1/2020.acl-main.747.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] G. López-García, J. M. Jerez, N. Ribelles, E. Alba, F. J. Veredas, Transformers for clinical coding in Spanish, IEEE Access 9 (2021) 72387–72397. doi:10.1109/ACCESS.2021.3080085.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18] C. P. Carrino, J. Llop, M. Pàmies, A. Gutiérrez-Fandiño, J. Armengol-Estapé, J. Silveira-Ocampo, A. Valencia, A. Gonzalez-Agirre, M. Villegas, Pretrained biomedical language models for clinical NLP in Spanish, in: Proceedings of the 21st Workshop on Biomedical Language Processing, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 193–199. URL: https://aclanthology.org/2022.bionlp-1.19. doi:10.18653/v1/2022.bionlp-1.19.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19] D. P. Jeong, S. Garg, Z. C. Lipton, M. Oberst, Medical adaptation of large language and vision-language models: Are we making progress?, arXiv preprint arXiv:2411.04118 (2024).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20] G. G. Subies, Á. Barbero Jiménez, P. M. Fernández, ClinText-SP and RigoBERTa Clinical: a new set of open resources for Spanish clinical NLP, 2025. URL: https://arxiv.org/abs/2503.18594. arXiv:2503.18594.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21] T. Dao, D. Y. Fu, S. Ermon, A. Rudra, C. Ré, FlashAttention: Fast and memory-efficient exact attention with IO-awareness, 2022. URL: https://arxiv.org/abs/2205.14135. arXiv:2205.14135.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22] B. Warner, A. Chaffin, B. Clavié, O. Weller, O. Hallström, S. Taghadouini, A. Gallagher, R. Biswas, F. Ladhak, T. Aarsen, N. Cooper, G. Adams, J. Howard, I. Poli, Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference, 2024. URL: https://arxiv.org/abs/2412.13663. arXiv:2412.13663.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23] U. Zaratiana, N. Tomeh, P. Holat, T. Charnois, GLiNER: Generalist model for named entity recognition using bidirectional transformer, 2023. URL: https://arxiv.org/abs/2311.08526. arXiv:2311.08526.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24] Z. Han, C. Gao, J. Liu, J. Zhang, S. Q. Zhang, Parameter-efficient fine-tuning for large models: A comprehensive survey, 2024. URL: https://arxiv.org/abs/2403.14608. arXiv:2403.14608.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-rank adaptation of large language models, 2021. URL: https://arxiv.org/abs/2106.09685. arXiv:2106.09685.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2020. URL: https://arxiv.org/abs/1910.01108. arXiv:1910.01108.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, 2017. URL: https://arxiv.org/abs/1712.05877. arXiv:1712.05877.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28] H. Wang, S. Ma, L. Dong, S. Huang, H. Wang, L. Ma, F. Yang, R. Wang, Y. Wu, F. Wei, BitNet: Scaling 1-bit transformers for large language models, 2023. URL: https://arxiv.org/abs/2310.11453. arXiv:2310.11453.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29] P. Suganthan, F. Moiseev, L. Yan, J. Wu, J. Ni, J. Han, I. Zitouni, E. Alfonseca, X. Wang, Z. Dong, Adapting decoder-based language models for diverse encoder downstream tasks, arXiv preprint arXiv:2503.02656 (2025).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>