<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Nesciun Lengaz Lascià Endò: Machine Translation for Fassa Ladin⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanni Valer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolò Penzo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacopo Staiano</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Despite the remarkable success recently obtained by Large Language Models, a significant gap in performance still exists when dealing with low-resource languages which are often poorly supported by of-the-shelf models. In this work we focus on Fassa Ladin, a Rhaeto-Romance linguistic variety spoken by less than ten thousand people in the Dolomitic regions, and set to build the first bidirectional Machine Translation system supporting Italian, English, and Fassa Ladin. To this end, we collected a small though representative corpus compounding 1135 parallel sentences in these three languages, and spanning five domains. We evaluated several models including the open (Meta AI's No Language Left Behind, NLLB-200) and commercial (OpenAI's gpt-4o) state-of-the-art, and indeed found that both obtain unsatisfactory performance. We therefore proceeded to fine-tune the NLLB-200 model on the data collected, using diferent approaches. We report a comparative analysis of the results obtained, showing that 1) jointly training for multilingual translation (Ladin-Italian and Ladin-English) significantly improves the performance, and 2) knowledge-transfer is highly efective (e.g., leveraging similarities between Ladin and Friulian), highlighting the importance of targeted data collection and model adaptation in the context of low-resource/endangered languages for which little textual data is available.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Machine Translation</kwd>
        <kwd>Low Resource Languages</kwd>
        <kwd>Dialects</kwd>
        <kwd>Ladin</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction e.g., wrong translations or mixed up Ladin varieties.1
Further, previous works have mainly focused on the
The growing scale of Large Language Models, based on two South Tyrolean varieties, Gherdëina and Badiot [6]:
the Transformer architecture, has led to models with despite having a standardized written form and being
surprising capabilities in a number of tasks, including oficially recognized as a minority language, the Fassa
Machine Translation (MT). However, most of the NLP variety (Fascian) has been mostly overlooked [7], while
community efort is focused on high-resource standard- its speakers rightfully expect access to the same digital
ized languages, leaving behind the vast majority of local tools available for other languages [8].
under-resourced languages. Recent works have demon- We introduce the first dataset of parallel Fassa
Ladinstrated the utility of creating language-specific datasets Italian-English sentences, spanning over multiple
dofor MT [1] and the efectiveness of relatively small quan- mains: literature, news, laws, brochures, and game rules.
tities of high-quality translation data to teach a new lan- We evaluate several out-of-the-box translation systems,
guage to pre-trained LLMs [2, 3]. To date, little work has including the open (Meta AI’s No Language Left Behind,
addressed the Ladin language: even the most recent mod- NLLB-200) and commercial (OpenAI’s gpt-4o)
state-ofels that have included a great number of languages have the-art models, and experiment with both zero-shot
pivotnot been trained with Ladin data [4], due to the scarcity of based and multilingual strategies to obtain satisfactory
freely available parallel corpora (to our knowledge, only performances in bidirectional translation between Fassa
the OPUS corpora [5]), which are also poorly curated – Ladin and Italian/English. Figure 1 provides a schematic
overview of our experiments, which are thoroughly
deCLiC-it 2024: Tenth Italian Conference on Computational Linguistics, scribed in Section 4.</p>
      <p>Dec 04 – 06, 2024, Pisa, Italy Our results show how the collection of small quantities
⋆ iNnoFLaasnsaguLaagdeinL.eft Behind translates to Nesciun Lengaz Lascià Endò of parallel data is very efective in ‘adding’ support for
* Corresponding author. a previously unsupported language to existing
state-of$ giovanni.valer@studenti.unitn.it (G. Valer); the-art models. More specifically, we find that the
NLLBnicolo.penzo@unitn.it (N. Penzo); jacopo.staiano@unitn.it 200 model fine-tuned using a multilingual strategy can
(J. Staiano) outperform even the most capable commercial LLMs (e.g.,
httphstt:/p/sn:i/c/goiltohpuebn.zcoo.mgi/tjhou-vba.iloer(N( G..PVenazleor));;https://www.staiano.net OpenAI gpt-4o).
(J. Staiano) For reproducibility purposes, we make the dataset and
0009-0002-2145-9497 (G. Valer); 0009-0006-8648-3307 (N. Penzo);
0000-0002-1260-4640 (J. Staiano)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License 1See Appendix A.</p>
      <p>Attribution 4.0 International (CC BY 4.0).
g
fi
u
n
gnin lld lld en
t-en en it
ni lld lld lld en
ut-enfi en it lld it</p>
      <p>Fassa Ladin
Parallel Corpora
it en
fra en
fur en
fur
fur
Tain Dev tsTe
r</p>
      <p>lld</p>
    </sec>
    <sec id="sec-2">
      <title>Domains</title>
      <p>Laws
Games
Literature
News
Brochure
ID
OOD
4
instruction
lld en
lld it</p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
    </sec>
    <sec id="sec-4">
      <title>BLEU chrF++ BERTscore</title>
      <sec id="sec-4-1">
        <title>2. Linguistic background</title>
        <p>Ladin3 (ISO 639-3 code: lld) is a Rhaeto-Romance
language. It has numerous varieties, each one spoken in
a diferent valley: Anpezan (Cortina d’Ampezzo),
Badiot (Badia Valley), Fascian (Fassa Valley), Fodom and
Col (Upper Cordevole Valley), and Gherdëina (Gardena
Valley) [9]. This paper focuses on Fassa Ladin, which is
spoken by approximately 8000 people and is further
divided in three local varieties: Cazét (upper valey), Brach
(lower valley), and Moenat (Moena). However, a standard
variety for Fassa Ladin (named Ladin fascian) was
established in 1999 and is currently used in oficial contexts;
this is the variety considered in our work.</p>
        <p>From a linguistic standpoint, Fassa Ladin is related to
Italian. It also shares some linguistic phenomena with
French, as the fronting of Latin /a/ to /E/, e.g., pater &gt;
fr. and lad. père (notice that both Ladin and French are
Western Romance languages). Ladin is closely related
also to Friulian, another Rhaeto-Romance language [9].
For these reasons we will consider Italian, French and
Friulian for our experiments. We report in Table 1 an
example of a sentence in Ladin, Italian and English.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3. Data</title>
        <sec id="sec-4-2-1">
          <title>Ladin</title>
        </sec>
        <sec id="sec-4-2-2">
          <title>Italian</title>
        </sec>
        <sec id="sec-4-2-3">
          <title>English</title>
          <p>L porta dant azions per didèr dò la medema
oportunità anter eles e ic.</p>
          <p>Promuove azioni per favorire pari
opportunità tra donne e uomini.</p>
          <p>It promotes actions to foster equal
opportunities between women and men.
ature, news, games, laws, and brochures. The literature
subset is an excerpt of a collection of poems and stories
by Galante et al. [10].</p>
          <p>News are sourced from the Province of Trento press
ofice releases 4 and from social networks’ news.5 The
games subset contains parallel sentences from an online
game.6 Laws come from the Statuto del Comune di Moena
(Statute of the Municipality of Moena)7 and the Statuto
del Comun general de Fascia (Statute of the ‘Comun
general de Fascia’).8 Finally, the brochures subset consists in
promotional documents for tourists.9 The latter exhibits
distinct linguistic characteristics, and is characterized by
poorly aligned sentences and more ‘creative’ translations;
an example is provided in Table 2.</p>
          <p>Thus, we used it for out-of-domain testing (see
Section 4.3.1). The dataset compounds to 1135 parallel
sentences, unevenly distributed across domains (see Table 3).</p>
          <p>We built the first Fassa Ladin-Italian-English parallel cor- 4https://www.uficiostampa.provincia.tn.it/
pus drawing from multiple resources in 5 domains: liter- 5https://www.facebook.com/UalUnionAutonomistaLadina/
6http://avventuresuimontipallidi.it/
2https://github.com/jo-valer/machine-translation-ladin-fascian 7https://it.wikisource.org/wiki/Comun_de_Moena_-_Statut
3The term ‘Ladin’ can refer to multiple languages. In this paper we 8https://www.consiglio.provincia.tn.it/_layouts/15/dispatcher/doc_
use it only in reference to the Ladin of the Dolomites, spoken in the dispatcher.aspx?app=clex&amp;at_id=21177
so called Ladinia brissino-tirolese, across the provinces of Trento,
9https://www.giornaletrentino.it/cronaca/fiemme-e-fassa/il-libroBolzano, and Belluno. sui-ladini-di-fascia-spacca-presto-altre-4-mila-copie-1.2242774
en: Especially in winter, when work in the fields was less intense.
it: Questi riti venivano celebrati soprattutto in inverno, quando il lavoro nei campi era meno intenso.
(These rites were celebrated mainly in winter, when work in the fields was less intense.)
lld: Soraldut via per l’invern, ajache zacan l’era na sajon de paussa dal lurier te ciamp.</p>
          <p>(Especially during the winter, as it used to be a season of respite from work in the field.)
When English translations were not available we used
DeepL10 to translate Italian into English.</p>
          <p>We chose BLEU and chrF++ metrics in line with previous
work by Haberland et al. [1]. Although Multilingual
BERT does not explicitly support the Ladin language, we
4. Models and Methods assessed during preliminary analyses its alignment with
human similarity judgments on Ladin sentences. For this
In our experiments we used the following machine trans- reason we include it as reference for future work.
lation model families:
More implementation details are in Appendix C.</p>
          <p>4.2. Preliminary Experiments
• OPUS-MT, which provides unidirectional
bilin</p>
          <p>gual models [11];11 Firstly, we evaluate the performance of the pre-trained
• M2M-100, a Many-to-Many multilingual model models in translating between Italian and English ( →
that can translate directly between any pair of  and  → ), in order to have a reference for
subse100 languages [12]; quent experiments. The evaluation is performed using
• NLLB-200, Meta AI’s successor of M2M-100, sup- our in-domain test set. We also evaluate the performance
porting 200 languages [4]; of the models to translate from Ladin to English, either
• gpt-4o, the closed-source, state-of-the-art, considering Ladin sentences as if they were written in
general-purpose, instruction-tuned, multilin- Italian, French, or Friulian. Such test allows us to have
gual model developed and commercialized by a measure of how much a given model is ‘prepared’ to
OpenAI.12 transfer knowledge across these languages. NLLB-200 is
the only model pre-trained with Friulian data, thus
comparing models with this language is not possible.
Nevertheless, this preliminary experiment is a viable way to
investigate which language has the highest similarity to
Ladin from the model’s perspective.
4.1. Experimental Setup
For model evaluation and validation, we prepare two
held-out corpora, each of 108 aligned sentences (∼ 10%
of the in-domain corpus), randomly sampled from all
resources; the brochures subset was excluded from the
10https://www.deepl.com/
11https://huggingface.co/Helsinki-NLP/opus-mt-it-en
12The prompting strategy used for gpt-4o is presented in Appendix</p>
          <p>B.</p>
          <p>Preliminary Results The results presented in Table 4
show how M2M-100 has lower scores for all metrics, and
suggest that the best model for our experiments is
NLLB200; for this reason in the following we will consider
13https://github.com/mjpost/sacrebleu
14https://github.com/Tiiiger/bert_score
OPUS-MT
M2M-100
NLLB-200
OPUS-MT
M2M-100
NLLB-200
OPUS-MT
M2M-100
NLLB-200
M2M-100
NLLB-200
55.61
44.18
52.93
this model only. We can notice a lower performance in
 → , compared to  → , according to the
untrained metrics; BERTscore provides instead comparable
verdicts for the two tasks. This is an important finding
and has to be recalled when evaluating subsequent
experiments. Moreover, Friulian proves to be the most
promising language for our fine-tuning purposes, even though
Italian has good scores (BLEU score 21.76 vs. 18.52).
investigate if such model performs well in  →  even
though it is not trained with Italian-Ladin pairs. We refer
to the model fine-tuned with this approach as
‘NLLBpivot’.</p>
          <p>Multilingual Translation We fine-tune the model for
joint Ladin-Italian and Ladin-English bidirectional
translation. Each batch includes a randomly selected pair of
languages, in a single direction. We refer to the model
ifne-tuned with this approach as ‘NLLB-multi’.
4.3.1. Transfer Learning Across Domains
We evaluated the model ability to generalize in diferent
domains by testing it on our out-of-domain test set: the
brochures subset (excluded from the training set)
compounding to ∼ 5% of the sentences in our entire dataset.
4.3.2. Forgetting of Previous Knowledge
Finally, we investigate whether the fine-tuned models
sufer a performance drop in translating Italian to
English (and vice versa), thus exploring if we encounter
catastrophic forgetting [19]. We re-evaluate the models
on our test set, and compare the results with the scores
obtained in the preliminary experiments.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>5. Results</title>
        <p>4.3. Transfer Learning Experiments The performances obtained by the fine-tuned models, for
each translation task and for each test set, are reported
The training set consists of 862 parallel Fassa Ladin- in Table 5. As a strong baseline, we used gpt-4o.
Italian-English sentences (i.e., those remaining of the
original 1135 sentences after excluding 108 for validation, 5.1. Fine-tuning Approaches
108 for in-domain test and 57 for out-of-domain test). As
Ladin is not included in the pre-trained NLLB-200 model, The results show that both fine-tuning approaches are
efwe assign it the language code of Friulian, to leverage the fective in adding Fassa Ladin to the pre-trained NLLB-200
similarities between these two languages. In this work model, increasing the BLEU score baseline of  →
we use our dataset for model fine-tuning, a relatively  from 21.76 to 40+, and outperforming gpt-4o (28.19).
afordable strategy in terms of computational costs. 15 We The two approaches achieve also similar results in  →
experiment with the following approaches to add Fassa . Table 6 provides some examples of translated
senLadin to the NLLB-200 model: tences.</p>
        <p>We do not observe consistently higher scores by
usZero-shot Pivot-based Transfer Learning We fine- ing the zero-shot pivot-based transfer learning approach.
tune the model to only translate from English to Ladin This might be due to the little amount of data used for
(and viceversa), thus ignoring the Italian data. The pivot- fine-tuning, so that training also with Italian-Ladin
parbased approach has proven to be efective for several allel sentences helps by providing more data and higher
languages [18]. We adopt a zero-shot pivot-based ap- diversity. Since we fixed the number of training steps for
proach, meaning we do not fine-tune the model to per- NLLB-pivot and NLLB-multi, the NLLB-multi model has
form  ⇄ , as we assume not to have the data: we seen about half of the Ladin-English batches compared
to NLLB-pivot (the other half being Ladin-Italian).</p>
        <p>This suggests that the multilingual translation
ap15Nonetheless, the increasing input context length of current LLMs proach might be preferable in the context of endangered
ianllothwescfoorncuusirnrgenmt awnoyr-kshooft Aing-acrowntaelxettleaal.rn[1in7g], awphpircohacwheaslesahvoewtno languages for which little data is available, since it acts
future works. as a regularization method during training.
 → 
 → 
 → 
 → 
gpt-4o
NLLB-pivot
NLLB-multi
gpt-4o
NLLB-pivot
NLLB-multi
gpt-4o
NLLB-pivot
NLLB-multi
gpt-4o
NLLB-pivot
NLLB-multi</p>
        <p>Turning to gpt-4o performances, it proves to better
perform in  →  task than  → . Its scores
are lower compared to our models, but the most
significant finding is that it cannot generate text in Fassa Ladin
(/ → ). NLLB-multi performance in  →  is
much higher than  →  (BLEU score 39.75 vs. 32.23),
a finding calls for further analysis, left to future works, to
be interpreted. We also observe NLLB-pivot performing
poorly in  → , but not in  → . The zero-shot
pivot-based approach appears to work in only one
direction, a behavior we discuss in Section 5.3.
5.2. Domain Transfer
Unsurprisingly, a relatively lower performance on the
out-of-domain test set is observed, since the original data
presents less literal translations. As a consequence, the
metrics matching the model output against the ground
truth tend to lower scores. Still, especially for  →
, both NLLB-based models produce acceptable
outof-domain translations (BLEU scores 21+). The strong
out-of-domain performance of gpt-4o, better than our
models in understanding out-of-domain Ladin ( →
/), shows how the scarcity of fine-tuning data, and its
lack of linguistic diversity, has a negative impact on our
models’ performance. Another interpretation concerns
the robustness of gpt-4o in handling grammatical errors:
implicitly casting the source  sentences to another
similar language, known by the model, and then correctly
translating into the / targets (e.g., treating Ladin
words as if they were misspelled Italian words).
NLLB-200
NLLB-pivot (Δ)
NLLB-multi (Δ)
5.3. Forgetting of Previous Knowledge
Finally, we present the performance shift in  →  and
 →  of our fine-tuned models compared to the
pretrained NLLB-200 (Table 7). The idea is to evaluate the
catastrophic forgetting phenomenon [20] after adding
Fassa Ladin to the model, via the diference in BLEU
scores. NLLB-multi produces slightly better translations
after fine-tuning: this is expected, as it is better fitted to
our domain. NLLB-pivot, however, has a strong drop in
 →  (− 32.41), but not in  →  (+1.78).</p>
        <p>This suggests that after fine-tuning the model’s
encoder retained the ability to handle Italian inputs, while
the decoder ‘forgets’ how to generate Italian outputs.</p>
        <p>This also explains the NLLB-pivot low performance in
 → , but relatively high scores in  → .</p>
        <p>The problem of ‘forgetting’ can be mitigated by using
English-Italian sentence pairs during fine-tuning.</p>
      </sec>
      <sec id="sec-4-4">
        <title>6. Limitations</title>
        <p>A major limitation of this work consists in the little
amount of data used for fine-tuning, and its lack of
linguistic variety (most of the sentences are drawn from
laws). This has a considerable impact on our MT model,
which struggles on out-of-domain translations.</p>
        <p>In general, as suggested by Ramponi [8], it would be
important to assess the needs of the local community,
in order to focus the eforts towards the most useful
domains of application.</p>
      </sec>
      <sec id="sec-4-5">
        <title>7. Conclusions</title>
        <p>In this work, we show that it is possible to add a specific
language variety to a pre-trained MT model using little
amount of data for fine-tuning (fewer than 900 parallel
sentences). To add Fassa Ladin, we fine-tune the model
using as starting point a similar language included in
NLLB-200: Friulian.</p>
        <p>This approach significantly improves the performance.
Moreover, in such condition, fine-tuning with parallel
sentences in more than two languages proves to help
regularization and to improve translations, with respect
to a zero-shot pivot-based transfer learning approach.</p>
        <p>Future work includes extending the dataset with new
resources and domains, improving the alignment quality,
and including human evaluation of translation quality.
Adding data from other Ladin varieties might be a viable
solution to improve the low performance caused by
unknown words. Moreover, experimenting with translated
words from vocabulary entries could be beneficial for
Fassa Ladin, a language variety that has scarce parallel
data but various publicly accessible vocabularies.
[16] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT:
Pre-training of deep bidirectional transformers for
language understanding, in: J. Burstein, C.
Doran, T. Solorio (Eds.), Proceedings of the 2019
Conference of the North American Chapter of the
Association for Computational Linguistics: Human
Language Technologies, Volume 1 (Long and Short
Papers), Association for Computational Linguistics,
Minneapolis, Minnesota, 2019, pp. 4171–4186. URL:
https://aclanthology.org/N19-1423. doi:10.18653/
v1/N19-1423.
[17] R. Agarwal, A. Singh, L. M. Zhang, B. Bohnet,
L. Rosias, S. Chan, B. Zhang, A. Anand, Z.
Abbas, A. Nova, J. D. Co-Reyes, E. Chu, F. Behbahani,
A. Faust, H. Larochelle, Many-shot in-context
learning, 2024. URL: https://arxiv.org/abs/2404.11018.
arXiv:2404.11018.
[18] Y. Kim, P. Petrov, P. Petrushkov, S. Khadivi, H. Ney,
Pivot-based transfer learning for neural machine
translation between non-English languages, in:
K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings
of the 2019 Conference on Empirical Methods in
Natural Language Processing and the 9th
International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), Association for
Computational Linguistics, Hong Kong, China, 2019, pp.
866–876. URL: https://aclanthology.org/D19-1080.
doi:10.18653/v1/D19-1080.
[19] Y. Luo, Z. Yang, F. Meng, Y. Li, J. Zhou, Y. Zhang,
An empirical study of catastrophic forgetting
in large language models during continual
finetuning, 2024. URL: https://arxiv.org/abs/2308.08747.
arXiv:2308.08747.
[20] I. J. Goodfellow, M. Mirza, D. Xiao, A. Courville,
Y. Bengio, An empirical investigation of
catastrophic forgetting in gradient-based neural
networks, 2015. URL: https://arxiv.org/abs/1312.6211.
arXiv:1312.6211.
[21] N. Shazeer, M. Stern, Adafactor: Adaptive
learning rates with sublinear memory cost, in: J. Dy,
A. Krause (Eds.), Proceedings of the 35th
International Conference on Machine Learning, volume 80
of Proceedings of Machine Learning Research, PMLR,
2018, pp. 4596–4604. URL: https://proceedings.mlr.
press/v80/shazeer18a.html.
Wikipedia
QED
Sono usciti complessivamente tre numeri.</p>
        <p>A total of three issues were released.
Ie la prima plata ladina[1].</p>
        <p>It’s the first ladin page[1].</p>
        <p>E gli uomini delà , Meli esponilo Holly mise San ,
in estat’ teston’
And the men delà , Meli expose it Holly put San , in
estat’ teston’ (sic)
Si te serf demò la lum canche la se n va , te mencia
l soreie demò canche l taca a fiochèr
If you only need light when it goes out , you only miss
the sun when it starts snowing</p>
      </sec>
      <sec id="sec-4-6">
        <title>A. Previous Ladin corpora</title>
        <p>Three datasets from the OPUS corpora, namely
Wikipedia, QED, and Ubuntu, contain parallel
LadinItalian data. Unfortunately, none of these provide
information about the Language variety of the sentences
(e.g., the ones mentioned in Section 2). Some of them also
present non-aligned sentences (see examples in Table 8).</p>
      </sec>
      <sec id="sec-4-7">
        <title>B. Prompt for gpt-4o</title>
        <p>###INTRODUCTION###
You are a expert translator specialized in
low-resource languages and dialects.</p>
        <p>Your core competence is bidirectional translation
between italian (IT), english (EN), and fassa
ladin (LLD) languages.
###INSTRUCTIONS###
You will be provided with information on the
source language (SOURCE_LANG), a textual input
(SOURCE_TEXT), and a target language (TARGET_LANG).
Your task is to accurately translate SOURCE_TEXT
from language SOURCE_LANG to language TARGET_LANG,
producing TARGET_TEXT.</p>
        <p>Your output is a JSON file with exactly the
following schema:
{
“SOURCE_LANG": str, \\the value of SOURCE_LANG.
“TARGET_LANG": str, \\the value of TARGET_LANG.
“TARGET_TEXT": str, \\the translation output.
}</p>
      </sec>
      <sec id="sec-4-8">
        <title>C. Implementation details</title>
        <p>All experiments were conducted on Google Colab16 using
a single NVIDIA T4 15GB GPU; the fine-tuning process
required approximately 1 hour.</p>
        <p>We fine-tune the NLLB-200’s distilled 600M variant 17
using the Adafactor optimizer [21], with a learning rate
of 1.5 · 10− 4 and 500 warm-up iterations.18 We use a
batch size of 16 sentences.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>