<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SIGIR Workshop on eCommerce, Jul</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Machine Translation for Scalable, Context-Aware Cross-Lingual E-Commerce Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicole McNabb</string-name>
          <email>nicole.mcnabb@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dayron Rizo-Rodriguez</string-name>
          <email>dayron.rizo.rodriguez@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesus Perez-Martin</string-name>
          <email>jesus.perez-martin@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuanliang Qu</string-name>
          <email>yuanliang.qu0@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Clement Ruin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alina Sotolongo</string-name>
          <email>alina.sotolongo@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pankaj Adsul</string-name>
          <email>pankaj.adsul@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leonardo Lezcano</string-name>
          <email>leonardo.lezcano@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Walmart Global Tech</institution>
          ,
          <addr-line>Sunnyvale, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>17</volume>
      <issue>2025</issue>
      <abstract>
        <p>E-commerce search in the US and Canada presents a unique opportunity for Cross-Lingual Information Retrieval (CLIR), allowing non-English-speaking customers to benefit from English-language search systems. Machine Translation (MT) enhances search performance by translating customer queries into English before processing them. However, traditional MT systems face challenges in this domain, including polysemy, high latency, limited contextual information in queries, and the presence of non-translatable entities such as brand names, making generic MT approaches suboptimal. We present a scalable three-step MT system for CLIR that delivers precise, context-aware translations for multilingual users across markets. First, we construct an LLM-powered Translation Memory that leverages product and search session data to generate accurate translations for context-scarce queries and those with non-translatable entities such as brand names. Second, we show the effectiveness of customizing translatability by language and locale. Third, we introduce an 8-bit quantized Neural Machine Translation (NMT) model enhanced with an LLM-driven contextual rule engine, achieving 3x higher throughput, 40%+ lower latency, and 58% lower inference cost than previous NMT approaches without compromising translation quality. Deployed to www.walmart.ca/fr, our system shows a statistically significant increase in customer conversion rate, +8.2% weighted nDCG, and +3.3% precision in search results compared to monolingual search.</p>
      </abstract>
      <kwd-group>
        <kwd>Cross-lingual Search</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Entity-Aware Translation</kwd>
        <kwd>Cross-lingual Ambiguity</kwd>
        <kwd>Translatability</kwd>
        <kwd>Neural Machine Translation (NMT)</kwd>
        <kwd>Integer Quantization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        There is a growing demand for B2C e-commerce search engines to address language barriers and cultural
differences [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To improve query understanding, as well as search precision and recall [<xref ref-type="bibr" rid="ref2">2</xref>,
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ], recent
approaches [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] have explored automatic query translation as an early step in the search process. This
strategy is especially important for platforms serving a global audience, where the ability to process a
wide range of languages and cultural contexts is essential. This is particularly relevant for online stores
and marketplaces in the US, where 13% of the population speaks Spanish as a first language [<xref ref-type="bibr" rid="ref8">8</xref>], and
Canada, where 22% of the population speaks French as a first language (primarily in Quebec) [<xref ref-type="bibr" rid="ref9">9</xref>].
      </p>
      <p>
        While physical stores allow customers to visually find products, navigating an e-commerce site
often requires proficiency in the store’s native language. For example, 38% of Québécois citizens speak
only French [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and may struggle to shop on English-only e-commerce sites. Additionally, 40% of
customers avoid purchasing from websites not in their native language [<xref ref-type="bibr" rid="ref1">1</xref>]. These findings highlight
the importance of multilingual support in modern e-commerce search engines to broaden market reach.
      </p>
      <p>Cross-Lingual Information Retrieval (CLIR) systems for e-commerce search often leverage Machine
Translation (MT) to convert user queries into the search engine’s language. However, traditional MT
systems face high latency and domain-specific translation challenges such as cross-lingual ambiguity,
regional and dialect variations, non-translatable entities like brands, and limited query context. These
issues must typically be addressed differently for each language and locale.</p>
      <p>CEUR Workshop Proceedings, ISSN 1613-0073.</p>
      <p>We introduce an efficient, scalable MT system designed for cross-lingual e-commerce search that
addresses these challenges across languages and markets. The key contributions of the system are:
1. An LLM-powered Translation Memory, the first offline use of large language models (LLMs)
to resolve cross-lingual ambiguity and perform entity-aware translation for e-commerce search.
This system is context-aware, integrating product catalog and user behavior data to improve
translation quality.
2. Language-Tuned Translatability logic to effectively manage code-switched queries (e.g.,
Spanglish) and regional and dialect variations (e.g., Québécois, Puerto Rican Spanish) across markets,
with the flexibility to extend to new locales.
3. A quantized Neural Machine Translation (NMT) model for CPU-based inference that delivers
cost-effective, scalable performance at sub-10ms latency without sacrificing translation quality.</p>
      <p>We extend the system from Spanish search in the US to French search in Canada, validating our
approach through end-to-end search improvements on www.walmart.ca/fr.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Prior work in CLIR for e-commerce has leveraged MT to convert user queries into the search system’s
primary language [
        <xref ref-type="bibr" rid="ref10 ref6 ref7">6, 7, 10</xref>
        ]. To deliver translations at scale with low latency, Yao et al. [7] introduced
an asynchronous strategy combining the speed of Statistical Machine Translation (SMT) online with
the accuracy of NMT offline. Recently, several fast NMT frameworks have been developed [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ].
Perez-Martin et al. [10] adapted the highly-optimized Marian-NMT framework [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] for synchronous
e-commerce search in Spanish. We extend this work by quantizing the Marian-NMT model to enable
faster, more cost-effective inference in production across markets.¹
      </p>
      <p>To improve contextual translations in e-commerce, Laenen and Moens [14] leveraged visual context
from product images to improve the quality of product description translations. Gao et al. [15] adapted
LLMs using domain-specific tokenizer optimization and fine-tuning on product title corpora. However,
as noted in Section 1, translating user queries poses additional challenges, such as cross-lingual
ambiguity and identification of non-translatable entities, that neither line of work on product title or
description translation addresses. These issues remain unexplored in LLM-augmented CLIR. We address them
using user engagement signals and product catalog data to improve translation quality and to generate
rules for handling non-translatable entities in NMT.</p>
      <p>Prior work on adapting MT systems to multiple languages has focused on machine translation rather
than translatability. Gupta et al. [16] proposed a cross-lingual decoder for low-resource language
adaptation using incremental training. However, fine-tuning a language-specific NMT model remains
more effective for medium- and high-resource languages. Moslem et al. [17] addressed stylistic variation
in languages using LLMs with in-context learning, though this approach is unsuitable for low-latency
applications. Perez-Martin et al. [10] built a dialect-sensitive lexicon from Wiktionary for efficient
Spanish query detection, but its application to other languages remains unexplored. We experiment
with this approach for Canadian French, confirming the importance of locale-specific lexicons and
showing that detection logic must be tailored to each language due to varying degrees of English overlap
and usage across locales.</p>
    </sec>
    <sec id="sec-3">
      <title>3. System Architecture</title>
      <p>We present an efficient multi-language, multi-locale query translation system that detects the source
language of the query and returns its English translation for retrieval by the underlying search engine.¹</p>
      <p>¹Marian-NMT website: marian-nmt.github.io</p>
    </sec>
    <sec id="sec-4">
      <title>4. Translation Memory</title>
      <p>The Translation Memory can be populated using a domain-specific MT model or an LLM guided by a
domain-specific prompt and optional in-context examples. We use GPT-4o (version 2024-05-13) with a
prompt enforcing the following translation guidelines:
1. Do not translate brands, product lines, model names, media titles, or other named entities.
2. The translation must be in English.
3. The most accurate translation is the most concise translation that completely preserves the
query’s original intent.
4. The translation must refer to a product. If the query is already in English or the translation does
not refer to a product, return the original query as the translation.</p>
      <p>We chose GPT-4o for its strong multilingual support and cost-effectiveness compared to competitors
such as Claude Opus 3. As a result, the incremental cost of the LLM-based translations is less than 5%
of the total system cost.</p>
      <p>Evaluated on a sample of 5,500 popular, unique French queries with human-curated reference
translations, the LLM-based approach achieves a significant BLEU [<xref ref-type="bibr" rid="ref18">18</xref>] score improvement, from 69.8
using domain-adapted Marian-NMT to 97.7. This gain is primarily due to broader knowledge of entities
like brands and books often unseen by NMT models, and the ability to detect and implicitly correct
grammar and spelling errors. Despite these strengths, GPT-4o still struggles with ambiguous queries
and those involving unknown non-translatable entities. We introduce two LLM modules that address
these shortcomings: the Entity-Aware Translator and the Ambiguity Resolver.</p>
      <sec id="sec-4-1">
        <title>4.1. Entity-Aware Translator</title>
        <p>The Entity-Aware Translator begins with a data pipeline that extracts structured entity data including
brands, product lines, franchises, characters, sports teams, and media titles from the product catalog
along with their associated product categories. This information is then matched with historical queries
from the past year where the entity appears as a proper sub-string. The module supplies the LLM
with three key pieces of information: the query, the identified non-translatable entity, and the product
categories associated with the entity. The LLM is prompted to translate the query according to the
earlier translation guidelines, with an added instruction: the entity must not be translated within the
context of the given product categories. Because the module relies solely on product catalog data to
contextualize queries, it can handle novel and long-tail entities mentioned in queries as long as they are
present in the catalog. Table 1 illustrates examples of entity-aware translations.</p>
        <p>Compared to the generic translation prompt (Section 4), this approach improved exact match
translation accuracy by 2.6% and increased the BLEU score by 1.4 on a representative random sample of 3,000
unique queries containing non-translatable entities, as validated by professional linguists. As such
queries comprise about 15% of French search traffic, this boosts overall translation accuracy by 0.4%.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Ambiguity Resolver</title>
        <p>The LLM-based Ambiguity Resolver translates common but ambiguous queries like ”gomme”, ”ballon”,
”trésor”, or ”pêche”, where the intent cannot be determined from the query alone and there is insufficient
context for the Entity-Aware Translator to apply. These queries may have multiple translations or refer
to non-translatable entities. To resolve these ambiguities, the module combines LLM-based translation
with product catalog data and session-level user behavior signals to infer the most likely intent.</p>
        <p>The module first uses the product catalog to extract all queries linked to non-translatable entities (e.g.,
product line ”Trésor”). For these and other single-token queries, it analyzes past session data including
query refinements to determine the distribution of add-to-cart (ATC) actions across distinct product
categories. The module then prompts the LLM to generate the most accurate translation of each query
using three key pieces of context: the product categories customers engaged with, the percentage of
ATC events linked to each category, and examples of query refinements. While this approach chooses
the single most relevant translation for simplicity, in cases where ATC distribution is nearly uniform
across categories, it may be preferable to blend search results from multiple plausible translations.
Table 2 presents examples illustrating this workflow.</p>
        <p>We find that 26% of French Canadian search traffic consists of potentially ambiguous single-token
queries, thousands of those overlapping with known non-translatable entities. By leveraging session
behavior, this approach selects translations that are more likely to drive customer engagement.</p>
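        <p>A minimal sketch of this workflow, assuming a per-category add-to-cart (ATC) count aggregate; the category names and the 0.15 flatness margin are illustrative:</p>
        <preformat>
```python
# Sketch: turn session add-to-cart (ATC) counts into the distribution the
# Ambiguity Resolver feeds to the LLM, and decide whether one category
# dominates. Category names and the 0.15 margin are illustrative.

def atc_distribution(atc_counts: dict[str, int]) -> dict[str, float]:
    """Normalize raw ATC counts into a share per product category."""
    total = sum(atc_counts.values())
    return {cat: n / total for cat, n in atc_counts.items()}

def dominant_category(shares: dict[str, float], margin: float = 0.15):
    """Top category, or None when the distribution is too flat to commit
    to a single translation (blending results may then be preferable)."""
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and margin > ranked[0][1] - ranked[1][1]:
        return None
    return ranked[0][0]

# "gomme" may mean chewing gum or an eraser; here sessions favor gum.
shares = atc_distribution({"Candy": 80, "Stationery": 20})
top = dominant_category(shares)
```
        </preformat>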
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Language-Tuned Translatability</title>
      <p>
        As shown in Figure 1, the Translatability component is customized for each language and locale.
Perez-Martin et al. [10] showed that for Spanish, building a lexicon of language-specific terms, including
regional and dialectal variants, from wiktionary.org and using lexicon lookups at runtime outperforms
pre-trained language classifiers such as those proposed by Joulin et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>We find similar results for French in Canada. Lexicons derived from external sources are essential
for capturing Québécois terms and translations. For example, “cartable” means “school bag” in France
but “binder” in Canada; “espadrille” refers to a light shoe in France but a sport shoe in Canada; “bleuet”
means “blueberry” in Canada but “cornflower” in France. Terms like “tuque” (winter hat) and “duo
tang” (folder) are uniquely Canadian. We use wiktionary.org to develop a lexicon of 172k unique French
Canadian terms for language detection.</p>
      <p>
        In the US, queries from Hispanic users often include Spanish-English code switching (e.g., ”cake de
fresa”) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. To handle this, the language detection logic must tolerate partial Spanish queries. We
achieve optimal translation performance by requiring approximately 30% of query tokens to appear
in the Spanish lexicon. However, French Canadian queries are more linguistically consistent. Most queries
containing a French token are either entirely French, include a non-translatable entity, or contain a
word identical in both French and English. Extending the 30% threshold from Spanish misclassified
40.8% of French queries in a 10,000 GPT-4o-labeled query sample, hurting relevance. Instead, we found
that classifying any query with at least one French token as French raised recall to 100% on the sample
and improved BLEU from 80.5 to 82.5.
      </p>
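      <p>The locale-tuned detection logic can be sketched as follows, with tiny stand-in lexicons in place of the full Wiktionary-derived ones:</p>
      <preformat>
```python
# Sketch of the locale-tuned translatability check: Spanish tolerates
# code-switched queries via a ~30% token threshold, while Canadian French
# flags a query on a single lexicon hit. Lexicons here are tiny stand-ins.

SPANISH_LEXICON = {"de", "fresa", "pastel", "zapatos"}
FRENCH_CA_LEXICON = {"tuque", "bleuet", "cartable", "chaussures"}

def is_spanish(query: str, threshold: float = 0.3) -> bool:
    tokens = query.lower().split()
    hits = sum(t in SPANISH_LEXICON for t in tokens)
    return hits / len(tokens) >= threshold

def is_french_ca(query: str) -> bool:
    return any(t in FRENCH_CA_LEXICON for t in query.lower().split())

# Code-switched "Spanglish" clears the partial-match threshold (2/3 tokens),
# while a single Québécois token routes a query to translation.
assert is_spanish("cake de fresa")
assert is_french_ca("red tuque")
assert not is_french_ca("red hat")
```
      </preformat>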
      <p>We also evaluated removing language detection entirely, relying on the fine-tuned Marian-NMT
model to preserve the 18% of queries that are English or non-translatable entities. However, the model
correctly preserves English queries only 80.6% of the time, introducing errors like altered numerical
values (e.g., ”058336173” to ”58336000”), verb tense shifts (e.g., ”food weighing scale” to ”food weight scale”),
and truncation of long queries (e.g., ”bissel powergroom swivel rewind pets” to ”bissel powergroom
swivel rewind p”). Thus, we conclude that robust language detection remains essential for maintaining
translation quality using Marian-NMT.</p>
      <p>Corpus details. The average length, vocabulary size, and data split.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Neural Machine Translation</title>
      <p>
        To enable real-time query translation at scale, we require a lightweight and efficient NMT model.
We use the Tiny.Untied [20] model architecture, which at only 16.9M parameters is well-suited for
ultra-low inference latency. We fine-tune Fr-En Tiny.Untied on a bilingual parallel corpus combining
in-domain French queries with LLM-generated English translations (Section 4) and out-of-domain data
from the OPUS-MT benchmark [21] (Table 3). This joint training strategy helps the model learn both
general-domain content and e-commerce-specific patterns, including terminology, entities, and code-switched queries [
        <xref ref-type="bibr" rid="ref22 ref23 ref24">22, 23, 24</xref>
        ]. We evaluate model performance on BLEU and chrF [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] computed on
our held-out test set of 479k queries (Table 3), achieving a BLEU of 49.77 and chrF of 72.37 (Table 6).
      </p>
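      <p>For reference, chrF is a character n-gram F-score; a simplified single-sentence variant (n up to 3 rather than 6, and no corpus-level aggregation) can be written as:</p>
      <preformat>
```python
# Simplified character n-gram F-score in the spirit of chrF [25]:
# single sentence, n = 1..3, beta = 2. The official metric uses n up to 6
# with corpus-level aggregation.
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf_like(hyp: str, ref: str, max_n: int = 3, beta: float = 2.0) -> float:
    f_scores = []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue
        overlap = sum(min(h[g], r[g]) for g in h if g in r)
        prec, rec = overlap / sum(h.values()), overlap / sum(r.values())
        if prec + rec == 0:
            f_scores.append(0.0)
        else:
            f_scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(f_scores) / len(f_scores) if f_scores else 0.0

score = chrf_like("winter hat", "winter hat")  # identical strings score 100.0
```
      </preformat>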
      <sec id="sec-6-1">
        <title>6.1. Contextual Rule Creator</title>
        <p>While the LLM-powered Entity-Aware Translator (Section 4.1) enables high-quality offline translation
of queries containing non-translatable entities by leveraging product catalog context, our lightweight
production NMT model does not fully capture this entity-specific knowledge, especially for new and
rare entities. To bridge this gap, we introduce the Contextual Rule Creator, a module that distills learned
LLM behaviors into explicit, token-based rules applied during real-time inference.</p>
        <p>Rather than relying solely on heuristic rules, the Contextual Rule Creator extracts patterns from LLM
translations and codifies them into a structured decision framework. For each detected non-translatable
entity, the system performs:
1. Entity Context Extraction: Collects historical queries containing the entity along with its
associated product categories.
2. Candidate Rule Generation: Prompts the LLM to infer translation behaviors (translate vs.
not-translate) conditioned on co-occurring tokens, capturing domain-specific nuances.
3. Candidate Rule Validation: Proposes structured rules, which are then validated using a
lightweight multi-stage evaluation process (Section 6.1.1).</p>
        <p>This process encodes the LLM’s implicit entity knowledge in a form that the NMT model can apply
during inference with minimal latency and cost.</p>
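        <p>At inference time, a distilled rule can reduce to a token-conditioned lookup; a sketch with hypothetical rule contents:</p>
        <preformat>
```python
# Sketch of applying a distilled contextual rule at NMT inference time:
# an entity is protected from translation only when a conditioning token
# co-occurs. The rule contents here are hypothetical examples.

RULES = [
    # protect "tresor" (a product line) when perfume-related tokens appear;
    # otherwise "tresor" translates normally (e.g., "carte au tresor").
    {"entity": "tresor", "when_any": {"lancome", "parfum", "eau"}},
]

def protected_entities(query: str, rules=RULES) -> list[str]:
    """Entities the rules mark as do-not-translate for this query."""
    tokens = set(query.lower().split())
    protected = []
    for rule in rules:
        if rule["entity"] in tokens and tokens.intersection(rule["when_any"]):
            protected.append(rule["entity"])
    return protected

assert protected_entities("tresor parfum") == ["tresor"]
assert protected_entities("carte au tresor") == []  # treasure map: translate
```
        </preformat>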
        <p>While the contextual rule engine provides a necessary bridge today, it is a transitional mechanism.
Our end goal is to retrain the NMT model directly on LLM-augmented datasets, progressively reducing
the need for explicit rules. Meanwhile, this distillation approach allows the NMT system to immediately
benefit from the improvements of the Entity-Aware Translator without costly daily retraining or added
instability. Table 4 illustrates examples of LLM-proposed contextual rules.</p>
        <sec id="sec-6-1-1">
          <title>6.1.1. Rule Validation and Deployment</title>
          <p>The primary goal of the Contextual Rule Creator is to distill LLM translation behavior into lightweight
rules for NMT, requiring a validation process that is fast, scalable, and minimally disruptive. Our
approach balances the need for linguistic precision with the recognition that the LLM already captures
the correct behavior in most cases. The rule validation and deployment process follows four stages:</p>
          <p>Step 1. Impact Simulation on Large Query Sample. Each rule is evaluated offline on a representative
sample of 10 million pre-translated queries. The system computes the number and percentage of queries
impacted by the rule, returns the list of impacted queries and outputs after applying the rule, and
automatically promotes impacting rules.</p>
          <p>Step 2. Alignment Check with LLM Behavior. Promoted rules are evaluated by re-translating
impacted queries using the LLM without the rule applied. This verifies that no rule introduces new
behavior inconsistent with the LLM translation, and that a high percentage of LLM translations already
conform to the rule’s action (preserve or translate the entity). We find that well-formed rules align with
the LLM output in over 90% of impacted queries.</p>
          <p>Step 3. Targeted Linguist Review. Trained linguists review each rule using a lightweight UI that
displays the rule logic, example queries with and without rule application, and the LLM translations.
Each rule undergoes a 5–10 minute review to ensure it improves NMT inference without causing
semantic drift. Rules are either approved, lightly adjusted, or rejected.</p>
          <p>Step 4. Long-Term Rule Quality Monitoring. Deployed rules are continuously monitored for
residual query impact after LLM translation and for cases of minimal ongoing impact, identifying
candidates for retraining. Rules with persistent high residual impact or low relevance are re-evaluated.
The ultimate goal is to transition knowledge distilled into rules back into model retraining, eliminating
the need for manual intervention over time.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Scalable and Cost-Eficient NMT Deployment</title>
        <p>
          Expanding to new regions requires replicating NMT instances, but GPU-based inference is expensive
and sensitive to traffic surges. To address scalability and cost constraints, we leverage 8-bit quantization
via the Marian-NMT [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] toolkit, deploying the fine-tuned, quantized model to CPU.
        </p>
        <p>We test this configuration on a dataset of 1M queries with token length ≤ 7, a constraint sufficient
to cover 99.98% of French Canadian search queries. As shown in Table 5, the int8-quantized model
deployed on an instance with two Intel Ice Lake CPUs exhibits substantial performance improvements
over the GPU-based setup. Additional outcomes include:
1. Throughput of up to 150 QPS per instance, a 3x increase compared to the GPU configuration
2. p99 latency of 27 ms under maximum load
3. 58% reduction in monthly NMT inference costs</p>
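        <p>The idea behind the int8 path can be illustrated with a toy per-tensor symmetric quantizer; this is a pure-Python sketch, while the production model relies on Marian's optimized CPU kernels:</p>
        <preformat>
```python
# Toy per-tensor symmetric int8 quantization: floats map to the integer
# range [-127, 127] with one scale factor, so the rounding error per
# weight is bounded by scale / 2. Illustration only, not Marian's kernels.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]

w = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))  # bounded by scale / 2
```
        </preformat>
        <p>Shrinking each weight from 32 to 8 bits cuts memory traffic and lets inference use integer matrix-multiply instructions, which is what makes CPU deployment competitive here.</p>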
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Impact on Key Business Metrics</title>
      <p>We deployed our CLIR system in Canada by integrating our MT system into the existing English search
engine. To measure its impact, we benchmarked the CLIR system against a production baseline that
retrieves search results directly from French-language product content without translation. We first
evaluated translation quality and search relevance impact offline, then ran a two-week AB test to
validate system performance on key business metrics.</p>
      <p>For the offline evaluation, we extracted a representative random sample of 2,000 unique queries
weighted by page impressions from Canadian French search traffic over three months post-model
training. This sampling strategy increases the likelihood of including queries with unseen non-translatable
entities, helping evaluate the ability of the system to handle cold-start scenarios. Our MT system
achieved 90% exact match accuracy in translation and a BLEU score of 82 on this sample, based on
comparisons with reference translations provided by professional linguists.</p>
      <p>To measure search relevance, for both control and treatment, bilingual human judges manually
graded the relevance of the top 5 search results for each query on a 4-point scale depicted in Table 7.
The evaluation showed +8.2% weighted nDCG and a 3.3% increase in Relevant results under the CLIR
system, both achieving statistical significance (p-value &lt; 0.05).</p>
      <p>After validating the improvement in search relevance, we ran a two-week AB test. Users that issued
at least one search on www.walmart.ca/fr qualified for the test. Half were randomly assigned to the
baseline search experience, while the other half received the CLIR experience. The test revealed a
statistically significant lift in conversion rate. We also observed significant reductions in zero-result
pages and search abandonment rate, showing that the CLIR system improves both customer satisfaction
and engagement for non-English-speaking customers.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>In this paper, we presented a machine translation system for cross-lingual e-commerce search that
delivers high translation quality, performance, and scalability across languages and markets. Our key
contributions include the first offline use of large language models to resolve cross-lingual ambiguity and
perform entity-aware translation in e-commerce search, language-tuned translatability logic to handle
code-switched and dialectal queries across regions, and a quantized neural machine translation model
for low-latency, CPU-based inference that maintains translation quality while reducing cost. This work
opens the door for future research into deeper LLM integration with low-latency, multilingual, and
localized search applications.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT (GPT-4) for grammar and spelling
checking, paraphrasing, and rewording. After using this tool, the authors reviewed and edited the
content as needed and take full responsibility for the publication’s content.</p>
      <p>[13] U. Germann, A. F. Aji, N. Bogoychev, A. F. Martins, A. Birch, Marian: Fast Neural Machine
Translation in C++, in: ACL 2018 - 56th Annual Meeting of the Association for Computational
Linguistics, Proceedings of System Demonstrations, Association for Computational Linguistics
(ACL), 2018, pp. 116–121. URL: https://aclanthology.org/P18-4020. doi:10.18653/V1/P18-4020.
[14] K. Laenen, M.-F. Moens, Multimodal neural machine translation of fashion e-commerce
descriptions, in: N. Kalbaska, T. Sadaba, F. Cominell, L. Cantoni (Eds.), Fashion Communication in the
Digital Age. FACTUM 2019, Springer, 2019, pp. 46–57. URL: https://doi.org/10.1007/978-3-030-15436-3_4.
doi:10.1007/978-3-030-15436-3_4.
[15] D. Gao, K. Chen, B. Chen, H. Dai, L. Jin, W. Jiang, W. Ning, S. Yu, Q. Xuan, X. Cai, L. Yang, Z. Wang,
LLMs-based machine translation for e-commerce, Expert Systems with Applications 258 (2024)
125087. URL: https://doi.org/10.1016/j.eswa.2024.125087.
[16] K. K. Gupta, S. Chennabasavraj, N. Garera, A. Ekbal, Pre-training synthetic cross-lingual decoder
for multilingual samples adaptation in E-commerce neural machine translation, in: H. Moniz,
L. Macken, A. Rufener, L. Barrault, M. R. Costa-jussà, C. Declercq, M. Koponen, E. Kemp, S. Pilos,
M. L. Forcada, C. Scarton, J. Van den Bogaert, J. Daems, A. Tezcan, B. Vanroy, M. Fonteyne
(Eds.), Proceedings of the 23rd Annual Conference of the European Association for Machine
Translation, European Association for Machine Translation, Ghent, Belgium, 2022, pp. 241–248.</p>
      <p>URL: https://aclanthology.org/2022.eamt-1.2.7/
[17] Y. Moslem, R. Haque, J. D. Kelleher, A. Way, Adaptive machine translation with large language
models, 2023. URL: https://arxiv.org/abs/2301.1329.4arXiv:2301.13294.
[18] K. Papineni, S. Roukos, T. Ward, W. Zhu, BLEU: a method for automatic evaluation of machine
translation, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
(2002) 311–318. URL: http://dl.acm.org/citation.cfm?id=10731.3d5oi:10.3115/1073083.1073135.
[19] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Eficient Text Classification - ACL
Anthology, in: Proceedings of the 15th Conference of the European Chapter of the Association for
Computational Linguistics: Volume 2, Short Papers, Association for Computational Linguistics,
Valencia, Spain, 2017. URL:https://aclanthology.org/E17-206 8./
[20] N. Bogoychev, R. Grundkiewicz, A. F. Aji, M. Behnke, K. Heafield, S. Kashyap, E.-I. Farsarakis,
M. Chudyk, Edinburgh’s Submissions to the 2020 Machine Translation Eficiency Task, in:
Proceedings of the Fourth Workshop on Neural Generation and Translation, Association for
Computational Linguistics, Online, 2020, pp. 218–224. URhLt:tps://aclanthology.org/2020.ngt-1..26
doi:10.18653/v1/2020.ngt-1.26.
[21] B. Zhang, P. Williams, I. Titov, R. Sennrich, Improving massively multilingual neural
machine translation and zero-shot translation, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault
(Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics, Association for Computational Linguistics, Online, 2020, pp. 1628–1639. UhRtLt:ps:
//aclanthology.org/2020.acl-main.14 8. doi:10.18653/v1/2020.acl-main.148.
[22] M. Dhar, V. Kumar, M. Shrivastava, Enabling Code-Mixed Translation: Parallel Corpus Creation
and MT Augmentation Approach, in: Proceedings of the First Workshop on Linguistic Resources
for Natural Language Processing, 2018, pp. 131–140. URL:https://aclanthology.org/W18-381 7./
[23] D. Gautam, P. Kodali, K. Gupta, A. Goel, M. Shrivastava, P. Kumaraguru, CoMeT: Towards
CodeMixed Translation Using Parallel Monolingual Sentences, in: Proceedings of the Fifth Workshop
on Computational Approaches to Linguistic Code-Switching, 2021. UhRtLt:ps://aclanthology.org/
2021.calcs-1.7/. doi:10.18653/v1/2021.calcs-1.7.
[24] A. Pratapa, M. Choudhury, S. Sitaram, Word Embeddings for Code-Mixed Language Processing,
in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,
2018, pp. 3067–3072. URL: https://aclanthology.org/D18-134 4./doi:10.18653/v1/D18-1344.
[25] M. Popović, chrF: character n-gram F-score for automatic MT evaluation, in: 10th Workshop on
Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural
Language Processing, EMNLP 2015 - Proceedings, Association for Computational Linguistics (ACL),
2015, pp. 392–395. URL: https://aclanthology.org/W15-3049/. doi:10.18653/v1/W15-3049.
[26] N. Bogoychev, R. Grundkiewicz, A. F. Aji, M. Behnke, K. Heafield, S. Kashyap, E.-I. Farsarakis,
M. Chudyk, Edinburgh’s submissions to the 2020 machine translation efficiency task, in: A. Birch,
A. Finch, H. Hayashi, K. Heafield, M. Junczys-Dowmunt, I. Konstas, X. Li, G. Neubig, Y. Oda
(Eds.), Proceedings of the Fourth Workshop on Neural Generation and Translation, Association for
Computational Linguistics, Online, 2020, pp. 218–224. URL: https://aclanthology.org/2020.ngt-1.26/.
doi:10.18653/v1/2020.ngt-1.26.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Vashee</surname>
          </string-name>
          ,
          <source>The impact of MT on the Global Ecommerce Opportunity</source>
          ,
          <year>2022</year>
          . URL: https://blog.modernmt.com/the-impact-of-mt-on-the-global-ecommerce-opportunity/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahuja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Katariya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Subbian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <article-title>Language-agnostic representation learning for product search on e-commerce platforms</article-title>
          ,
          <source>in: WSDM 2020 - Proceedings of the 13th International Conference on Web Search and Data Mining</source>
          , Association for Computing Machinery, Inc,
          <year>2020</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>15</lpage>
          . URL: https://doi.org/10.1145/3336191.3371852. doi:10.1145/3336191.3371852.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Graph-based Multilingual Product Retrieval in E-Commerce Search</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, Association for Computational Linguistics</source>
          , Stroudsburg, PA, USA,
          <year>2021</year>
          , pp.
          <fpage>146</fpage>
          -
          <lpage>153</lpage>
          . URL: https://www.aclweb.org/anthology/2021.naacl-industry.19. doi:10.18653/v1/2021.naacl-industry.19.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mangrulkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>M S</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sembium</surname>
          </string-name>
          ,
          <article-title>Multilingual Semantic Sourcing using Product Images for Cross-lingual Alignment</article-title>
          ,
          <source>in: Companion Proceedings of the Web Conference 2022 (WWW '22 Companion)</source>
          , volume
          <volume>1</volume>
          , ACM,
          <year>2022</year>
          , p.
          <fpage>11</fpage>
          . URL: https://doi.org/10.1145/3487553.3524204. doi:10.1145/3487553.3524204.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ogueji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Cheriton</surname>
          </string-name>
          ,
          <article-title>Towards Best Practices for Training Multilingual Dense Retrieval Models</article-title>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2204.02363v1. doi:10.48550/arXiv.2204.02363.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-F.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Davchev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhagat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Dhillon</surname>
          </string-name>
          ,
          <article-title>Query transformation for multilingual product search</article-title>
          ,
          <source>in: SIGIR 2020 Workshop on eCommerce</source>
          ,
          <year>2020</year>
          . URL: https://sigir-ecom.github.io/ecom2020/ecom20Papers/paper6.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Exploiting Neural Query Translation into Cross Lingual Information Retrieval</article-title>
          ,
          <source>in: SIGIR eCom 2020</source>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2010.13659v1. doi:10.48550/arXiv.2010.13659.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Flores</surname>
          </string-name>
          ,
          <article-title>Hispanic population in the United States statistical portrait</article-title>
          ,
          <year>2015</year>
          . URL: https://www.pewresearch.org/hispanic/2017/09/18/2015-statistical-information-on-hispanics-in-united-states/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Heritage</surname>
          </string-name>
          ,
          <source>Some facts on the Canadian Francophonie</source>
          ,
          <year>2024</year>
          . URL: https://www.canada.ca/en/canadian-heritage/services/official-languages-bilingualism/publications/facts-canadian-francophonie.html.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Perez-Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gomez-Robles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gutiérrez-Fandiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Adsul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajanala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lezcano</surname>
          </string-name>
          ,
          <article-title>Crosslingual search for e-commerce based on query translatability and mixed-domain fine-tuning</article-title>
          ,
          <source>in: Companion Proceedings of the ACM Web Conference</source>
          <year>2023</year>
          ,
          <year>2023</year>
          , pp.
          <fpage>892</fpage>
          -
          <lpage>898</lpage>
          . URL: https://doi.org/10.1145/3543873.3587660. doi:10.1145/3543873.3587660.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Senellart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>OpenNMT: Open-Source Toolkit for Neural Machine Translation</article-title>
          ,
          <source>in: Proceedings of ACL 2017, System Demonstrations</source>
          , Vancouver, Canada,
          <year>2017</year>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>72</lpage>
          . URL: https://aclanthology.org/P17-4012/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuchaiev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ginsburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gitman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lavrukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Case</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Micikevicius</surname>
          </string-name>
          ,
          <article-title>OpenSeq2Seq: Extensible Toolkit for Distributed and Mixed Precision Training of Sequence-to-Sequence Models</article-title>
          ,
          <source>in: Proceedings of Workshop for NLP Open Source Software (NLP-OSS), Association for Computational Linguistics (ACL)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>46</lpage>
          . URL: https://aclanthology.org/W18-2507/. doi:10.18653/v1/W18-2507.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Junczys-Dowmunt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Grundkiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dwojak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Heafield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Neckermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Seide</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>