<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NT Team at Multilingual Job Title Matching Task A: Job Matching via Large Language Model-Based Description Generation and Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ho Thuy Nga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ho Thi Thanh Tuyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dang Van Thin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Vietnam National University</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Information Technology</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Job matching is a difficult problem in the labor market due to the vague nature of job titles and the differences in language and industry terminology. These challenges make it hard to compare, classify, or retrieve similar job titles in different contexts. The multilingual job title matching task focuses on identifying and ranking job titles that are semantically similar across languages and industries, facilitating more accurate and consistent job classification across languages and domains. To address this task, we propose a system that leverages Large Language Models to enrich job title understanding and enhance matching performance. By combining generative and retrieval-based components, our approach captures semantic relationships, supports multilingual input, and demonstrates adaptability across diverse job domains.</p>
      </abstract>
      <kwd-group>
<kwd>Job Matching</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Generative and Retrieval-based Components</kwd>
        <kwd>Multilingual Input</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the context of an increasingly dynamic labor market, the precise identification and semantic alignment
of job titles play a critical role in a wide range of human resource and talent management tasks,
including candidate-job matching, career path modeling, and strategic workforce planning. However,
the heterogeneous and constantly evolving nomenclature of job titles presents considerable obstacles
for automated systems, which often struggle to accurately interpret, normalize, and link functionally
equivalent or related occupational roles. The TalentCLEF Task A Challenge aims to develop systems
capable of retrieving and ranking semantically similar job titles from a predefined knowledge base,
given an input job title [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The task is multilingual in scope, requiring support for English, Spanish,
and German, with Chinese as an optional language.
      </p>
<p>In this paper, we propose a multilingual job title matching system that combines a Large Language
Model (LLM) with keyword-based and embedding-based retrieval, followed by LLM-based re-ranking. Given
an input title, the system uses an LLM to generate a descriptive representation, computes its embedding,
retrieves the top-k most similar titles from a corpus of LLM-generated descriptions using both keyword
and embedding similarity, and refines the final ranking via LLM-based semantic re-ranking.</p>
<p>The remainder of the paper is organized as follows. Section 2 reviews related work. The system
description is presented in Section 3, followed by results and discussion in Section 4. Section 5 concludes
the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>Job title matching has been a long-standing problem in labor market analytics, particularly in job
recommendation systems, resume parsing, and occupational classification. Recent advances leverage
semantic representation techniques to address the variability and ambiguity of job titles across domains
and languages.</p>
      <p>
        Several studies have explored the use of pretrained embeddings for semantic similarity. Building
on prior research in labor market analytics, Zhu et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] explored semantic similarity techniques for
job title classification, introducing Carotene — a system that leverages Word2Vec-based embeddings
and Word Mover's Distance to align job titles with standardized taxonomies. Their work demonstrated
the effectiveness of pretrained embeddings in addressing the variability of job titles across domains.
Expanding on this, Wang et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed DeepCarotene, a multi-stream convolutional neural network
that incorporates both word- and character-level representations, significantly improving classification
performance over previous models.
      </p>
      <p>
Domain adaptation has also been shown to be effective. For instance, ESCOXLM-R [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] adapted
the multilingual XLM-R model using masked language modeling and ESCO taxonomical structure,
improving performance on job-related classification and sequence labeling tasks across 27 languages.
The model particularly excels in capturing short, entity-level spans common in occupation and skill
data. Similarly, JobBERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] demonstrates that adapting pre-trained language models to the job market
domain—by incorporating skill information into job title understanding—can significantly improve
performance in occupation classification and job title normalization tasks. This highlights the importance
of domain-specific signals in effectively modeling occupation-related semantics.
      </p>
      <p>
        Addressing the challenge of rare and out-of-vocabulary (OOV) job titles, Ha et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] developed a
character-level neural model trained to replicate standard word embeddings. Their method enhanced
robustness in job title matching tasks, outperforming traditional embedding techniques and
demonstrating effective normalization of rare terms. In multilingual settings, aligning job postings with taxonomies
such as ESCO becomes more complex due to language differences. Recent work has shown that
multilingual models like XLM-R and large language models (LLMs) can effectively support cross-lingual job
matching. Experiments with zero-shot classification and LLM-based annotation demonstrate strong
performance in mapping job postings in Italian and Spanish to English ESCO labels [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>We participated in Task A: Multilingual Job Title Matching at TalentCLEF 2025, which focuses on
identifying and ranking job titles that are semantically similar to a given query title. The task is
multilingual, covering English, Spanish, German, and optionally Chinese, and aims to support job
matching across different languages and professional domains.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
<p>We were provided with a multilingual dataset of job titles in English, Spanish, German, and Chinese,
collected from diverse job domains. The corpus was designed to support the identification and ranking
of semantically similar job titles across languages and industries.</p>
        <p>Table 1 illustrates the generated content for the title "Recording Engineer". Full job description: a
recording engineer specializes in capturing, editing, and mixing audio to produce high-quality recordings,
ensuring optimal sound clarity and artistic integrity for music, film, or broadcast projects. Responsibilities:
operate and maintain recording equipment (microphones, mixing consoles, DAWs); manage audio quality.
Skills: proficiency in DAWs (Pro Tools, Logic Pro), audio signal flow, audio processing tools. Industry:
Music Production, Film &amp; TV, Broadcast Media. Short job description: a recording engineer specializes
in capturing, editing, and mixing audio to produce high-quality recordings, ensuring optimal sound clarity
and artistic integrity for music, film, or broadcast projects.</p>
        <p>The training set includes job titles linked to standardized taxonomies such as ISCO (International
Standard Classification of Occupations) and ESCO (European Skills, Competences, Qualifications
and Occupations), which helps guide model learning across occupational structures. In contrast, the
validation and test sets do not contain such taxonomy links, but instead provide query titles, corpus
elements, and binary relevance labels annotated by domain experts. These annotations ensure consistent
and accurate evaluation across different languages. Participants must generate TREC-formatted ranked
lists of similar job titles for each test query. The consistent structure of the validation and test sets
enables standard evaluation methods in information retrieval.</p>
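        <p>As a hedged illustration (the query and document identifiers below are hypothetical, not taken from
the dataset), a ranked candidate list can be serialized into the standard TREC run format as follows:</p>

```python
def to_trec_lines(query_id, ranked, run_tag="NT_run1"):
    """Serialize (doc_id, score) pairs, best first, into TREC run-file lines.

    Each line follows the standard six-column TREC layout:
    <query_id> Q0 <doc_id> <rank> <score> <run_tag>
    """
    return [
        f"{query_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}"
        for rank, (doc_id, score) in enumerate(ranked, start=1)
    ]

# Hypothetical identifiers purely for illustration.
lines = to_trec_lines("q1", [("title_17", 0.91), ("title_42", 0.87)])
```

        <p>The literal "Q0" and the trailing run tag are required by TREC evaluation tools; ranks start at 1 and
scores must be non-increasing down the list.</p>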
      </sec>
      <sec id="sec-3-2">
        <title>3.2. System Description</title>
        <p>The architecture of our system designed for Task A: Multilingual Job Title Matching is illustrated in
Figure 1. It consists of four main components: Job Description Generation, Embedding, Retrieval, and
Re-ranking. Each component is described in detail below.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Job Description Generation</title>
          <p>Starting from an input job title, a Large Language Model (LLM), specifically Deepseek-chat, is used to
automatically generate a detailed job description covering key responsibilities and required skills. From
this description, a concise short-form summary is then extracted. An example illustrating this process
is shown in Table 1, and the prompt used for guiding the model is provided in Appendix A.</p>
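          <p>A minimal sketch of this two-step generation, assuming a generic <monospace>llm_call</monospace> wrapper
(the actual system calls Deepseek-chat with the prompt given in Appendix A; the prompt wording below is
illustrative only):</p>

```python
def generate_description(job_title, llm_call):
    """Produce a full and a short-form description for a job title.

    `llm_call` is a stand-in for whatever client invokes the LLM
    (Deepseek-chat in the paper); it maps a prompt string to a reply.
    """
    # Step 1: detailed description covering responsibilities and skills.
    full = llm_call(
        f"Write a detailed job description for the title '{job_title}', "
        "covering key responsibilities and required skills."
    )
    # Step 2: condense the full description into a short-form summary.
    short = llm_call(
        "Summarize the following job description in two or three sentences:\n" + full
    )
    return full, short
```

          <p>The short-form summary is the text later fed to the re-ranking stage, while the full description is the
one embedded for retrieval.</p>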
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Embedding</title>
<p>The full job descriptions generated during the Job Description Generation stage are subsequently
processed by alternative text embedding models, including multilingual-e5-large,
text-embedding-3-large (OpenAI), and gemini-embedding-exp-03-07 (Google), to convert the textual content into dense
vector representations. This embedding process transforms the descriptions into numerical vectors that
encapsulate their semantic information, thereby facilitating efficient similarity searches and enabling
various downstream analytical tasks.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Retrieval</title>
          <p>For the Retrieval stage, the system implements two distinct strategies to identify relevant job descriptions
based on a given query. In the first approach, vector-based retrieval, the dense vector representations
obtained during the Embedding stage are utilized. Each query is encoded into a semantic vector, and
similarity is computed using cosine similarity between the query vector and the job description vectors.
The system then retrieves the top 100 most similar job descriptions ranked by cosine similarity scores.
In the second approach, hybrid retrieval, the system combines both dense semantic similarity and
sparse keyword-based relevance to enhance retrieval performance. The final relevance score for each
candidate document is computed as a weighted combination of the cosine similarity score from the
dense vectors and the keyword matching score from a sparse retrieval method. Formally, the hybrid
score is defined as:</p>
          <p>score = α × semantic_score + (1 − α) × keyword_score,</p>
          <p>where α ∈ [0, 1] controls the balance between the semantic and lexical components. The top 100 results
with the highest hybrid scores are selected for further use.</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. Re-ranking</title>
          <p>
            Finally, a re-ranking stage is applied to refine the relevance of the retrieved results. Specifically, the top 100
candidates obtained from the retrieval stage are further processed by a Large Language Model (LLM) to
improve ranking quality. For each candidate, the corresponding short-form job description, generated in
the Job Description Generation stage, is used as input to the LLM, specifically Deepseek-chat, alongside
the original query. The LLM then assigns a relatedness score ranging from 0 to 10, reflecting the semantic
alignment between the query and the candidate description. We designed specific prompts to guide the
model behavior (Appendix B). To compute the final ranking score, the LLM-derived score is normalized to
the range [0, 1] by dividing it by 10 and then averaged with the original retrieval score (either semantic
similarity or hybrid score) as follows:
final_score = (1/2) × (LLM_score / 10 + retrieval_score) (1)
Candidates are then re-ranked based on this final score, enabling more accurate prioritization of results
that are both semantically and contextually aligned with the user's intent.
          </p>
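          <p>A minimal sketch of this score fusion (candidate identifiers and scores below are hypothetical):</p>

```python
def rerank(candidates, llm_scores):
    """Fuse LLM relatedness (0-10) with the retrieval score and re-rank.

    candidates: list of (doc_id, retrieval_score) pairs from retrieval.
    llm_scores: relatedness score in [0, 10] per candidate, same order.
    """
    fused = [
        # Normalize the LLM score to [0, 1], then average with retrieval.
        (doc_id, (llm / 10 + retrieval) / 2)
        for (doc_id, retrieval), llm in zip(candidates, llm_scores)
    ]
    # Highest fused score first.
    return sorted(fused, key=lambda pair: pair[1], reverse=True)
```

          <p>Because both components lie in [0, 1] after normalization, the unweighted average gives the LLM
judgment and the retrieval score equal influence on the final ordering.</p>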
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Models</title>
        <p>
          We utilized several advanced models in our work, each chosen for its unique capabilities in natural
language understanding and generation:
• deepseek-chat: a state-of-the-art large language model known for its strong performance in
complex reasoning and general-purpose text generation tasks. It is developed as part of the
DeepSeek LLM project, which focuses on scaling open-source language models guided by scaling
laws, with a training dataset of over 2 trillion tokens and advanced fine-tuning techniques [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
• gemini-embedding-exp-03-07: an experimental version of Google's Gemini model, designed for
enhanced contextual understanding and generation across a wide range of domains. Gemini
Embedding leverages Gemini's multilingual and code-understanding capabilities to produce
generalizable embeddings applicable to various natural language processing tasks [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
• text-embedding-3-large: an OpenAI model optimized for generating high-quality text embeddings,
making it ideal for tasks such as semantic search, clustering, and information retrieval.
• multilingual-e5-large: a multilingual embedding model capable of handling text in multiple
languages, enabling cross-lingual understanding and retrieval tasks effectively [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Evaluation</title>
        <p>The evaluation metric for our job matching system is Mean Average Precision (MAP), which measures
the quality of the ranked list of predicted job matches by averaging the precision scores at the ranks
where relevant items occur. MAP is computed to assess the system's effectiveness across different
language scenarios.</p>
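        <p>For reference, MAP can be computed as below; the binary relevance sets mirror the expert
annotations described in Section 3.1 (identifiers in the test are illustrative):</p>

```python
def average_precision(ranked_ids, relevant):
    """AP for one query: mean precision at the ranks of relevant items,
    divided by the total number of relevant items."""
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: mean of per-query AP over (ranked_ids, relevant_set) pairs."""
    runs = list(runs)
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

        <p>In practice the organizers score submitted TREC run files with standard tooling, so this sketch only
documents the metric's definition.</p>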
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>To evaluate the effectiveness of our proposed system for Task A: Multilingual Job Title Matching, we
conduct experiments under three distinct configurations. Each configuration is designed to isolate the
contribution of specific components within our architecture. Performance is evaluated using standard
information retrieval metrics, with mean average precision (MAP) as the primary evaluation metric, on
a multilingual validation and test set.</p>
      <sec id="sec-4-1">
        <title>4.1. Experiment 1: Job Title Embedding Only</title>
        <p>In the first experiment, we evaluate a baseline where neither the Job description generation nor
the re-ranking module is applied. Instead, only the original job titles are embedded directly using
the embedding models described in the system architecture, namely multilingual-e5-large,
gemini-embedding-exp-03-07, and text-embedding-3-large. Retrieval is conducted using vector-based similarity on
these embedded titles. This experiment serves as a lower-bound reference for matching performance, as
it captures only the raw semantic information present in the job title without any contextual enrichment.</p>
        <p>The results from Experiment 1 in Table 2 confirm the hypothesis that directly embedding raw job
titles without additional context yields limited retrieval performance. Among the three models
evaluated, text-embedding-3-large and gemini-embedding-exp-03-07 significantly outperform
multilingual-e5-large, particularly in English and Chinese datasets. This suggests that newer
embedding models with broader semantic capacity are better suited for capturing the distinctions
among job titles even without context. However, the results indicate that title-only representations are
insufficient for capturing the deeper semantics of job roles. These results highlight the necessity of
incorporating richer contextual signals, such as job descriptions or re-ranking strategies, for more
effective job matching across languages.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experiment 2: Full Job Description without re-ranking</title>
        <p>The second configuration incorporates the Job description generation module to enrich each job title
with detailed textual content. Full-length descriptions generated in the Job Description Generation stage are
embedded into dense vectors. Retrieval is performed using both vector-based and hybrid strategies as
described previously. However, in this setting, no re-ranking module is applied—the top 100 results are
ranked solely based on their semantic or hybrid relevance scores. This experiment demonstrates the
impact of LLM-generated descriptions on improving retrieval efectiveness.</p>
<p>Table 3 reports retrieval results under the second configuration, which uses LLM-generated job
descriptions without re-ranking. At α = 0 (pure semantic search), both embedding models already
yield strong performance. For instance, gemini-embedding-exp-03-07 achieves scores of 0.6749
(English), 0.4749 (German), 0.4827 (Spanish), and 0.5961 (Chinese), with an average of 0.5572. Similarly,
text-embedding-3-large performs slightly better with 0.675 (English), 0.4739 (German), 0.5055
(Spanish), and 0.604 (Chinese), averaging 0.5646.</p>
<p>Across all languages, hybrid retrieval consistently outperforms both pure semantic (α = 0) and
keyword-based retrieval (α = 1.0). Both models reach peak performance at α = 0.4, indicating that
integrating keyword signals improves results. Notably, performance declines at higher α values,
particularly in non-English languages such as Chinese, where keyword-based methods are less effective.</p>
<p>Overall, text-embedding-3-large yields stronger results than
gemini-embedding-exp-03-07, especially in semantic-dominant settings. These findings
highlight the value of LLM-generated descriptions and hybrid retrieval strategies for improving
multilingual job search performance.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Experiment 3: Full System</title>
        <p>The final experiment evaluates the full system, including the re-ranking module. After generating
job descriptions and retrieving the top 100 candidates, we apply an LLM-based re-ranking method
using short-form job descriptions. Each candidate is scored for semantic relatedness to the query using
Deepseek-chat, and the final score combines the normalized LLM score and the original retrieval score.
This setup leverages both rich content generation and advanced contextual re-ranking to optimize
match quality. Empirical results show that this configuration achieves the highest performance across
all evaluation metrics, validating the effectiveness of integrating LLMs throughout the pipeline.</p>
        <p>Table 4 reports the detailed results of this configuration, which employs hybrid retrieval with
α = 0.4, a value optimized in Experiment 2. Among the evaluated methods, text-embedding-3-large
achieves the highest average score (0.5748), slightly outperforming gemini-embedding-exp-03-07
(0.5699). These results further confirm the benefit of incorporating LLM-based re-ranking into the
multilingual candidate-job matching pipeline.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Evaluation on Test Set</title>
        <p>Final performance is evaluated on the held-out test set using the same full system setup from validation
(Table 5), where the final scores are evaluated by the organizers on CodaBench.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we present a job matching system developed for Task A of TalentCLEF 2025. Our approach
integrates hybrid retrieval methods, along with re-ranking mechanisms powered by large language
models. We also leverage LLMs to generate job descriptions for better representation. Experimental results
show that combining keyword-based retrieval, semantic embeddings, and LLM-based re-ranking
significantly improves the relevance of job-job pairings. In future work, exploring more effective prompting
strategies for LLMs could further enhance the system's overall performance.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was supported by The VNUHCM-University of Information Technology’s Scientific
Research Support Fund.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, we used GPT-4 in order to: check grammar, spelling, and edit the
content for clarity and coherence. After using this tool, we reviewed and edited the content as needed
and take full responsibility for the publication’s content.
technical report (2024). doi:10.48550/arXiv.2402.05672.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Prompt Job Description Generation</title>
    </sec>
    <sec id="sec-9">
      <title>B. Prompt re-ranking</title>
      <p>Given two short job descriptions written in English, evaluate how related they are on a scale from 0 to 10. Think
step by step before giving your final score.</p>
      <p>Follow this reasoning process:
– Compare job titles
– Compare domains
– Compare job responsibilities
Scoring scale (based on your reasoning):
– 10: Almost the same job (same title, industry and role)
– 8–9: Very closely related (similar work, same domain, strong role overlap)
– 6–7: Related in the same field with some overlap in tasks
– 3–5: Slightly related (same industry but different roles)
– 0–2: Barely related
Think carefully through each aspect above and then provide only the final score, from 0 to 10, with no
explanation.</p>
      <p>Input Format:
– Job 1: {short_description_job1}
– Job 2: {short_description_job2}</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>García-Sardiña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Estrella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Deniz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodrigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zbib</surname>
          </string-name>
          ,
          <article-title>Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management, in: International Conference of the Cross-Language Evaluation Forum for European Languages</article-title>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Javed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ozturk</surname>
          </string-name>
          ,
<article-title>Semantic similarity strategies for job title classification</article-title>
          (
          <year>2016</year>
          ). doi:10.48550/arXiv.1609.06268.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Abdelfatah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Korayem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Balaji</surname>
          </string-name>
          ,
<article-title>DeepCarotene - job title classification with multi-stream convolutional neural network</article-title>
          ,
          <source>in: 2019 IEEE International Conference on Big Data (Big Data)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1953</fpage>
          -
          <lpage>1961</lpage>
. doi:10.1109/BigData47090.2019.9005673.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Goot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
,
          <article-title>ESCOXLM-R: Multilingual taxonomy-driven pre-training for the job market domain</article-title>
          ,
          <year>2023</year>
          , pp.
          <fpage>11871</fpage>
          -
          <lpage>11890</lpage>
          . doi:10.18653/v1/2023.acl-long.662.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pelzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
<article-title>JobBERT: Understanding job titles through skills</article-title>
          (
          <year>2021</year>
          ). doi:10.48550/arXiv.2109.09605.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Djuric</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vucetic</surname>
          </string-name>
          ,
<article-title>Improving word embeddings through iterative refinement of word- and character-level models</article-title>
          , in: D.
          <string-name>
            <surname>Scott</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Bel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          Zong (Eds.),
          <source>Proceedings of the 28th International Conference on Computational Linguistics</source>
          ,
          <source>International Committee on Computational Linguistics</source>
          , Barcelona,
          <source>Spain (Online)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1204</fpage>
          -
          <lpage>1213</lpage>
. URL: https://aclanthology.org/2020.coling-main.104/. doi:10.18653/v1/2020.coling-main.104.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kavas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Serra-Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wanner</surname>
          </string-name>
          ,
          <article-title>Enhancing job posting classification with multilingual embeddings and large language models</article-title>
          , in: F.
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lenci</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Montemagni</surname>
          </string-name>
          , R. Sprugnoli (Eds.),
          <source>Proceedings of the 10th Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2024</year>
          ), CEUR Workshop Proceedings, Pisa, Italy,
          <year>2024</year>
          , pp.
          <fpage>440</fpage>
          -
          <lpage>450</lpage>
. URL: https://aclanthology.org/2024.clicit-1.53/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
<article-title>DeepSeek LLM: Scaling open-source language models with longtermism</article-title>
          (
          <year>2024</year>
          ). doi:10.48550/arXiv.2401.02954.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dua</surname>
          </string-name>
          , et al.,
<article-title>Gemini embedding: Generalizable embeddings from Gemini</article-title>
          (
          <year>2025</year>
          ). doi:10.48550/arXiv.2503.07891.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
, Multilingual E5 text embeddings: A technical report (2024). doi:10.48550/arXiv.2402.05672.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>