<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>A Two-Stage Multilingual Job Title Matching System: Combining Expert Knowledge and LLM-based Ranking</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mar Rodriguez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olatz Perez-de-Viñaspre</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Naiara Perez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HiTZ Basque Center for Language Technology - Ixa NLP Group, University of the Basque Country UPV/EHU</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents our participation in the TalentCLEF 2025 Shared Task A, which focuses on identifying and ranking job titles similar to a given query across English, German and Spanish. We propose and compare two approaches: (1) an end-to-end LLM-based baseline that performs both retrieval and ranking of job titles in a single step; and (2) a two-step pipeline that first retrieves candidates using the ESCO taxonomy, followed by semantic ranking with an LLM. Our experiments investigate the impact of various preprocessing techniques, including translation and normalization, as well as different retrieval configurations using sentence embeddings. Results show that combining ESCO-based filtering with LLM ranking, especially when using English as a pivot language, improves performance across languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Job Title Matching</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>ESCO</kwd>
        <kwd>LLM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, the field of Human Resources (HR) has experienced significant transformation, largely
driven by advances in Natural Language Processing (NLP) and the emergence of LLMs. These
technologies have enabled the development of intelligent systems capable of processing large volumes
of unstructured textual data, such as résumés and job descriptions, to identify candidates who best
match specific job requirements [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. As these systems continue to evolve, they are increasingly being
integrated into recruitment pipelines. However, practical deployment of NLP systems in HR contexts
faces several key challenges [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These include multilingualism, ensuring fairness, mitigating bias,
and achieving cross-sector adaptability. Multilingual systems must handle semantic differences across
languages without losing domain-specific meanings; fairness is essential to prevent discrimination in
system outputs; mitigating bias is necessary because training data can reflect or amplify existing societal
biases; and cross-sector adaptability is also important, as job semantics vary significantly between
professional domains.
      </p>
      <p>
        In this context, the TalentCLEF 2025 Shared Task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] Task A challenges participants to develop
systems that identify and rank job titles most similar to a given query job title. For each job title in
the provided test set, participants must generate a ranked list of similar titles drawn from a specified
knowledge base (Figure 1). The task involves English, German and Spanish, making it a multilingual
challenge. This paper presents our approach to this task, where we explore the application of Large
Language Models (LLMs) to the semantic matching of job titles. Our baseline strategy employs an LLM
fully end-to-end, performing both retrieval and ranking of relevant job titles in a single pass. Given the
practical challenges of processing large candidate sets directly with LLMs, our main exploration centers
on a two-step pipeline: first, we apply a retrieval step based on similarity metrics to ESCO’s taxonomy
to reduce the candidate pool; then, we use an LLM to perform semantic ranking on the filtered results.
      </p>
      <p>The remainder of this paper is structured as follows: Section 2 presents the datasets and supporting
resources used for the task. Section 3 describes our methodology, covering data pre- and postprocessing
strategies, and our two proposed approaches. Section 4 details the experimental setup, and the results
achieved by both systems. Finally, Section 5 discusses the findings, identifies current limitations, and
outlines avenues for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data and Other Resources</title>
      <p>In this section, we present the occupational classification taxonomies ISCO and ESCO, which served as
foundational resources for performing the task. We also describe the dataset provided for the shared
task, detailing its structure.</p>
      <sec id="sec-2-1">
        <title>2.1. ISCO and ESCO</title>
        <p>In order to carry out the task, ISCO1 and ESCO2 codes were provided and subsequently used as part
of the methodology. The following subsection offers a brief overview of these codes, outlining their
structure and relevance within the context of the task.</p>
        <p>ISCO (International Standard Classification of Occupations) is a four-level classification of
occupation groups (Major Group, Sub-Major Group, Minor Group and Unit Group). Each group is
identified by a title and a numerical code, and is accompanied by a description that defines its scope:
• Major Group is denoted by a 1-digit code; e.g., 3 Technicians and associate professionals
• Sub-Major Group is denoted by a 2-digit code; e.g., 32 Health associate professionals
• Minor Groups are denoted by 3-digit codes; e.g., 322 Nursing and midwifery associate
professionals
• Unit Groups are denoted by 4-digit codes; e.g., 3221 Nursing associate professionals
Each unit group consists of multiple occupations that are highly similar in both skill level and
specialization. Since ISCO is a statistical classification, its occupation groups are mutually exclusive. This
results in a strictly mono-hierarchical structure, in which every element at level 2 or below has exactly
one parent group.</p>
        <p>ESCO (European Skills, Competences, Qualifications and Occupations) is a multilingual
classification system that works as a dictionary, describing, identifying and classifying professional
occupations and skills relevant for the EU labour market. ESCO provides descriptions of 3,039
occupations and 13,939 skills linked to these occupations, translated into 28 languages (namely, all oficial
EU languages plus Icelandic, Norwegian, Ukrainian, and Arabic). Figure 10 in Appendix A shows an
example of an occupation at the ESCO level.</p>
        <p>Each ESCO occupation is mapped to exactly one ISCO-08 code (i.e., the 2008 version of the ISCO
classification). ISCO-08 can therefore be used as a hierarchical structure for the occupations pillar (Figure
11 in Appendix A). ESCO occupations are located at level 5 and lower of the ISCO-08 classification.
Some ISCO-08 groups do not contain ESCO occupations, typically because they represent roles
without relevant economic activity in the EU, such as water and firewood collectors.</p>
        <p>1https://ilostat.ilo.org/methods/concepts-and-definitions/classification-occupation
2https://esco.ec.europa.eu</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Dataset description</title>
        <p>The corpus consists of job titles in three languages, English, German and Spanish, and covers a
wide range of job domains and professional sectors. It is divided into three subsets: training, validation,
and test. Table 1 summarizes the number of pairs, queries, target corpora, and labeled relationships
across languages and subsets. This multilingual and multi-sector dataset enables robust evaluation of
job title similarity methods across both linguistic and professional boundaries. The training data has
been compiled using publicly available terminologies. In contrast, both the validation and test sets were
annotated by domain experts.</p>
        <p>Training Set. For each language involved in the task, a corresponding training dataset is provided in a
tabular format (see Table 2), consisting of four columns. The family_id column contains the ISCO family
identifier, which represents the occupational group to which the job titles belong; the id column includes
the ESCO identifier indicating the source of the job title pair; and the remaining two columns, jobtitle_1
and jobtitle_2, represent pairs of related job titles, with jobtitle_2 being semantically or functionally
related to jobtitle_1.</p>
        <p>Validation and Test Set. They are organized into two separate files for each language considered
in the task: one for “queries” and one for “corpus elements”. The queries file (see Tables 3a and 4a)
includes a unique identifier for each query (q_id) along with the corresponding job title used as the
query (jobtitle). The corpus elements file (Tables 3b and 4b) similarly contains a unique identifier (c_id)
for each element, as well as the associated job title present in the corpus (jobtitle).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>We developed two distinct methodological approaches for the multilingual job title matching task.
Our primary approach employs a two-step pipeline (System 1, Subsection 3.2) that first applies ESCO
taxonomy-based retrieval to filter candidates, followed by LLM-based semantic ranking. This design
addresses the computational constraints imposed by LLM context length limitations when processing
large candidate sets directly. As a comparative baseline, we also implemented an end-to-end approach
(System 2, Subsection 3.3) that performs both retrieval and ranking in a single LLM pass, though
this was only feasible for the smaller test set due to input size constraints. Both systems incorporate
preprocessing steps for text normalization and optional translation (Subsection 3.1), with postprocessing
steps to handle multilingual label mapping (Subsection 3.4).</p>
      <sec id="sec-3-1">
        <title>3.1. Preprocessing</title>
        <p>The text preprocessing stage involved normalization, language standardization and, in some cases,
translation. First, known abbreviations such as QA, Sr, and AVP were expanded to their full forms; roman
numerals commonly used to indicate seniority levels (e.g., I, II, III ) were converted into standardized
terms such as Junior, Intermediate, and Senior; the placement of such modifiers was also swapped to
follow correct English syntax, for instance, transforming Engineer Senior into Senior Engineer; unknown
acronyms were preserved in uppercase to maintain their distinctiveness; and extraneous punctuation
was removed and whitespace normalized.</p>
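        <p>As a minimal illustration, the normalization rules above can be sketched as follows. The abbreviation and seniority tables shown are small illustrative stand-ins for the full resources, and the function name normalize_title is our own, not taken from the system's codebase:</p>

```python
import re

# Illustrative subsets of the expansion tables described above (assumptions,
# not the exact resources used in the system).
ABBREVIATIONS = {"QA": "Quality Assurance", "Sr": "Senior", "AVP": "Assistant Vice President"}
SENIORITY = {"I": "Junior", "II": "Intermediate", "III": "Senior"}

def normalize_title(title: str) -> str:
    # Normalize whitespace and strip extraneous trailing punctuation.
    title = re.sub(r"\s+", " ", title).strip()
    title = re.sub(r"[\s.,;:]+$", "", title)
    tokens = []
    for tok in title.split(" "):
        bare = tok.rstrip(".")
        if bare in ABBREVIATIONS:        # expand known abbreviations
            tokens.append(ABBREVIATIONS[bare])
        elif bare in SENIORITY:          # map roman-numeral seniority levels
            tokens.append(SENIORITY[bare])
        else:                            # unknown acronyms stay as-is (uppercase preserved)
            tokens.append(tok)
    # Move a trailing seniority modifier to the front (e.g. "Engineer Senior").
    if len(tokens) > 1 and tokens[-1] in {"Junior", "Intermediate", "Senior"}:
        tokens = [tokens[-1]] + tokens[:-1]
    return " ".join(tokens)
```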
        <p>Regarding the optional translation component of preprocessing, job titles in Spanish and German
were translated into English using Claude 3.7 Sonnet3 to facilitate cross-lingual comparison and
leverage the generally superior performance of NLP models in English. This translation strategy also
provides a mechanism for reducing gender bias inherent in languages with grammatical gender. For
example, in Spanish, the titles Vicepresidente, gerente sénior de planificación de capital (masculine) and
Vicepresidenta, gerente sénior de planificación de capital (feminine) both translate to Vice President, Senior
Capital Planning Manager in English, where the gender distinction that could bias the matching process
is entirely neutralized.</p>
        <sec id="sec-3-1-1">
          <title>3https://www.anthropic.com/claude/sonnet</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. System 1: Retrieval and Ranking</title>
        <p>
          This approach employs a two-step pipeline designed to narrow down and order candidate job titles.
Initially, a retrieval phase uses similarity metrics derived from ESCO and ISCO taxonomies to select a
manageable subset of relevant candidates. Subsequently, an LLM performs filtering and fine-grained
ranking on this set. The following subsections describe in detail the retrieval, ranking, and additional
filtering strategies implemented in this system.
3.2.1. Retrieval
To perform the initial retrieval of job titles, we leveraged the ESCO and ISCO taxonomies as semantic
pivots to identify the most relevant codes for each query and corpus element. We seek to map job
titles from different languages and domains to a standardized occupational classification, as this should
facilitate cross-lingual and cross-domain matching through shared taxonomic representations. For each
job title (query or corpus element), then, we first computed the similarity scores against all taxonomy
codes using several methods:
• Levenshtein distance [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] as a simple string similarity baseline.
• Sentence-BERT (sBERT) embeddings [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], using all-MiniLM-L6-v24 for English texts and
distiluse-base-multilingual-cased-v15 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for German and Spanish.
• Flair-based document embeddings [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], combining static word embeddings (GloVe [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for
English; FastText [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for German and Spanish) with bidirectional Flair embeddings through mean
pooling to obtain a fixed-size vector for each job title, following the recommended recipe.6
• RoBERTa-based embeddings, using RoBERTa base7 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for English and XLM-R8 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] for the
other languages.
        </p>
        <p>Based on these similarity scores, we next kept codes with perfect matches, if any, or selected up to the 20
most relevant codes for each job title using one of three filtering strategies:
• A fixed similarity threshold, below which codes are discarded.
• A ratio-based approach, selecting codes within a given percentage of the highest similarity score.
• A gap-based strategy, which selects codes until a significant drop in similarity between consecutive
codes is detected.</p>
        <p>Having thus mapped queries and corpus elements to ESCO and ISCO codes, the retrieval finally consists
in selecting, for each query, all corpus elements that share at least one ESCO or ISCO code with the
query. This creates a preliminary semantically filtered candidate set informed by expert knowledge
for subsequent LLM-based ranking.
3.2.2. Filtering and Ranking
While the previous step significantly reduces the candidate pool, the resulting sets still contain too
many job titles that require further filtering and precise ranking. However, these filtered candidate sets
are now manageable in size for processing by LLMs with sufficient context capacity.</p>
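        <p>The retrieval join on shared taxonomy codes reduces to a set intersection. A minimal sketch, with hypothetical code assignments and our own function name:</p>

```python
def retrieve_candidates(query_codes, corpus_codes):
    """Return, per query, corpus elements sharing at least one ESCO/ISCO code.

    query_codes:  {query_id: set of taxonomy codes}
    corpus_codes: {corpus_id: set of taxonomy codes}
    """
    candidates = {}
    for q_id, q_set in query_codes.items():
        # Keep every corpus element whose code set intersects the query's.
        candidates[q_id] = sorted(
            c_id for c_id, c_set in corpus_codes.items() if q_set & c_set
        )
    return candidates
```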
        <p>
          We specifically experimented with three different prompts (Figure 2) on the Llama 3.3 70B
Instruct9 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] model, selected for its demonstrated strong multilingual performance. We deployed
the model on two A100 GPUs using vLLM [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and applied the model’s default generation hyperparameters,
except for temperature, which was set to 0.7 instead of 0.6 to encourage more diverse output. The first
prompt (“Relevance”) was crafted by the authors, while the second (“Career counselor”) and third (“Skill
transfer”) were proposed by GPT-4o10 through ChatGPT.11
        </p>
        <sec id="sec-3-2-1">
          <title>4https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2</title>
          <p>5https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1
6https://flairnlp.github.io/docs/tutorial-embeddings/other-embeddings#document-pool-embeddings
7https://huggingface.co/FacebookAI/roberta-base
8https://huggingface.co/FacebookAI/xlm-roberta-base
9https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct</p>
          <p>Ranking Prompt 1 — Relevance
I have this list of professions separated by ‘;’. I want to sort the professions related to ‘{query_label}’, discarding those
that are not as closely related. The list should be useful for someone who works as a ‘{query_label}’ and is looking for
another job. I just want the list, without any explanation. I want the list separated by ‘;’. {corpus_labels}.</p>
          <p>Ranking Prompt 2 — Career counselor
Analyze these professions: {corpus_labels}. As an expert career counselor, sort them by relevance for a {query_label}
considering: 1. Skill similarity 2. Industry proximity 3. Career progression paths. Return only the sorted list separated by
semicolons.</p>
          <p>Ranking Prompt 3 — Skill transfer
For a {query_label} considering a career change, rank these options by transferability of skills: {corpus_labels}.
Most transferable skills first. Only return semicolon-separated list.</p>
          <p>Refined end-to-end prompt (Figure 3)
I have this list of professions separated by ‘;’. I want to sort the professions by relevance to ‘{query_label}’, and discard
those that are not as closely related. This list should be useful for someone who works as a ‘{query_label}’ and is
looking for another job. I just want the list, without any explanation. Do not add new job titles nor rephrase the ones I gave.
Simply discard irrelevant job postings and rerank the relevant ones. I want the list separated by ‘;’. Here is the original list:
{corpus_labels}.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. System 2: End-to-End Baseline</title>
        <p>In this simpler approach, the entire task is handled in a single step by an LLM. This approach was
applied exclusively to the test set, as the size of the validation set exceeded the input length limit of
Llama 3.3 70B Instruct. Another limitation is that, for Spanish and German, we were only able to test
the translated (English) version. The model’s tokenizer has greater fertility in non-English languages,
making it impossible to fit the original data within the input size constraints.</p>
        <p>Furthermore, this baseline does not include a retrieval phase based on ESCO similarity, unlike System
1. Instead, the model receives the full list of corpus job titles directly as input, along with the query title,
and is prompted to return a cleaned and ranked list of relevant results. To ensure comparability, all job
titles were preprocessed and translated into English using the same steps described in Subsection 3.1.</p>
        <p>To better control the model’s output, we refined the initial prompts after observing that previous
versions occasionally introduced new job titles not present in the input list. The refined prompt (see
Figure 3) explicitly instructs the model not to add or rephrase any job titles and to strictly filter out
irrelevant ones while reranking the remaining titles by their relevance to the query.</p>
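        <p>Concretely, prompt construction and reply parsing reduce to string templating. A sketch using the Prompt 1 (“Relevance”) wording, where build_prompt and parse_ranking are our own illustrative names; discarding any title absent from the input list guards against the model introducing new job titles:</p>

```python
# Template mirroring Ranking Prompt 1 ("Relevance") from Figure 2.
PROMPT_1 = (
    "I have this list of professions separated by ';'. I want to sort the professions "
    "related to '{query_label}', discarding those that are not as closely related. "
    "The list should be useful for someone who works as a '{query_label}' and is looking "
    "for another job. I just want the list, without any explanation. I want the list "
    "separated by ';'. {corpus_labels}."
)

def build_prompt(query_label, corpus_labels):
    # Candidate titles are joined with semicolons, as the prompt expects.
    return PROMPT_1.format(query_label=query_label, corpus_labels="; ".join(corpus_labels))

def parse_ranking(reply, allowed):
    """Parse the semicolon-separated reply, keeping only titles from the input."""
    allowed_set = set(allowed)
    ranked = [t.strip() for t in reply.split(";")]
    return [t for t in ranked if t in allowed_set]
```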
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Postprocessing</title>
        <p>Even after LLM-based filtering and ranking, the output lists could still be excessively long for practical
use. When the ranked list exceeded 50 job titles—a threshold determined based on development data
distributions—we applied an additional filtering step: we trimmed the list to the top 100 candidates and
re-invoked the LLM to obtain a more focused ranking. For Spanish and German datasets, if translation
had been applied, we then mapped the English job titles back to their original languages. This step
often resulted in list expansion, as a single English label could correspond to multiple original-language
variants differing in grammatical gender or phrasing (see Table 5). Next, we mapped the output job
titles back to their corresponding corpus codes, since the LLM input contained only candidate job title
text. We retained only unique codes to eliminate duplicates resulting from this process. Finally, if the
resulting code list still remained excessively long, particularly due to the gender-based expansion in
German and Spanish, we arbitrarily retained only the first 80 codes.
10https://openai.com/index/gpt-4o-system-card
11https://chatgpt.com</p>
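        <p>The back-mapping, deduplication, and capping steps can be sketched as follows; the function and variable names (postprocess, en_to_original) are our own illustrative choices, and the cap of 80 follows the description above:</p>

```python
def postprocess(ranked_en, en_to_original, title_to_code, max_codes=80):
    """Map ranked English titles back to original-language corpus codes.

    ranked_en:      ranked English job titles returned by the LLM
    en_to_original: {english title: [original-language variants]} (expansion step)
    title_to_code:  {original-language title: corpus code}
    """
    codes, seen = [], set()
    for en_title in ranked_en:
        # One English label may expand to several gendered/phrasing variants.
        for variant in en_to_original.get(en_title, []):
            code = title_to_code.get(variant)
            if code is not None and code not in seen:   # keep only unique codes
                seen.add(code)
                codes.append(code)
    return codes[:max_codes]   # cap excessively long lists
```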
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>In this section, we present the results for both the validation set (Subsection 4.1) and the test set
(Subsection 4.2). The validation results are used to justify the choices made for the final systems
submitted to the shared task. We discuss results in terms of precision (or false positives), recall (or false
negatives), and Mean Average Precision (MAP), which measures the quality of the ranked lists of job
titles by considering both precision and recall at multiple cutoff points.</p>
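      <p>For reference, MAP over ranked candidate lists can be computed as in the following sketch (the textbook definition, not the official task scorer):</p>

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i          # precision at each relevant rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked list, set of relevant items) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```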
      <sec id="sec-4-1">
        <title>4.1. Validation Experiments and Results</title>
        <p>For System 1, we first explored the full set of possible combinations of the strategies for the retrieval
phase, as explained in Subsection 3.2. In what follows, we report the most impactful results. It must be
noted that, in this step, it is crucial to maximize recall, as this step retrieves the candidates that the LLM
will process. At the same time, however, we need to minimize false positives so that the LLM input fits
within token limits.</p>
        <p>Pivot taxonomy. Figures 4a and 4b compare the performance of ISCO and ESCO as pivots in terms
of MAP and false positive rates obtained with sBERT embeddings. As can be seen, using the ESCO
taxonomy is more beneficial than using the ISCO taxonomy, as it particularly helps reduce the false
positives. Thus, we only report results using ESCO henceforth.</p>
        <p>Embeddings. Figure 5 shows the MAP results obtained with the different methods tested to measure
the similarity between job titles and ESCO labels. We observe that sBERT clearly outperformed the
other methods, followed by Flair. RoBERTa-based embeddings yielded extremely poor results, as did
the Levenshtein distance. Hence, in what follows, we only report results with sBERT.
Filtering strategy. Figures 7 and 8 show the recall and MAP, respectively, of the different strategies.
In this case, the ratio-based method provided the best trade-off between recall and precision, compared
to the fixed value threshold and the gap-based strategy. We further experimented with various ratio
values to assess their impact on retrieval performance (Table 6). Lower ratios, such as 0.2 and 0.4,
achieved good recall while keeping false positives manageable. However, they tended to be overly
permissive. Among them, 0.4 slightly outperformed 0.2. In contrast, higher ratios like 0.8 led to a sharp
drop in recall. Mid-range values, particularly 0.6 and 0.7, offered a better balance between recall and
precision, with 0.7 showing a slight advantage. Based on these findings, we selected 0.4 and 0.7 for
our final configuration. Detailed results for all ratios using sBERT across languages are provided in
Appendix A, Table 11.</p>
        <p>[Table caption: MAP, Precision, and Recall scores for the different filtering strategies and their corresponding values, evaluated
on the English validation set using the sBERT model.]</p>
        <p>Having optimized the retrieval phase, we next explored the ranking phase, where our only
hyperparameter is the prompt passed to Llama 3.3 70B Instruct. In this exploration, we experimented with four
ratio values for a wider view of the options and the effect of limiting the recall: 0.4, 0.6, 0.7 and 0.8, with Prompt
1 producing the most accurate and relevant results. Table 7 presents the corresponding MAP scores
for different prompt configurations and ratio thresholds across languages. Another important factor
was execution time. Prompt 1 was consistently faster across all cases, typically requiring between 30
minutes and 3.5 hours, depending on the ratio and language (see Table 12 in Appendix A). In contrast,
Prompt 2 generally ranged from 1.5 to 9.5 hours, while Prompt 3 required between 3.5 and over 14
hours. Due to these limitations, we applied Prompt 1 exclusively in our final system.</p>
        <p>In these final experiments over the validation dataset, the last ranking phase using LLMs showed a
beneficial effect, as we obtained an increase of more than 0.1 in the MAP metric.</p>
        <p>Regarding the preprocessing step (Subsection 3.1), no validation results are available, as it was applied
exclusively to the test set. This decision stems from the fact that the need for normalization arose
when we observed significant discrepancies between the test data and the formats found in official
ISCO and ESCO taxonomies. Likewise, job title translation into English was incorporated for two main
reasons: (1) English yielded better performance in preliminary experiments, and (2) translation helped
reduce issues related to grammatical gender present in languages like Spanish and German. Due to
time constraints, we could not apply these enhancements to the validation set, and their impact was
only evaluated during the final testing phase.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Test Results</title>
        <p>Based on the results for the validation set outlined in the previous subsection, we conducted five
distinct experimental runs on the test set by implementing different combinations of preprocessing
steps, translation, filtering ratios, and ranking strategies. These runs were designed to evaluate the
impact of various configurations on the system’s performance in identifying and ranking job titles
across English, German and Spanish. The five runs performed are listed in Table 8.</p>
        <p>In Table 9, the MAP scores obtained for each run across the main monolingual language pairs can be
seen: English-English (en-en), Spanish-Spanish (es-es), and German-German (de-de). Runs that include
translation tend to perform better on Spanish and German, highlighting the benefit of cross-lingual
alignment through English. Notably, Run2 and Run4 achieve the highest average MAP scores across
the three languages, indicating that combining translation with sBERT retrieval and LLM ranking
yields superior results. Run5, which skips sBERT retrieval, exhibits markedly poor performance and is
consistently outperformed by the runs that include filtering. In Table 10 a sample of the predicted most
similar job titles for the query reliability engineer is presented.</p>
        <p>We observe a significant drop in performance on the test set compared to the validation set.
Furthermore, the influence of the ratio used to select ESCO codes appears less pronounced in the test set. To
investigate this discrepancy, we analyzed the classification of ESCO codes and found that it was less
effective than in the validation set. Specifically, similarity scores were generally lower, particularly for
the most similar codes. Figure 9 presents violin plots comparing the distributions of similarity scores
for both sets. The results indicate that the sBERT-based similarity scores align more closely with the
ESCO taxonomy in the validation set than in the test set. Consequently, our method may not be the
most suitable approach for the current test set.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Future Work</title>
      <p>Our initial strategy was using LLMs in a fully end-to-end manner, performing both filtering and ranking
of relevant job titles. However, due to the context-length limitations of current LLMs and the size of the
validation data, we evaluated this direct end-to-end approach only on the test set. The other approach
we explored was a two-step pipeline: first, we applied a pre-filtering phase based on ESCO taxonomy
similarity to reduce the candidate pool; then, we used an LLM to perform semantic ranking on the
filtered results.</p>
      <p>Although the MAP scores obtained on the test set were generally modest, the methodology we
followed proved to be the most effective among the configurations tested, particularly the two-step
approach combining ESCO-based retrieval with LLM-based ranking. The incorporation of semantic
filtering using sBERT embeddings provided a strong foundation for narrowing down the candidate
set, significantly outperforming simpler approaches such as Levenshtein distance or document-level
embeddings based on Flair and RoBERTa.</p>
      <p>One of the most impactful design choices was the decision to translate Spanish and German job titles
into English prior to further processing. This translation step not only improved MAP scores in the
corresponding language pairs, but also contributed to mitigating gender bias inherent in languages with
grammatical gender. As discussed in Subsection 3.1, English helped neutralize gender-specific job titles
by lacking this grammatical feature. Overall, the translation served as a valuable cross-lingual
normalization mechanism that enhanced the LLM’s ability to generalize across languages.</p>
      <p>The analysis of filtering strategies revealed that ratio-based selection was the most balanced among
the three evaluated alternatives. Unlike fixed thresholds or abrupt similarity gaps, ratio-based filtering
allowed for dynamic cutoffs that preserved both precision and recall. Empirical results showed that a
ratio of 0.7 offered a particularly good trade-off, leading to better final rankings than more permissive
(0.4) or restrictive (0.8) values. Moreover, sBERT emerged as the most robust method for computing
semantic similarity, justifying its exclusive use in the final configurations.</p>
      <p>Regarding the ranking phase, the experiments confirmed the utility of prompting LLMs to reorder
and refine candidate lists. Prompt 1 not only delivered better alignment with the query job title but also
operated more efficiently in terms of execution time compared to the other alternatives. Even in the
end-to-end baseline, the LLM proved capable of performing effective filtering and ranking, though less
reliably than when supported by ESCO-based filtering.</p>
      <p>While our approach proved effective within the tested configurations for the validation set, it may
have relied too heavily on the ESCO taxonomy as a backbone for candidate filtering. This dependence
might have constrained the system’s flexibility and limited the exploration of alternative methods
for initial retrieval, such as using multilingual semantic search directly. Additionally, although we
had prepared the necessary code to support cross-lingual configurations (e.g., EN–ES, EN–DE), time
constraints prevented us from running these experiments. As a result, our evaluation was restricted to
monolingual settings.</p>
      <p>Future work could focus on relaxing the dependency on ESCO by investigating multilingual
embedding-based retrieval strategies and assessing system performance in cross-lingual scenarios.
Further exploration into end-to-end LLM approaches with improved context management may also
unlock more streamlined solutions.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We acknowledge the support of the HiTZ Chair of Artificial Intelligence and Language Technology
(TSI100923-2023-1), funded by MTDFP, Secretaría de Estado de Digitalización e Inteligencia Artificial,
ENIA, and by the European Union-Next Generation EU / PRTR; the Spanish Ministry for Digital
Transformation and of Civil Service, and the EU-funded NextGenerationEU Recovery, Transformation
and Resilience Plan (ILENIA, 2022/TL22/00215335); Disargue (TED2021-130810B-C21) project (funded by
MCIN/AEI /10.13039/501100011033 and European Union NextGeneration EU/PRTR) and the TRAIN
(PID2021-123988OB-C31) project (funded by MCIN/AEI /10.13039/501100011033 and “ERDF A way of
making Europe”).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4 for grammar and spelling checking.
After using this tool, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Appendix</title>
      <p>This appendix provides additional resources to support the main content of the paper. It includes two
figures illustrating the structure and content of the ESCO and ISCO taxonomies: an example occupation
entry with its ESCO code, description, and alternative labels, and an overview of the occupations
pillar hierarchy. Additionally, two tables report extended experimental details: one presents the full
evaluation metrics of the sBERT model across languages and ratio-based thresholds for the validation
set, and the other details execution times for each prompt configuration across languages and ratios
using the validation set.</p>
      <p>Example of an occupation at the ESCO level, including its ESCO code, description, and alternative
labels. Source: https://esco.ec.europa.eu/en/classification/occupation_main</p>
      <p>Structure of the occupations pillar hierarchy. Source:
https://esco.ec.europa.eu/en/about-esco/escopedia/international-standard-classification-occupations-isco</p>
    </sec>
  </body>
  <back>
  </back>
</article>