<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Madrid, Spain. * Corresponding author. Emails: iago.vazquez@itcl.es (I. X. Vázquez); rodrigo.sedano@itcl.es (R. Sedano); silvia.gonzalez@itcl.es (S. González); javier.sedano@itcl.es (J. Sedano)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
<article-title>Beyond Titles: Semantic Matching of Jobs and Skills Using LLMs and S-BERT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iago X. Vázquez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rodrigo Sedano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silvia González</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Javier Sedano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ITCL Technology Center</institution>
          ,
          <addr-line>70 López Bravo St., 09001 Burgos</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This paper presents an overview of our participation in Task B of TalentCLEF 2025, a shared task held within the Conference and Labs of the Evaluation Forum (CLEF). Task B focuses on the challenge of matching job titles to relevant professional skills, a core problem in human capital management. Our team developed a system based on a novel ensemble approach that integrates representations from multiple large language models, which are combined and refined using a Sentence-BERT (S-BERT) model. We describe our methodology, datasets, and evaluation results, showing notable improvements over the official baseline, which uses job and skill titles alone. Our best system achieved a MAP of 0.2442, clearly surpassing the baseline MAP of 0.1874.</p>
      </abstract>
      <kwd-group>
<kwd>Human Resources Management</kwd>
        <kwd>Skill Extraction</kwd>
        <kwd>Sentence Embeddings</kwd>
        <kwd>Semantic Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description and Data Sources</title>
<p>Task B of TalentCLEF 2025 consists of a job-skill matching challenge. To develop our model, we used two datasets provided by the TalentCLEF 2025 organization team: the Development Set, intended for experimenting with different approaches, and the Test Set, used for the final evaluation of the developed algorithms.</p>
<p>The list of skills used in this task was obtained from the European Skills, Competences, Qualifications and Occupations (ESCO) database. ESCO is a tool of the European Union that categorizes and links skills, competences, qualifications, and occupations to improve employability, education, and labour mobility across Europe. It facilitates interoperability between labour market and education systems.</p>
      <p>Each provided dataset includes two files that define the model inputs:
• queries: This file contains a list of occupation names, each identified by a unique key.
• corpus_elements: This file contains a list of skills extracted from the ESCO database. Each skill is
identified by a unique key. The corresponding URIs, along with various alternate titles available
in ESCO, are included.</p>
      <p>The results obtained from both datasets are evaluated against a ground truth, assessing the ranked
relevance of skills in relation to each occupation.</p>
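<p>Schematically, the logical content of the two input files can be modelled as follows. The entries and ESCO URIs in this sketch are illustrative placeholders, not actual TalentCLEF data, and the on-disk file format is not described here:</p>

```python
# queries: occupation id -> occupation name (illustrative entries only)
queries = {
    "q01": "data engineer",
    "q02": "welder",
}

# corpus_elements: skill id -> ESCO URI plus alternate titles (illustrative)
corpus_elements = {
    "s01": {"uri": "http://data.europa.eu/esco/skill/example-1",
            "titles": ["manage databases", "database management"]},
    "s02": {"uri": "http://data.europa.eu/esco/skill/example-2",
            "titles": ["weld metal parts"]},
}

# A system output ranks every corpus skill for every query occupation;
# the ground truth then scores the ranked relevance per occupation.
ranking = {qid: list(corpus_elements) for qid in queries}
print(ranking["q01"])  # ['s01', 's02']
```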
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
<p>The evaluation of the models focused on ranking all available skills from a provided source according to their relevance to each job title. The evaluation metrics used are:
• Mean Average Precision (MAP): Measures the average precision at multiple levels of retrieval for
each query, then averages the results across all queries. This metric is particularly suitable for
evaluating information retrieval systems, as it accounts for the ranking order of the retrieved
results.
• Mean Reciprocal Rank (MRR): Computes the average of the reciprocal ranks of the first relevant
result for each query. It is particularly effective in tasks where prioritizing the early retrieval of
at least one relevant result is critical.
• Normalized Discounted Cumulative Gain (nDCG): Evaluates the quality of a ranking by considering
both the relevance and position of relevant documents, penalizing those that appear lower in the
list. It is normalized to allow comparison across queries.
• Precision at different cutoffs (P@X): Measures the proportion of relevant documents among the
top X results. It provides a meaningful assessment of ranking performance, especially in contexts
where the highest-ranked outputs are of primary importance to users.</p>
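<p>To illustrate how these metrics behave, the following minimal sketch (ours, not part of the official evaluation scripts) computes average precision, reciprocal rank, and P@X for a toy ranking of skills against one query:</p>

```python
def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1/k for the first relevant item at rank k; 0 if none is retrieved."""
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / k
    return 0.0

def precision_at(ranked, relevant, x):
    """Fraction of the top-x results that are relevant."""
    return sum(1 for item in ranked[:x] if item in relevant) / x

# Toy example: two skills are relevant to the query occupation.
ranked = ["negotiate", "python", "teamwork", "welding"]
relevant = {"negotiate", "teamwork"}
print(average_precision(ranked, relevant))  # (1/1 + 2/3) / 2 ≈ 0.8333
print(reciprocal_rank(ranked, relevant))    # 1.0
print(precision_at(ranked, relevant, 2))    # 0.5
```

MAP, MRR, and mean P@X are then obtained by averaging these per-query values over all queries.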
    </sec>
    <sec id="sec-4">
      <title>4. Baseline</title>
<p>The official baseline provided for Task B of TalentCLEF 2025 was adopted. This baseline computed the
cosine similarity between the embeddings of occupation titles and skill names, as provided in the files.
Since multiple alternative titles are available for each skill, the highest similarity score between any of
these and the job title was taken as the skill’s similarity value. The embeddings were generated using
the all-MiniLM-L6-v2 S-BERT model, which was also adopted in our approach.</p>
      <p>Although this method establishes a useful reference point, it relies exclusively on surface-level lexical
similarities. Therefore, it may overlook deeper semantic connections that could arise when contextual
cues, or definitions, are incorporated.</p>
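<p>The baseline's max-over-alternate-titles scoring rule can be sketched as follows. The embedding step is only stubbed with toy vectors here; in the actual baseline the vectors come from the all-MiniLM-L6-v2 S-BERT model:</p>

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def skill_score(job_vec, alt_title_vecs):
    """Baseline rule: a skill's score is the highest cosine similarity between
    the job-title embedding and the embedding of any of its alternate titles."""
    return max(cosine(job_vec, v) for v in alt_title_vecs)

# Toy 3-d "embeddings" standing in for S-BERT outputs.
job = [1.0, 0.0, 0.0]
skill_alts = [[0.0, 1.0, 0.0], [0.8, 0.6, 0.0]]
print(round(skill_score(job, skill_alts), 2))  # 0.8
```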
    </sec>
    <sec id="sec-5">
      <title>5. Proposal Description</title>
      <sec id="sec-5-1">
        <title>5.1. Overview</title>
        <p>The approach for the accomplishment of Task B is based on computing semantic similarity between job
and skill definitions. The pipeline consists of the following steps:
• Definition Generation: Each job title and skill is defined using prompt-based queries to three
different configurations, involving combinations of two distinct LLMs, two different prompts for
job titles and two distinct prompts for skills. These combinations, denoted as C1, C2 and C3, are
described in detail in Subsection 5.2. As a result, three distinct definitions are generated for each
job title and each skill. Although multiple alternate titles were available for each skill, the first
one was selected in each case to generate the definition, as it corresponds to the preferred term
in ESCO.
• Sentence Embedding: All the generated definitions are encoded using the all-MiniLM-L6-v2
S-BERT model.
• Similarity Scoring: For each job title and configuration, the cosine similarity is computed
between the job definition and each skill definition generated under the same configuration.
• Ensemble Aggregation: For each job title, the similarity scores of each skill across the three
configurations are averaged to produce a final ranking. The skills are then ordered based on these
aggregated scores. In this way, the relative importance of each skill for a given job is determined.</p>
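<p>The ensemble aggregation step above can be sketched as follows. The similarity values are hypothetical placeholders standing in for the cosine scores that the three configurations would actually produce for one job title:</p>

```python
def ensemble_rank(per_config_scores):
    """Average each skill's similarity across configurations, then rank.

    per_config_scores: one dict per configuration, mapping
    skill id -> cosine similarity to a single job title."""
    skills = per_config_scores[0].keys()
    avg = {s: sum(cfg[s] for cfg in per_config_scores) / len(per_config_scores)
           for s in skills}
    return sorted(avg, key=avg.get, reverse=True)

# Hypothetical scores for one job title under the three configurations.
c1 = {"teamwork": 0.62, "welding": 0.10, "negotiate": 0.55}
c2 = {"teamwork": 0.58, "welding": 0.15, "negotiate": 0.60}
c3 = {"teamwork": 0.66, "welding": 0.12, "negotiate": 0.50}
print(ensemble_rank([c1, c2, c3]))  # ['teamwork', 'negotiate', 'welding']
```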
        <p>In Fig. 1, a schema representing the pipeline used to generate similarity between a job title and a
skill is presented. In the figure, it can be seen how the cosine similarities between the job titles and the
skills are merged to form the ensemble aggregation.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Combinations of LLMs and Prompts</title>
        <p>To generate the definitions, we used three different configurations, based on two distinct large language
models (LLMs), two prompts for job titles and two prompts for skills. For each configuration, a single
query was executed per occupation or skill.</p>
        <p>
          The two LLMs employed are:
• Model 1 (M1): gemma-3-4b-it-qat [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
• Model 2 (M2): gemma3:27b [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
M1 is a quantized version with 4 billion parameters, while M2 is the original
unquantized version, with 27 billion parameters. Both models were used with a temperature setting of
0.7 and the following system role prompt: ‘You are an expert in the labour market. Answer in English.’
The prompts used for job titles are:
• Job Prompt 1 (JP1): ‘Please define, outlining its functions, in less than fifty words, the following
occupation: &lt;OCCUPATION&gt;. Write the definition in a similar way to the following example:
“The techniques, theories, and commonly accepted strategies regarding pricing of goods. The
relation between pricing strategies and outcomes in the market such as profitability maximisation,
deterrence of newcomers, or increase of market share.” Make sure you only write the definition.’
• Job Prompt 2 (JP2): ‘Please define, outlining its functions, in less than fifty words, the following
occupation: &lt;OCCUPATION&gt;.’
The prompts used for skills are:
• Skill Prompt 1 (SP1): ‘Please define, outlining its utility, in less than fifty words, the following
skill: &lt;SKILL&gt;. Write the definition in a similar way to the following example: “The techniques,
theories, and commonly accepted strategies regarding pricing of goods. The relation between
pricing strategies and outcomes in the market such as profitability maximisation, deterrence of
newcomers, or increase of market share.” Make sure you only write the definition.’
• Skill Prompt 2 (SP2): ‘Please define, outlining its utility, in less than fifty words, the following skill:
&lt;SKILL&gt;.’
        </p>
        <p>In these prompts, the symbols ‘&lt;’ and ‘&gt;’ indicate the position where the occupation or skill name
was inserted.</p>
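<p>This placeholder substitution amounts to a simple template fill, sketched below; the prompt text is quoted from the task description above, while the function name is ours:</p>

```python
# Quoted from the paper (the simpler job prompt); the marker is
# replaced by the occupation name before querying the LLM.
JOB_PROMPT_2 = ("Please define, outlining its functions, in less than fifty "
                "words, the following occupation: <OCCUPATION>.")

def build_prompt(template, term):
    """Insert the occupation (or skill) name at the marked position."""
    return template.replace("<OCCUPATION>", term).replace("<SKILL>", term)

print(build_prompt(JOB_PROMPT_2, "data engineer"))
```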
        <p>Finally, the combinations of models and prompts produced the following configurations:
• Configuration 1 (C1): M1 + JP1 + SP1
• Configuration 2 (C2): M2 + JP1 + SP1
• Configuration 3 (C3): M2 + JP2 + SP2</p>
        <p>Configurations C1 and C2 used the same prompts (JP1 and SP1) but different models, whereas C3
employed alternative, simpler prompts (JP2 and SP2). C1 used the quantized model M1,
while C2 and C3 used the unquantized one, M2.</p>
        <p>In Fig. 2, an example of the generated definitions is shown. There, unique keys identifying skills
from the Development Set are associated with their corresponding definitions, obtained through
configuration C1, which is composed of the LLM gemma-3-4b-it-qat, along with the job prompt
JP1 and the skill prompt SP1.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>The results obtained on the Development Set are presented in Table 1, including those for each
individual configuration, the official baseline provided for Task B of TalentCLEF 2025, and the final
ensemble. The use of LLMs shows a clear improvement over the official baseline, suggesting that data
augmentation, specifically through the inclusion of item definitions, may be beneficial for NLP tasks.
Additionally, the improvement observed with the ensemble compared to the individual models indicates
that combining multiple systems could help offset the specific limitations of each model.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>The paper Beyond Titles: Semantic Matching of Jobs and Skills Using LLMs and S-BERT presents the
submission of the AI Research Group at the ITCL Technology Center to TalentCLEF 2025 Task B, which
addresses the job-skill matching problem using an ensemble of S-BERT embeddings based on LLM-generated
definitions. The results suggest that data augmentation, through the generation of definitions,
can lead to improved performance. Furthermore, combining multiple LLMs may help enhance results,
possibly by compensating for the specific limitations of individual models.</p>
      <p>The obtained results may be relevant for NLP systems, particularly those that rely on semantic
understanding and concept matching, such as the Recommendation Portal currently under development
within the AI4Labour project (GA:101007961), on which ITCL is currently working. Initial
evaluations for the development of the algorithms presented in this proposal were conducted, within the
scope of that project, using the O*NET database, the American counterpart to the European ESCO.
The Recommendation Portal aims to suggest courses to users based on their skills and
educational background, while also seeking to suppress skills that are prone to automation, thereby
facilitating adaptation to the AI era. To integrate information from different domains and sources,
systems like the one proposed here may be employed.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We would like to acknowledge the organizers of TalentCLEF 2025 for providing a well-structured and
impactful challenge. Our experiments were conducted using publicly available models and open-source
tools. This research was carried out under the European Union’s Horizon 2020 research and innovation
programme under the Marie Sklodowska-Curie Grant Agreement No. 101007961.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly for grammar and
spelling checking, paraphrasing, and rewording. After using these tools, the authors reviewed and edited
the content as needed, and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>García-Sardiña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Deniz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodrigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Estrella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zbib</surname>
          </string-name>
          , TalentCLEF at CLEF 2025:
          <article-title>Skill and job title intelligence for human capital management</article-title>
          , in:
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kazai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Nardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>479</fpage>
          -
          <lpage>486</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Naveed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. U.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saqib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Usman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Mian</surname>
          </string-name>
          ,
          <article-title>A comprehensive overview of large language models</article-title>
          ,
          <source>ArXiv abs/2307.06435</source>
          (
          <year>2023</year>
          ). URL: https://api.semanticscholar.org/CorpusID:259847443.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          ,
          <source>in: Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2019</year>
          . URL: https://api.semanticscholar.org/CorpusID:201646309.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>García-Sardiña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Estrella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Deniz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodrigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zbib</surname>
          </string-name>
          ,
          <article-title>Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management</article-title>
          , in: International Conference of the Cross-Language Evaluation Forum for European Languages
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <collab>Gemma Team</collab>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ferret</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vieillard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Merhej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Perrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Matejovicova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rivière</surname>
          </string-name>
          , et al.,
          <article-title>Gemma 3 technical report</article-title>
          , <source>arXiv preprint arXiv:2503.19786</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>