<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fine-Tuned Sentence Transformer for Multilingual Job Title Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chinmay Satish Bhangale</string-name>
          <email>chinmaybhangale.242it006@nitk.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prajwal Anil Gabhane</string-name>
          <email>gabhaneprajwal.242it011@nitk.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anand Kumar Madasamy</string-name>
          <email>m_anandkumar@nitk.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Technology, National Institute of Technology Karnataka Surathkal</institution>
          ,
          <addr-line>Mangalore 575025</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Matching job titles is a critical task in fields such as resume evaluation and job recommendation platforms. Many companies use different job titles for similar roles, which creates ambiguity. This research tackles the issue with a machine learning-based strategy that uses the Sentence Transformer model paraphrase-multilingual-mpnet-base-v2, fine-tuned for the job title matching task. The training dataset consists of job titles paired with their corresponding similar job titles in three languages (English, Spanish, and German), while the validation data includes a query file and a corpus file, each containing job titles in the same languages. To ensure data consistency, preprocessing steps are applied, such as handling missing values, normalizing text, and removing special characters. Cached Multiple Negatives Ranking Loss is used to improve retrieval accuracy by helping the model distinguish between similar and dissimilar job titles. After training, embeddings are generated for each job title in the query and corpus files. Cosine similarity is used to compute similarity scores between the query and corpus job title embeddings. Finally, for each query job title, corpus job titles are ranked by their similarity scores. The model's performance is evaluated using standard retrieval metrics, including Mean Average Precision (MAP), Mean Reciprocal Rank (MRR), and Precision@K. The fine-tuned model achieved an average MAP score of 0.49 across English, Spanish, and German on the validation data, and 0.45 on the test data.</p>
      </abstract>
      <kwd-group>
        <kwd>Job title</kwd>
        <kwd>Sentence Transformer</kwd>
        <kwd>Cached Multiple Negatives Ranking Loss</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With today's tight job market, organizations increasingly compete to find
and retain the best qualified candidates [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The adoption of Information and Communication Technologies
(ICTs) in Human Resources (HR) processes has reduced the time burden of hiring, especially
with the introduction of artificial intelligence (AI) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The job application space has
changed considerably, a key change being the development of automated job recommendations.
For job seekers, this means receiving better matched recommendations from the millions of jobs
posted online at any given moment, and applying for roles that fit their skills and ambitions.
      </p>
      <p>
        Job matching is at the center of human resource management. Enterprise position management focuses
on determining, in a systematic and scientific way, whether a candidate meets the qualifications of a
position and matches the talent that the enterprise needs for growth [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>Accurately matching job titles remains difficult because of ambiguity in job naming conventions,
linguistic variation, and domain-specific terminology, even with the integration of ICTs and
improvements in job recommendation systems. Conventional techniques such as keyword matching and
TF-IDF frequently fail to capture the semantic similarity between related but distinct job titles.
This creates difficulties for job seekers searching for specific roles and reduces the effectiveness of
recruitment systems.</p>
      <p>There is a need for systems that understand the semantic meaning of job titles and return the most
relevant matches, especially in cross-domain and multilingual environments. Thus, in order to improve
precision and speed of job title retrieval and matching procedures, a more intelligent and scalable
approach is required.</p>
      <p>
        Transformer-based models have changed the landscape of NLP, outperforming
the previous state-of-the-art approaches in information retrieval tasks. Transformers such as Bidirectional
Encoder Representations from Transformers (BERT) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and its descendants learn contextual
meaning and capture semantic similarities much better than bag-of-words or Term Frequency-Inverse
Document Frequency (TF-IDF) based methods. Research has shown that models like Sentence-BERT
improve retrieval tasks by generating high-quality sentence embeddings that can easily be compared
using cosine similarity. Recent developments in this domain, such as multilingual BERT (m-BERT)
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Cross Lingual Language Model with RoBERTa (XLM-R) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], have made cross-lingual
retrieval easier and are well suited to job title matching.
      </p>
      <p>
        This work leverages transformer-based models for job title matching, that is, retrieving and ranking
similar job titles, as described in the TalentCLEF 2025 overview paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The training dataset contains job
titles and their corresponding similar job titles, whereas the test dataset consists of a query file and a corpus
file, each containing job titles. A Siamese Sentence Transformer model is fine-tuned to learn
meaningful representations of job titles. Embeddings are then generated for each job title in the query and
corpus files, and cosine similarity is used to compute similarity scores between the query and corpus job
title embeddings. Finally, for each query job title, corpus job titles are ranked by their similarity
scores.
      </p>
      <p>The rest of this report is structured as follows. Section 2 reviews the literature on
the use of NLP techniques in human resource management. Section 3 details the
methodology, including the dataset, data preprocessing, the transformer model, and
the similarity score calculation. Section 4 presents the experimental results obtained
after fine-tuning the model. Section 5 concludes the report by summarizing the work and
suggesting directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>Work related to the use of deep learning techniques, especially transformers, in human resource
management is discussed below.</p>
      <p>
        A modified BERT architecture using a Siamese or triplet network [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has been implemented to generate
sentence embeddings efficiently. The model does not concatenate each pair of sentences. Instead,
it processes each sentence independently and applies a pooling mechanism (mean or max pooling) to obtain
fixed-sized embeddings. Semantic similarity is optimized with contrastive, triplet, and
regression losses. For tasks such as Semantic Textual Similarity (STS), and with training on Natural Language Inference
(NLI) data, this design greatly accelerates similarity evaluations. However, there are limitations, including the trade-off
between efficiency and accuracy, since cross-encoders are better for some use cases. Generalizability is also
limited: Sentence-BERT must be fine-tuned on a task-specific dataset and is therefore
less generalizable to different NLP tasks. Its performance drops for deeply interactive problems (e.g.,
question answering), and it requires large amounts of high-quality labeled data, which severely restricts its use
in low-resource settings. These disadvantages aside, Sentence-BERT works well for semantic similarity
and information retrieval tasks.
      </p>
      <p>
        Unsupervised learning and contrastive learning have been combined in a two-stage
multilingual job recommendation system [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to improve semantic similarity across 11 languages. An
encoder (such as XLM-RoBERTa) is pretrained on multilingual data through Doc2Vec to align job
titles with skill embeddings. Then, contrastive fine-tuning uses positive and negative job pairs from the
European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy dataset to optimize
the embeddings, thereby improving cross-lingual alignment. The approach improves mAP by 4.3% over
monolingual BERT on English and achieves excellent cross-lingual accuracy (e.g., Chinese-to-English mAP rises
from 0.04 to 0.72). Nevertheless, it has drawbacks such as reliance on noisy skill
data, the exclusion of Asian languages from ESCO, the absence of contextual job descriptions, and the very high
computational cost of transformer models.
      </p>
      <p>VacancySBERT [11], a Siamese network leveraging BERT, has been used to normalize job titles
by matching them with skills from descriptions, using distant supervision on 50M title-description
pairs and evaluation on 33K manually annotated triplets. It uses a custom [SKILL] token to sum skill
embeddings and Multiple Negatives Ranking Loss to align titles with their associated skills, resulting in
21.5% improved recall by including skills. Despite being effective, it has several limitations: the test
data is biased towards specific niches ("Professional"), it works exclusively on
English-language applications, it relies on a proprietary skill-extraction algorithm, it uses BERT-base instead
of larger and better models to optimize processing, and poor negative sampling may leave out
some relevant candidates.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>[Figure: overview of the pipeline in four panels. Input: ESCO database and datasets in EN, ES, and DE. Data Preprocessing: missing value identification, text normalization (e.g. SCIENTIST to scientist), and special character identification (e.g. $ci€nti$t to scientist). Sentence Transformer Model Architecture: job title 1 and job title 2 are each passed through BERT and pooling to produce embeddings u and v, which are compared with cosine similarity (u, v). Output: ranking the job titles, e.g. (data scientist, statistician) 0.94, rank 1; (data scientist, data engineer) 0.86, rank 2; (data scientist, network engineer) 0.32, rank 3.]</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>We use the training and validation data from TalentCLEF 2025 Task A [12] for this
work. The data is provided in three languages: English, Spanish, and German. For each language, the
training data file is a tab-delimited CSV file with four columns: family_id, id, jobtitle_1, jobtitle_2.
The family_id column holds the ISCO family ID, which groups job titles under standardized
occupational categories. The id column holds the ESCO identifier, which traces the origin of each job
title pair. The jobtitle_1 and jobtitle_2 columns store a pair of related job titles, where jobtitle_2 is an
occupation that is semantically similar or related to jobtitle_1. The English training data
contains 28880 rows, the Spanish training data contains 20724 rows, and the German training data
contains 23023 rows.</p>
        <p>The validation data for each language is structured into three separate files: queries, corpus_elements,
and q_rels. The queries file contains 2 columns: q_id, and jobtitle, where q_id is a unique identifier
for each query and jobtitle specifies the job title used as the query. The corpus elements file contains
2 columns: c_id, and jobtitle, where c_id is a unique identifier for each corpus element and jobtitle
represents the job title in the corpus. Finally, the q_rels file defines the relationship between the queries
and the corpus_elements files. It contains 4 columns: q_id, iter, c_id, relevance, where q_id is query
identifier, iter is a reserved field which is always set to 0, c_id is corresponding corpus element identifier,
and relevance is a binary score which indicates the relevance of the corpus element to the query, where 1
means relevant and 0 means non-relevant. The q_rels file serves as the ground truth values for evaluating
the model’s performance, providing the expected relevance labels for each query-corpus jobtitle pair.
For English, the queries file contains 105 rows, the corpus_elements file contains 2619 rows, and the
q_rels file contains 2419 rows. For Spanish, the queries file contains 185 rows, the corpus_elements
file contains 4661 rows, and the q_rels file contains 7578 rows. For German, the queries file contains
203 rows, the corpus_elements file contains 4729 rows, and the q_rels file contains 8416 rows.</p>
        <p>Since the training data contains ESCO URLs in the family_id column for each language, we
used the ESCO dataset (version 1.2.0) [13] as an additional dataset to extract more information about each
job title, which helps to train the model on more data. The Occupations.csv files from the ESCO dataset
for English, Spanish, and German are used as the new training dataset. For each
language, Occupations.csv contains 3039 rows and 14 columns. We use two columns,
"preferredLabel" (unique job titles) and "altLabels" (similar job titles), from each
language's Occupations.csv file to train the model.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Preprocessing</title>
        <p>First, the training dataset (from ESCO) is checked for missing values in each language. Similar
job titles are missing for 28 unique job titles in English, 143 in Spanish, and 28 in German, so
these rows are deleted from the training dataset for each language. After that, the unique job
titles are extracted. There are a total of 3011 unique job titles in English, 2896 in Spanish, and 3011 in
German.</p>
        <p>For each unique job title, a set of similar job titles is created, which also includes the primary job title
in that set. This ensured that the model could learn the relationship between the primary job title and
its closely related titles.</p>
        <p>After the sets of similar job titles are generated, each job title in a set is converted to lowercase. Next,
leading and trailing whitespace is removed. Punctuation and special
characters (such as !, @, and *) are also checked for and removed if found.</p>
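        <p>The normalization steps above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name normalize_title is hypothetical, only ASCII punctuation is stripped, and collapsing the whitespace left behind by removed characters is an added assumption.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: other symbols (e.g. currency signs) would need an extended set.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_title(title):
    """Lowercase a job title, trim it, and drop punctuation/special characters."""
    lowered = title.lower().strip()
    cleaned = lowered.translate(_PUNCT_TABLE)
    # Collapse runs of spaces left behind by removed characters (assumption).
    return " ".join(cleaned.split())


print(normalize_title("  SOFTWARE Engineer @ (Backend)!  "))
```
]]></preformat>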
        <p>For each language, all possible job title pairs are generated within each set of similar job titles.
For example, in the English dataset, if a set contains Software Engineer, Developer, and Programmer, the
following pairs are created: ('Software Engineer', 'Developer'), ('Software Engineer', 'Programmer'),
('Developer', 'Programmer'). Once all pairs are generated, they are randomly shuffled. Shuffling
eliminates any order bias, ensuring that the model is trained on a diverse and
randomized set of pairs. In this way, 221889 unique job title pairs are generated in English, 45647
in Spanish, and 67818 in German. After this, all unique job title pairs from each
language are combined and randomly shuffled twice, resulting in a total of 335354 job title pairs.</p>
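        <p>As a rough sketch of the pair generation described above, the snippet below enumerates all unordered pairs within each set and shuffles them. The helper name build_pairs and the fixed seed are illustrative assumptions; a set of n titles yields n(n-1)/2 pairs.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random
from itertools import combinations


def build_pairs(title_sets, seed=0):
    """Return all unique unordered title pairs from each set, shuffled."""
    pairs = set()
    for titles in title_sets:
        # sorted(set(...)) removes duplicates and gives a stable pair order.
        for first, second in combinations(sorted(set(titles)), 2):
            pairs.add((first, second))
    shuffled = sorted(pairs)
    # A seeded generator keeps the shuffle reproducible (assumption).
    random.Random(seed).shuffle(shuffled)
    return shuffled


example_sets = [["software engineer", "developer", "programmer"]]
for pair in build_pairs(example_sets):
    print(pair)
```
]]></preformat>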
        <p>Finally, the 335354 job title pairs are formatted as InputExample objects, which are usable by the
SentenceTransformer model. For each pair, we create an InputExample object with a list of two texts
(job titles). This format is used in the SentenceTransformer framework to generate
embeddings and train the model for similarity learning. Figure 2 shows the creation of the multilingual
training dataset.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Sentence Transformer Model</title>
        <p>This model is based on BERT (Bidirectional Encoder Representations from Transformers)
and is designed to provide dense representations of sentences and short pieces of text. BERT
mainly yields token-level, contextual word representations, whereas Sentence Transformers
provide meaningful sentence embeddings that can be used downstream in applications such as
semantic similarity, clustering, and information retrieval. Reimers and Gurevych (2019) proposed
this approach to address the cost of scoring sentence pairs with BERT: BERT requires both
sentences of a pair as joint input, which makes large-scale comparison computationally
expensive. A Sentence Transformer encodes each sentence independently, allowing efficient
similarity calculations using cosine similarity or Euclidean distance.</p>
        <p>The architecture of Sentence Transformer is based on pretrained BERT, but it includes an additional
pooling layer to derive fixed-size embeddings from token level representations. In standard transformer
models, the output consists of contextualized word embeddings for each token in the input sequence. A
pooling operation is applied for sentence level tasks to aggregate information into a single vector. The
most commonly used pooling strategies include mean-pooling (averaging all token embeddings), [CLS]
token representation, and max-pooling. Among these, mean-pooling is frequently used as it effectively
captures global sentence semantics.</p>
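        <p>Mean-pooling can be illustrated with a small, framework-free sketch. The function name mean_pool and the use of a 0/1 attention mask to skip padding tokens are assumptions made for illustration; real implementations operate on batched tensors.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions whose mask value is 0."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [value / count for value in total]


# Two real tokens and one padding token (masked out).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```
]]></preformat>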
        <p>Compared to BERT, a Sentence Transformer is optimized for similarity-based tasks through the use
of contrastive learning objectives. Instead of BERT's next sentence prediction (NSP)
and masked language modeling (MLM) objectives, a Sentence Transformer is trained using ranking losses, such as
Multiple Negatives Ranking Loss and Multiple Negatives Symmetric Ranking Loss, which improve the
model's ability to differentiate between similar and dissimilar text pairs.</p>
        <p>The paraphrase-multilingual-mpnet-base-v2 model is a pretrained Sentence Transformer based on
Microsoft's MPNet (Masked and Permuted Pre-training) architecture, which is fine-tuned for
multilingual sentence similarity tasks. It generates 768-dimensional embeddings and is trained on paraphrase
pairs across more than 50 languages, which makes it well suited to multilingual semantic
understanding. The model captures semantic relationships across languages, allowing
comparison of sentence meaning despite language differences. It is particularly beneficial for
multilingual use cases such as multilingual information retrieval and semantic search. The
paraphrase-multilingual-mpnet-base-v2 model is selected for its strong performance on multilingual sentence embeddings, a
key requirement whenever job titles or job descriptions are written in different languages yet have similar meanings.</p>
        <p>For the loss function, Cached Multiple Negatives Ranking Loss (Cached MNRL) is employed. This
is an extension of the basic Multiple Negatives Ranking Loss (MNRL), which is intended
to enhance training efficiency and representation quality in Sentence Transformer models for text
similarity tasks. Like MNRL, it treats all other positive pairs within a batch as implicit negatives,
avoiding the need for explicitly labeled negative samples. In addition, Cached MNRL applies gradient caching:
embeddings are computed in small mini-batches and cached, so that the loss can be evaluated over a much
larger batch within the same GPU memory budget. Each positive pair is therefore contrasted with a more extensive
and varied set of in-batch negatives, which helps
the model detect fine-grained semantic differences. Cached MNRL is particularly useful in
large-scale or domain-specific applications, such as job title matching, where such
fine-grained distinctions matter.</p>
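        <p>The in-batch negatives idea behind MNRL can be sketched in plain Python. This is a simplified illustration, not the library implementation: it omits the gradient caching that distinguishes Cached MNRL, the names cosine and mnrl_loss are hypothetical, and the similarity scale of 20 is an assumed value.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    Every other positive in the batch acts as an implicit negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, candidate) for candidate in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive
print(mnrl_loss(anchors, aligned) < mnrl_loss(anchors, swapped))  # True
```
]]></preformat>
        <p>A larger batch adds more candidates to each row of logits, which is why enlarging the effective batch size increases the number of negatives seen per positive pair.</p>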
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Hyperparameter Setting</title>
        <p>In the training phase, the Sentence Transformer model is trained using the below set of hyperparameters
to optimize performance for the job title matching task.</p>
        <p>Training used the paraphrase-multilingual-mpnet-base-v2 model with the Cached
Multiple Negatives Ranking Loss. The model was fine-tuned in a GPU-accelerated environment
to improve training speed. The main hyperparameters were as follows.
The model was trained for 1 epoch with a batch size of 128 per device, which uses GPU memory
efficiently without sacrificing negative diversity. A learning rate of 2e-5 was used, in line with standard
fine-tuning practice for transformer-based models. A warmup ratio of 0.1 gradually increased
the learning rate, reducing the chance of instability at the beginning of training. Mixed precision
training was enabled to save memory and accelerate computation. Checkpoint saving and logging both used the
epoch strategy, that is, once per epoch. In addition, we employed a no-duplicates
batch sampling strategy to ensure that each batch contains unique samples, which is beneficial for
contrastive learning methods that depend on in-batch negatives. Figure 3 illustrates the fine-tuning of
the paraphrase-multilingual-mpnet-base-v2 Sentence Transformer model using the previously created
multilingual training dataset.</p>
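        <p>To make these settings concrete, the following sketch works out the approximate number of optimizer steps and warmup steps implied by the reported values (335354 pairs, batch size 128, 1 epoch, warmup ratio 0.1). It assumes a single device, that the last partial batch is kept, and that the no-duplicates sampler does not change the batch count; the function names are hypothetical, and the behaviour after warmup is not stated in the text, so the rate is simply held constant here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def schedule_summary(num_pairs, batch_size, epochs, warmup_ratio):
    """Return (total optimizer steps, warmup steps) for one device."""
    steps_per_epoch = math.ceil(num_pairs / batch_size)
    total_steps = steps_per_epoch * epochs
    # Rounding rule is an assumption; libraries may round differently.
    warmup_steps = round(total_steps * warmup_ratio)
    return total_steps, warmup_steps


def warmup_lr(step, peak_lr, warmup_steps):
    """Linear warmup from 0 to peak_lr; held constant afterwards in this sketch."""
    if warmup_steps == 0 or step >= warmup_steps:
        return peak_lr
    return peak_lr * step / warmup_steps


total, warmup = schedule_summary(335354, 128, 1, 0.1)
print(total, warmup)  # 2620 262
print(warmup_lr(warmup, 2e-5, warmup))  # 2e-05
```
]]></preformat>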
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Similarity Score Computation</title>
        <p>Once the Sentence Transformer model is trained on all job title pairs, for each language, it generates
embeddings for each job title in queries and corpus element files. Embeddings are dense vector
representations that capture the semantic meaning of the job titles in a high-dimensional space. To
measure the similarity between the queries and the corpus job titles, cosine similarity is used.</p>
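        <p>Cosine similarity computes similarity by measuring the angle between two vectors in a
multidimensional space. The formula to calculate cosine similarity between two vectors <italic>A</italic> and <italic>B</italic> is given in
Equation (1) below:</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \times \|B\|}</tex-math>
        </disp-formula>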
        <p>For each language in validation data, each job title in the query file is compared against all job titles
in the corpus element file, and the corresponding cosine similarity scores are calculated and stored.
Job title pairs with higher cosine similarity scores indicate stronger semantic similarity, while lower
similarity scores reflect dissimilar job title pairs.</p>
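        <p>A small worked example may help make the cosine similarity computation concrete. The sketch below uses toy two-dimensional vectors instead of the model's 768-dimensional embeddings, and the function name cosine_similarity is chosen for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(round(cosine_similarity([3.0, 4.0], [4.0, 3.0]), 2))  # 0.96
```
]]></preformat>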
        <p>After calculating similarity for all query-corpus job title pairs, the corpus element
job titles for each query job title are ranked by similarity score in descending order, so that the job title with the highest
similarity score is ranked 1. Figure 4 illustrates the similarity computation of job titles using the fine-tuned
model and the ranking of the job titles.</p>
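        <p>The ranking step can be sketched as a simple sort over precomputed scores. The function name rank_corpus and the dictionary input format are illustrative assumptions; the example scores are the ones shown in the pipeline figure for the query "data scientist".</p>
        <preformat preformat-type="code"><![CDATA[
```python
def rank_corpus(scores):
    """Turn {corpus_title: similarity} into [(rank, corpus_title, similarity)].

    Rank 1 is the corpus title with the highest similarity score.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, title, score)
        for rank, (title, score) in enumerate(ordered, start=1)
    ]


scores = {"network engineer": 0.32, "statistician": 0.94, "data engineer": 0.86}
for row in rank_corpus(scores):
    print(row)
```
]]></preformat>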
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <sec id="sec-4-1">
        <title>4.1. System Configuration</title>
        <p>This section discusses the results obtained after evaluating the fine-tuned model on the validation data.
We used the Google Colab environment to train the model and to evaluate its performance. This
online environment provided an Nvidia Tesla T4 GPU with 15GB of video RAM, which is
well suited to deep learning tasks and running inference models. In addition, the CPU
provided was an Intel Xeon processor clocked at 2.20 GHz with a cache size of 54.96 MB. This CPU supports
multi-threading, which enabled faster data processing and computation. Further, the system had 15GB
of on-board RAM, which was sufficient to load and work with the dataset.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Information retrieval metrics</title>
        <p>The following information retrieval metrics are used to observe the retrieval performance of the model.</p>
        <p>1. Mean Average Precision (MAP): MAP is the mean of the average precision (AP) scores across
all queries. It tells how well the model ranks relevant items across multiple queries, rewarding
relevant results appearing higher in the ranking. MAP is computed using Equation (2) as given
below:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>\mathrm{MAP} = \frac{1}{Q} \sum_{q=1}^{Q} \mathrm{AP}_q</tex-math>
        </disp-formula>
        <p>where <italic>Q</italic> is the total number of queries and AP<sub><italic>q</italic></sub> is the average precision for query <italic>q</italic>.
Average Precision for a query is computed using Equation (3) as follows:</p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>\mathrm{AP}_q = \frac{1}{|R_q|} \sum_{k=1}^{n} P(k) \cdot \mathrm{rel}(k)</tex-math>
        </disp-formula>
        <p>where |<italic>R</italic><sub><italic>q</italic></sub>| is the number of relevant items for query <italic>q</italic>, <italic>n</italic> is the number of ranked items,
<italic>P</italic>(<italic>k</italic>) is the precision at rank <italic>k</italic>, and rel(<italic>k</italic>) = 1 if the item at rank <italic>k</italic> is relevant, else 0.
Higher values mean relevant results are being ranked closer to the top, and consistently so
across queries.</p>
        <p>2. Mean Reciprocal Rank (MRR): MRR measures how early the first relevant result appears in the
ranking for each query, then averages that over all queries. MRR is computed using Equation (4)
given below:</p>
        <disp-formula>
          <label>(4)</label>
          <tex-math>\mathrm{MRR} = \frac{1}{Q} \sum_{q=1}^{Q} \frac{1}{\mathrm{rank}_q}</tex-math>
        </disp-formula>
        <p>where rank<sub><italic>q</italic></sub> is the rank position of the first relevant item for query <italic>q</italic> and <italic>Q</italic> is the total number of queries.
If the system often places a correct answer right at the top (rank 1), MRR will be high. It strongly
penalizes late-ranked relevant results.</p>
        <p>3. Precision@K: Precision@K is the proportion of relevant items in the top K retrieved results for
a query. It is calculated as shown in Equation (5):</p>
        <disp-formula>
          <label>(5)</label>
          <tex-math>\mathrm{Precision@}K = \frac{\text{Number of relevant items in top } K}{K}</tex-math>
        </disp-formula>
        <sec id="sec-4-2-3">
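          <p>The three metrics can be illustrated with a short, self-contained sketch. It assumes each query provides a ranked list of corpus identifiers and a set of relevant identifiers (as would be derived from the q_rels file); the function names and the toy identifiers are hypothetical, and this is not the official evaluation script.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for c_id in ranked_ids[:k] if c_id in relevant) / k


def average_precision(ranked_ids, relevant):
    """Sum of P(k) at each relevant rank k, divided by the number of relevant items."""
    hits = 0
    total = 0.0
    for rank, c_id in enumerate(ranked_ids, start=1):
        if c_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked_ids, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, c_id in enumerate(ranked_ids, start=1):
        if c_id in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    return sum(values) / len(values)


# Two toy queries: (ranked corpus ids, set of relevant corpus ids).
runs = [
    (["c1", "c2", "c3", "c4"], {"c1", "c3"}),
    (["c5", "c6", "c7"], {"c6"}),
]
map_score = mean([average_precision(r, rel) for r, rel in runs])
mrr_score = mean([reciprocal_rank(r, rel) for r, rel in runs])
print(round(map_score, 4))  # 0.6667
print(mrr_score)  # 0.75
```
]]></preformat>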
          <p>To compare the performance of paraphrase-multilingual-mpnet-base-v2, the
paraphrase-multilingual-MiniLM-L12-v2 model was also fine-tuned using the same approach. Table 1 presents the
performance metrics for both models across three languages: English, Spanish, and German. The fine-tuned
paraphrase-multilingual-mpnet-base-v2 model outperforms
paraphrase-multilingual-MiniLM-L12-v2 on most evaluated metrics. On the validation data, the average MAP across all three languages is 0.49 for
the paraphrase-multilingual-mpnet-base-v2 model, compared to 0.44 for the
paraphrase-multilingual-MiniLM-L12-v2 model, demonstrating the stronger ranking ability of the former. On the test data, the
fine-tuned paraphrase-multilingual-mpnet-base-v2 model achieves an average MAP of 0.45 across all
three languages.</p>
          <p>The MRR values show that the paraphrase-multilingual-mpnet-base-v2 model returns
relevant results at higher ranks in Spanish and German. In English,
paraphrase-multilingual-MiniLM-L12-v2 is slightly ahead, with an MRR of 0.8111 compared to 0.8004 for
paraphrase-multilingual-mpnet-base-v2. Nevertheless,
paraphrase-multilingual-mpnet-base-v2 shows stronger performance overall when
the complete set of metrics is considered.</p>
          <p>This trend is consistent across the other evaluation metrics, indicating that the
paraphrase-multilingual-mpnet-base-v2 model provides better semantic understanding and retrieval performance.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Scope</title>
      <p>In this work, the Sentence Transformer model paraphrase-multilingual-mpnet-base-v2 is used
to create a multilingual job title matching system. To ensure that the model captures the semantic
relationships between job titles across different languages, it is fine-tuned on a dataset of similar job title
pairs constructed from the ESCO dataset for English, Spanish, and German. The high-quality
job title embeddings produced by the fine-tuned model enable precise similarity computation
and effective retrieval in a multilingual context. The performance of the model is measured by standard
information retrieval metrics, namely MAP, MRR, and Precision@K. Training was performed using
Cached Multiple Negatives Ranking Loss with a batch size of 128, which allowed the model to learn from a
larger set of implicit negatives. The overall average MAP over all three languages is 0.49 on the validation
data and 0.45 on the test data, reflecting the model's generalization capacity in multilingual settings.
Compared to the paraphrase-multilingual-MiniLM-L12-v2 baseline, the
paraphrase-multilingual-mpnet-base-v2 model achieved higher scores across most metrics, supporting its robustness in
semantic matching tasks.</p>
      <p>Future work can focus on adding domain-specific job title data, expanding to more languages, and
experimenting with different contrastive learning losses to enhance representation quality. Furthermore,
adding external labor market databases could increase retrieval accuracy and improve the model's
industry adaptability.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <sec id="sec-6-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
        <p>[11] M. Bocharova, E. Malakhov, V. Mezhuyev, VacancySBERT: the approach for representation of titles
and skills for semantic similarity search in the recruitment domain, arXiv preprint arXiv:2307.16638
(2023).
[12] L. Gascó, F. M. Hermenegildo, G.-S. Laura, D. C. Daniel, P. Estrella, R. Alvaro, Z. Rabih, TalentCLEF
2025 corpus: Skill and job title intelligence for human capital management, 2025. URL:
https://doi.org/10.5281/zenodo.15038364.
[13] ESCO: European Skills, Competences, Qualifications and Occupations, European Union, 2024. URL:
https://esco.ec.europa.eu/en/use-esco/download (accessed on 25 February 2025).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Böhm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Linnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kohl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Teetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bandurka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <article-title>Analysing gender bias in IT job postings: A pre-study based on samples from the German job market</article-title>
          ,
          <source>in: Proceedings of the 2020 Computers and People Research Conference</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>72</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tasheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpovich</surname>
          </string-name>
          ,
          <article-title>Transformation of recruitment process through implementation of AI solutions</article-title>
          ,
          <source>Journal of Management and Economics</source>
          <volume>4</volume>
          (
          <year>2024</year>
          )
          <fpage>12</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nützi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Schwegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staubli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Trezzini</surname>
          </string-name>
          ,
          <article-title>Factors, assessments and interventions related to job matching in the vocational rehabilitation of persons with spinal cord injury</article-title>
          ,
          <source>Work</source>
          <volume>64</volume>
          (
          <year>2019</year>
          )
          <fpage>117</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <article-title>Structural change in the job matching process in the United States, 1923-1932</article-title>
          ,
          <source>European Review of Economic History</source>
          <volume>26</volume>
          (
          <year>2022</year>
          )
          <fpage>107</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pires</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schlinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garrette</surname>
          </string-name>
          ,
          <article-title>How multilingual is multilingual BERT?</article-title>
          , arXiv preprint arXiv:1906.01502
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          , arXiv preprint arXiv:1911.02116
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>García-Sardiña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Estrella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Deniz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodrigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zbib</surname>
          </string-name>
          ,
          <article-title>Overview of the TalentCLEF 2025 Shared Task: Skill and Job Title Intelligence for Human Capital Management</article-title>
          ,
          <source>in: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using siamese BERT-networks</article-title>
          , arXiv preprint arXiv:1908.10084
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Deniz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Retyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>García-Sardiña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zbib</surname>
          </string-name>
          ,
          <article-title>Combined unsupervised and contrastive learning for multilingual job recommendation</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>