<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Multilingual Job Title Matching with MPNet-Based Sentence Transformers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adam Brikman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Sana</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Holden Ruegger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgia Institute of Technology</institution>
          ,
          <addr-line>North Ave NW, Atlanta, GA 30332</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>We compare a pretrained multilingual sentence transformer to a fine-tuned variant for the TalentCLEF 2025 competition, which focuses on retrieving semantically similar job titles in a target language given a source-language query. Our baseline model is MPNet, a transformer with 278 million parameters. Fine-tuning was performed using Multiple Negatives Ranking Loss (MNRL) on in-domain monolingual job title pairs. The resulting model achieved a mean average precision (MAP) score of 0.360, placing 32nd on the public leaderboard. All code is publicly available at https://github.com/dsgt-kaggle-clef/talentclef-2025.</p>
      </abstract>
      <kwd-group>
        <kwd>NLP</kwd>
        <kwd>Human Capital Management</kwd>
        <kwd>Multilinguality</kwd>
        <kwd>Cross-lingual Capability</kwd>
        <kwd>Job Title Ranking</kwd>
        <kwd>MPNet</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Overview</title>
      <p>
        To address the task of multilingual job title retrieval, we rely on MPNet, a transformer-based sentence
encoder known for its ability to generate semantically meaningful embeddings across languages [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
Our system encodes each job title into a dense vector representation, enabling efficient similarity-based
retrieval across both monolingual and cross-lingual settings. Below, we describe the MPNet model and
embedding process, followed by our rationale for using both pretrained and fine-tuned variants tailored
to specific subtasks.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Sentence Transformer Model</title>
        <p>
          MPNet (Masked and Permuted Pre-training Network) is a multilingual transformer encoder that builds
on BERT and XLNet, integrating both masked language modeling and permuted language modeling
objectives [
          <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
          ]. This hybrid pretraining strategy allows MPNet to capture deep semantic relationships
across different languages, making it well suited for paraphrase mining and multilingual retrieval tasks.
        </p>
        <p>Each job title is tokenized using MPNet’s internal word-piece tokenizer, which segments text into
subword units. A [CLS] token is prepended to the input and is used by the model to generate a global
sentence-level embedding. This [CLS] vector, with a dimensionality of 768, is extracted from the final
hidden layer and serves as the job title’s semantic representation.</p>
        <p>The model was pretrained using contrastive learning on a large-scale multilingual corpus. In this
setting, positive sentence pairs are pulled closer in the embedding space, while negative pairs are
pushed apart. This training objective helps preserve alignment across languages, enabling zero-shot
and few-shot generalization in cross-lingual retrieval scenarios.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Model Variants</title>
        <p>
          As outlined in Figure 1, our final system incorporates two MPNet variants: a pretrained model, and
a fine-tuned version adapted to the job title dataset. The fine-tuning process updates all parameters,
not just the classification head, using a contrastive loss objective on provided English, Spanish, and
German job title pairs [
          <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
          ].
        </p>
        <p>Each variant was deployed strategically: the pretrained model was used for cross-lingual retrieval
tasks (EN to ES and EN to DE), where maintaining robust multilingual alignment was critical. The
fine-tuned model was reserved for monolingual subtasks (EN to EN, ES to ES, DE to DE), where
domain-specific adaptation was expected to yield performance gains.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>
        Our implementation leveraged a lightweight pipeline utilizing pretrained multilingual sentence
embeddings for cross-lingual job title retrieval and a fine-tuned model for monolingual retrieval [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Training
was limited to the available positive job title pairs on the English, German, and Spanish training sets,
while evaluation was performed on the validation sets. The following subsections outline our data
handling, fine-tuning, training strategy, and evaluation methodology.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Data Handling and Representation</title>
        <p>
          During training, we extracted positive sentence pairs from the provided TSV files in the English, Spanish,
and German training sets. Each file consisted of European Skills, Competences, Qualifications and
Occupations (ESCO) job ID numbers, associated URLs, and job title pairs representing semantically
similar occupations [
          <xref ref-type="bibr" rid="ref1 ref6 ref7">1, 6, 7</xref>
          ].
        </p>
        <p>Job titles were provided in a consistently formatted manner, free of punctuation and in all lower-case
characters. As a result, preprocessing such as stopword filtering, punctuation removal, or lowercasing
was not implemented. All text was passed directly into the sentence transformer model, which natively
handled tokenization, special tokens, and padding for varying job title length.</p>
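        <p>A minimal sketch of the pair-extraction step, assuming the column layout described above (ESCO job ID, URL, and a pair of similar job titles); the sample rows are invented for illustration.</p>
        <preformat>
```python
import csv
import io

# Hypothetical TSV excerpt mirroring the described layout:
# ESCO job ID, URL, and two semantically similar job titles.
sample_tsv = (
    "1234\thttps://esco.example/1234\tsoftware developer\tsoftware engineer\n"
    "5678\thttps://esco.example/5678\tdata analyst\tdata scientist\n"
)

def extract_pairs(tsv_text):
    """Return (title_a, title_b) positive pairs from a TSV block."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [(row[2], row[3]) for row in reader]

pairs = extract_pairs(sample_tsv)
print(pairs[0])  # ('software developer', 'software engineer')
```
        </preformat>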
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Architecture</title>
        <p>
          We used the Multilingual MPNet Base V2 model as the foundation for both our pretrained and fine-tuned
variants [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The transformer model contains 278 million parameters and produces 768-dimensional
embeddings, based on the sentence transformers implementation built on top of MPNet [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. We selected
MPNet Base V2 for its strong cross-lingual performance, earning scores of approximately 0.84 for EN-ES
and EN-DE in both Pearson and Spearman correlations on sentence similarity benchmarks [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Both
metrics are commonly used to assess semantic alignment, i.e., how well a model captures
meaning relationships between languages [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The model's accessibility through the sentence-transformers framework further motivated our choice.
        </p>
        <p>To address both monolingual and cross-lingual retrieval scenarios, we used two variants of the model.
The pretrained variant was used for cross-lingual tasks (EN-ES and EN-DE) while the fine-tuned variant
was optimized for monolingual retrieval (EN-EN, ES-ES, and DE-DE) by training on positive job title
pairs from the English, Spanish, and German training sets.</p>
        <p>
          While fine-tuning improved monolingual retrieval on the validation set, we observed impaired
cross-lingual performance, which we attribute to overfitting and loss of multilingual generalization due to
the limited linguistic and contextual diversity of the training set [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Training Strategy</title>
        <p>
          We fine-tuned the MPNet Base V2 model using a Multiple Negatives Ranking Loss (MNRL) function,
which is a contrastive objective that treats all other samples in the batch as implicit negatives [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The
MNRL function is suitable for tasks involving only positive pairs as it assumes all other pairs in the
batch are unrelated to the anchor. This strategy is effective for Task A since it does not require negative
cases and only positive pairs are given in the training set [
          <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
          ].
        </p>
        <p>
          The model was trained over 5 epochs with a batch size of 32. We used a learning rate of 1e-6, epsilon
of 2e-5, and a weight decay of 0.01. These hyperparameters were selected to minimize the risk of
overfitting and catastrophic forgetting [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Warmup steps were set to 10% of the total training steps
and the best-performing model (based on training loss) was automatically saved.
        </p>
        <p>No random seed was set during training, so minor variations may occur between runs. As will be
noted in the future work section, reproducibility can be improved by establishing a random seed at the
beginning of the implementation.</p>
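        <p>A generic seed-fixing helper of the kind alluded to here might look as follows; this is a sketch of standard practice, not the code used in our runs.</p>
        <preformat>
```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42):
    """Fix random seeds so repeated fine-tuning runs are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable

# Calling this once at the start of the pipeline makes subsequent
# sampling (batch shuffling, dropout, initialization) deterministic.
set_seed(42)
```
        </preformat>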
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Evaluation Metrics</title>
        <p>
          Our model’s performance was evaluated on the provided validation set using Mean Average Precision
(MAP) and Mean Reciprocal Rank (MRR), which were the primary metrics used in the leaderboard for
this task [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Job title embeddings were compared using cosine similarity and titles were ranked based
on their similarity to each query embedding.
        </p>
        <p>Evaluation was performed on monolingual (EN-EN, ES-ES, DE-DE, ZH-ZH) and cross-lingual (EN-ES,
EN-DE) scenarios. Chinese (ZH-ZH) was included in the evaluation despite the absence of training
data, which allowed for the assessment of the fine-tuned model’s zero-shot multilingual generalization.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>This section presents our model’s performance across monolingual and cross-lingual retrieval tasks, as
evaluated on the TalentCLEF test set. As seen in Table 1, our model achieved an average MAP of 0.360
on monolingual retrieval tasks, ranking 32nd on the public leaderboard. Performance was strongest
on EN-EN (0.408), while lower scores were observed on ES-ES (0.348) and DE-DE (0.324). Notably,
the model achieved a MAP of 0.380 on the ZH-ZH test set despite no fine-tuning on Chinese data,
suggesting a level of zero-shot generalization being preserved from multilingual pretraining.</p>
      <p>Table 2 summarizes our model’s performance on the cross-lingual objectives (EN-ES and EN-DE).
Despite utilizing the pretrained multilingual MPNet model, we observed near-zero MAP scores (0.023 for
EN-ES and 0.019 for EN-DE). Upon further review, we identified a language mismatch in the inference
pipeline: for both EN-ES and EN-DE evaluations, job titles were mistakenly embedded in English, rather
than in Spanish or German, as required. This resulted in invalid similarity comparisons between queries
and unrelated job embeddings, leading to a collapse in retrieval performance. Future work will correct
this by ensuring proper language-specific embedding and alignment during cross-lingual evaluation.</p>
      <p>These findings highlight the importance of aligning evaluation pipelines with model expectations,
particularly for cross-lingual objectives. While monolingual retrieval remained reasonably effective,
the collapse in cross-lingual performance illustrates the fragility of multilingual generalization when
inputs are misaligned or when training data lacks sufficient linguistic diversity. These trade-offs form
the basis for further reflection in our Discussion section.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Our findings challenge the conventional wisdom that task-specific fine-tuning improves retrieval
performance. Despite training on in-domain monolingual data, the fine-tuned MPNet model consistently
underperformed the pretrained model on both monolingual and cross-lingual objectives. Fine-tuning
appears to have disrupted the model’s multilingual alignment and impaired generalization. Notably
for cross-lingual tasks, during test evaluation, we identified an inference-time bug that caused all job
title embeddings, regardless of language, to be generated in English. This mismatch led to invalid
query-target comparisons and a collapse in similarity scores for EN-ES and EN-DE.</p>
      <p>Despite this, the pretrained model demonstrated promising robustness in the ZH-ZH setting, where it
had no prior exposure to Chinese. This underscores the strength of large-scale multilingual pretraining
for zero-shot generalization and raises important questions about when and how fine-tuning should be
applied in multilingual contexts.</p>
      <p>
        Fine-tuning on limited, single-language data introduced harmful overfitting while failing to deliver
expected improvements, even on languages seen during training. In contrast, the unmodified pretrained
model proved more consistent and robust across both seen and unseen languages. Prior work has shown
that catastrophic forgetting [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], distortion of pretrained features [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and the erosion of multilingual
alignment during fine-tuning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] are common risks when adapting pretrained models. Techniques
such as language-specific regularization, contrastive alignment objectives [
        <xref ref-type="bibr" rid="ref5">5</xref>
          ], or parameter-efficient
tuning strategies like adapter modules [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have been proposed to mitigate these efects and preserve
the benefits of multilingual pretraining.
      </p>
      <p>
        Our findings echo concerns raised in recent work [
        <xref ref-type="bibr" rid="ref10 ref5 ref9">5, 9, 10</xref>
        ] that, without explicit safeguards,
fine-tuning multilingual transformers on narrow monolingual datasets can degrade performance by distorting
the pretrained embedding space, even in-language.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Future Work</title>
      <p>Future work should include a broader evaluation of multilingual models beyond MPNet to
determine whether alternative architectures offer improved retrieval performance. One such candidate is
ESCOXLM-R, a model specifically trained on job market data and designed for multilingual
representation learning [12]. To enhance reproducibility, the fine-tuning process should also be repeated
with a fixed random seed—an element omitted here due to time constraints. Further research should
explore bias-controlled training approaches that reduce performance disparities across gendered job
titles. Finally, re-generating the job title embeddings as outlined in the preceding section may offer a
straightforward path to performance improvement.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this work, we fine-tuned a multilingual MPNet transformer encoder to retrieve semantically similar
job titles in a target language, given a source-language query. However, the fine-tuned model
underperformed relative to the pretrained baseline, yielding lower MAP scores across all evaluated
language pairs. The final submission achieved an average MAP score of 0.360, placing 32nd on
the public leaderboard of the TalentCLEF 2025 competition. Future improvements could involve
exploring pretrained models specifically developed for the human resources domain, as well as
incorporating enhanced language-specific embeddings for job titles. Our code is publicly available at
https://github.com/dsgt-kaggle-clef/talentclef-2025.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We want to thank the Data Science at Georgia Tech (DS@GT)-CLEF group for cloud infrastructure and
their support, and the organizers of TalentCLEF for hosting the competition.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT (GPT-4) by OpenAI to assist with
grammar, spelling, and phrasing. After using this tool, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>García-Sardiña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Estrella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Deniz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodrigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zbib</surname>
          </string-name>
          ,
          <article-title>Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management, in: International Conference of the Cross-Language Evaluation Forum for European Languages</article-title>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          , T.-Y. Liu,
          <article-title>MPNet: Masked and permuted pre-training for language understanding</article-title>
          , <year>2020</year>. URL: https://arxiv.org/abs/2004.09297. arXiv:2004.09297.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          , <year>2019</year>. URL: https://arxiv.org/abs/1908.10084. arXiv:1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          , J. Carbonell, R. Salakhutdinov,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Xlnet: Generalized autoregressive pretraining for language understanding</article-title>
          , arXiv preprint arXiv:1906.08237 (<year>2019</year>).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Making monolingual sentence embeddings multilingual using knowledge distillation</article-title>
          , <year>2020</year>. URL: https://arxiv.org/abs/2004.09813. arXiv:2004.09813.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] European Commission, ESCO - European Skills, Competences, Qualifications and Occupations, <year>2024</year>. URL: https://esco.ec.europa.eu/en, accessed: 2024-05-25.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gascó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>García-Sardiña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Cerpa</surname>
          </string-name>
          , P. Estrella, Á. Rodrigo,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zbib</surname>
          </string-name>
          ,
          <article-title>TalentCLEF 2025 corpus: Skill and job title intelligence for human capital management</article-title>
          , <year>2025</year>. URL: https://doi.org/10.5281/zenodo.15292308. doi:10.5281/zenodo.15292308.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] AIDA-UPM, mstsb-paraphrase-multilingual-mpnet-base-v2, https://huggingface.co/AIDA-UPM/mstsb-paraphrase-multilingual-mpnet-base-v2, <year>2024</year>. Accessed: 2024-05-25.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kotha</surname>
          </string-name>
          , J. M. Springer, A. Raghunathan,
          <article-title>Understanding catastrophic forgetting in language models via implicit inference</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2309.10105. arXiv:2309.10105.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raghunathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jones</surname>
          </string-name>
          , T. Ma, P. Liang,
          <article-title>Fine-tuning can distort pretrained features and underperform out-of-distribution</article-title>
          , <year>2022</year>. URL: https://arxiv.org/abs/2202.10054. arXiv:2202.10054.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Houlsby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giurgiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jastrzebski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Morrone</surname>
          </string-name>
          , Q. de Laroussilhe,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gesmundo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Attariyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          ,
          <article-title>Parameter-efficient transfer learning for NLP</article-title>
          , <year>2019</year>. URL: https://arxiv.org/abs/1902.00751. arXiv:1902.00751.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] M. Zhang, R. van der Goot, B. Plank, ESCOXLM-R: Multilingual taxonomy-driven pre-training for the job market domain, <year>2023</year>. URL: https://aclanthology.org/2023.acl-long.662.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>