<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Retrieval to Ranking: A Two-Stage Neural Framework for Automated Skill Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleksander Bielinski</string-name>
          <email>a.bielinski@napier.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Brazier</string-name>
          <email>d.brazier@napier.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <kwd-group>
          <kwd>Skill Extraction</kwd>
          <kwd>Multi-Stage Information Retrieval</kwd>
          <kwd>Contrastive Learning</kwd>
          <kwd>ESCO</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Edinburgh Napier University, School of Computing, Engineering &amp; The Built Environment</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>22</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>Automated skill extraction from job postings is crucial for understanding labour market dynamics, but current approaches struggle to balance retrieval efficiency with ranking accuracy. Most existing methods focus on either dense retrieval for candidate generation or multi-label classification, failing to leverage the complementary strengths of both paradigms. While recent work has begun exploring retrieve-and-rank pipelines for skill extraction using Large Language Models (LLMs) for ranking, we propose training dedicated neural models for both retrieval and ranking stages. In our two-stage approach, the bi-encoder efficiently retrieves skill candidates, while the cross-encoder provides precise ranking using focal loss optimisation. We evaluate both stages separately on publicly available datasets. Our bi-encoder achieves up to 4.78 percentage points improvement in RP@5 over existing baselines, while our cross-encoder demonstrates up to 30.54 percentage points improvement in micro-F1 compared to LLM-based ranking methods. Additionally, our bi-encoder shows strong zero-shot performance on held-out skills. The framework leverages public datasets and freely available skill taxonomies like ESCO, promoting scalable and reproducible skill extraction. We release our code and configurations to encourage further research, available at https://github.com/AleksanderB-hub/Multi-Stage-Pipeline-Skill-Extraction.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Fuelled by technological developments and societal changes,
today’s labour market transforms dynamically, making the
assessment of job market demand an increasingly
challenging task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. With the European Skills, Competences,
Qualifications and Occupations (ESCO) taxonomy alone
containing nearly 14,000 distinct skills, and millions of job postings
published daily across various platforms, the need for
automated skill extraction has never been more critical. Skill
extraction plays a pivotal role in this task as it allows for
the extraction of competencies from available data (e.g.,
resumes, job postings) and mapping them to a standardised
taxonomy. This enables HR professionals and
policymakers to better understand current market trends and support
workforce planning, ensuring the efficient functioning of
labour markets. The growing importance of such systems
is evidenced by the recent surge in research on automated
skill extraction [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>
        The task of skill extraction from job postings presents
unique challenges that distinguish it from traditional text
classification. First, skills are often mentioned implicitly
rather than explicitly [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This implicit nature renders
simple keyword matching approaches ineffective. Second, the
diverse vocabulary used across industries and regions means
that even accurately extracted skills must be normalised to
a standardised taxonomy like ESCO to enable meaningful
analysis and comparison across markets and time periods.
      </p>
      <p>
        The main problem with developing skill extraction
systems is the scarcity of real-life annotation data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This was
partially addressed by the creation of artificially generated
job posting data, which is later used to train the models
for automated skill extraction. However, these artificial
datasets often fail to capture the full complexity and
linguistic variety of real-world job postings. As this creates a
potential gap between training and deployment scenarios,
it is necessary to carefully balance the use of synthetic
training data with limited real-world resources.
      </p>
      <p>RecSys in HR’25: The 5th Workshop on Recommender Systems for Human Resources, in conjunction with the 19th ACM Conference on Recommender Systems. † Corresponding author: D. Brazier. © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License.</p>
      <p>Beyond data availability, existing approaches to skill
extraction face architectural limitations that constrain their
effectiveness. Current approaches prioritise either direct
skill classification or relevant candidate retrieval (dense
retrieval), where each query (job description sentence) is
provided with a list of relevant documents (matching skills).
The issue with standard classification is that it is
limited to the data it was trained on, consequently impairing
the generalisability of such solutions. This is particularly
problematic due to the constantly evolving skill space.
Conversely, dense retrieval approaches frame skill extraction as
a similarity search problem. By searching for similar skills
from the entire taxonomy, they often return numerous
irrelevant candidates, making accurate skill profile extraction
challenging.</p>
      <p>
        Recent advances in Information Retrieval (IR) suggest
that combining dense retrieval with ranking capability
offers significant improvements over single-stage systems
[
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. These architectures combine a dense retriever (e.g.,
bi-encoder) with a ranking model (e.g., cross-encoder),
leveraging the complementary strengths of these two methods.
Given that skill extraction requires the retrieval of relevant
skills from large taxonomies, these two-stage architectures
present a natural fit. However, while Clavié and Soulié [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
and D’Oosterlinck et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
          ] recently showed that Large
Language Model (LLM) rankers could improve skill extraction,
the potential of training dedicated neural architectures for
both stages remains unexplored. This represents a
significant gap, as purpose-built rankers can offer better
performance and efficiency than general-purpose LLMs.
      </p>
      <p>We propose a novel two-stage neural architecture that
adapts successful IR practices to the unique requirements
of skill extraction. Our approach combines a bi-encoder for
efficient candidate retrieval from large skill taxonomies with
a cross-encoder for precise skill identification. In the first
stage, we fine-tune a bi-encoder using a curriculum learning
strategy that leverages freely available ESCO skill
definitions. The model first learns to associate skills with their
canonical definitions before training on synthetic job
posting data. This approach not only improves performance on
seen skills but also enables strong zero-shot generalisation
to skills excluded from training. The bi-encoder retrieves
the top-K most relevant skills for each job description
sentence based on embedding similarity.</p>
      <p>In the second stage, we employ a cross-encoder that ranks
the retrieved candidates using a binary classification
objective. Unlike traditional ranking approaches that reorder
candidates, our cross-encoder makes explicit decisions about
whether each skill is truly relevant to the job description.
Such a design choice aligns with the multi-label nature of
skill extraction, where a given query might describe multiple
or no skills at all. This approach leverages a cross-encoder’s
ability to jointly process query and documents, allowing the
capture of subtle semantic relationships that independent
encoding might miss. Our multi-stage approach is
visualised in Figure 1.</p>
      <p>Comprehensive evaluation across established skill
extraction benchmarks demonstrates the effectiveness of this
two-stage approach. The bi-encoder achieves up to 4.78
percentage points improvement in R-Precision@5 (RP@5)
compared to existing dense retrieval baselines while
maintaining strong retrieval performance on held-out skills in
zero-shot settings. When combined with the cross-encoder,
our complete pipeline achieves F1 scores up to 30.54
percentage points higher than LLM-based ranking methods.
These results validate that carefully designed two-stage
neural architectures can significantly improve skill extraction
while maintaining the efficiency required for practical
deployment.</p>
      <p>Contributions. In summary, our main contributions are:
• A curriculum‑trained dense retriever over taxonomy
labels for candidate skill generation, showcasing
strong zero-shot retrieval capabilities.
• A task‑specific, supervised cross‑encoder ranker for
multi‑label skill extraction, delivering strong
classification performance across public benchmarks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Relevant Work</title>
      <sec id="sec-2-1">
        <title>2.1. Skill Extraction</title>
        <p>
          Early approaches to skill extraction were mostly limited
to span-level extraction. The task consisted of retrieval of
relevant fragments from the sentences (job descriptions or
resumes) to train Named Entity Recognition (NER) models
[
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ]. Seeing the rapid advances in LLMs, researchers
demonstrated how generative AI can be leveraged for
span-level skill extraction [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ]. Some notable work also
exists using graph neural networks for context-aware
skill extraction [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ]. Despite their strong performance,
the main issue with such approaches is the lack of skill
label normalisation. The retrieved spans are not linked to
the standardised taxonomy (e.g., ESCO), making these
techniques less applicable in real-world scenarios. To address
that, authors in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] demonstrate how the identification of
relevant skill spans can aid downstream classification of
competencies, highlighting the complementary nature of
such approaches. Similar techniques were later expanded in
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], where job descriptions are directly matched with skills
in a taxonomy. Another issue lies in reliance on high-quality
annotation data, which is costly to obtain, especially when
considering the necessary involvement of human resource
domain experts [
          <xref ref-type="bibr" rid="ref12 ref2">12, 2</xref>
          ].
        </p>
        <p>
          Large-scale skill and occupation taxonomies offer a
breadth of potentially useful information for skill extraction
tasks, such as co-dependency of competencies,
hierarchical classification, etc. Building on this, researchers in [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
explored the use of ESCO skill labels as weak supervision
signals. Furthermore, work by Decorte et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] showcased
that taxonomy-based weak supervision signals can be
combined with classification models to satisfy skill
normalisation of extracted competencies. The role of such taxonomies
in supporting the performance on downstream tasks has
been further highlighted in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], where information from
ESCO was used as pre-training signals for a skill extraction
model.
        </p>
        <p>
          Understanding the importance of skill normalisation
and challenges around sourcing annotation data, research
shifted towards the generation of artificial job description
data [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
          ]. Partially fuelled by the wealth of information
offered by ESCO, these works showcased how incorporating
definition (skill description) information into the generating
pipeline increases the quality of the examples. In addition,
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] also shows that definitions can serve as a training signal
on their own. Most recently, Decorte et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] introduced
a novel end-to-end skill extraction architecture, achieving
strong results on skill retrieval benchmarks. Their work
utilises the skill definitions as training signals, further
confirming the benefits of incorporating taxonomy information
into skill extraction pipelines.
        </p>
        <p>
          In light of this evidence, our work introduces a novel
pre-training phase utilising the skill descriptions and
labels provided by the ESCO taxonomy. This builds on the
reported success of curriculum learning in information
retrieval (IR), where models benefit from training on
progressively complex examples [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Such approaches have shown
promise in dense retrieval tasks [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], where domain-specific
pre-training improves downstream performance. However,
unlike prior work that requires specialised architectures or
training procedures, our curriculum learning strategy
maintains the standard bi-encoder architecture while leveraging
freely available taxonomy data.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Two-Stage Retrieval Architectures</title>
        <p>
          Modern information retrieval has undergone a fundamental
shift from sparse keyword matching to dense neural
representations, revolutionising how systems retrieve and rank
information. While traditional methods like BM25 remain
competitive baselines, neural approaches, particularly
bi- and cross-encoder architectures, have demonstrated
superior performance across diverse IR tasks [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Nonetheless,
these methods face an inherent trade-off. Bi-encoders
enable efficient retrieval through pre-computed
representations but sacrifice fine-grained query-document interaction,
while cross-encoders that jointly process query-document
pairs provide superior relevance modelling but cannot scale
to large collections.
        </p>
        <p>
          This challenge has given rise to two-stage retrieval
architectures that combine dense retrievers with dedicated
rankers to achieve superior retrieval quality. Early work
utilised the BM25 for the retrieval stage and combined it
with a BERT-based ranker in a question answering task
[
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. However, modern systems increasingly employ neural
methods in both stages. Such methods often use bi-encoders
for efficient candidate retrieval followed by cross-encoders
for precise ranking [
          <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
          ]. While such neural two-stage
systems have shown strong results, recent work has
explored the use of LLM-based rankers across various domains
[
          <xref ref-type="bibr" rid="ref28 ref29">28, 29</xref>
          ], including skill extraction [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]. Despite recent
interest in LLM-based rankers, bi-encoder/cross-encoder
architectures remain widely deployed due to their predictable
computational costs and proven effectiveness.
        </p>
        <p>
          Building on the success of two-stage retrieval systems
in IR, we adapt this paradigm to skill extraction. To the
best of our knowledge, we propose the first architecture
combining bi-encoder retrieval and cross-encoder ranking
models specifically designed for this task. In contrast to
LLM‑based ranking for skills [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ], we train dedicated
neural models for both stages, offering better efficiency and
performance while leveraging the complementary strengths
of both encoder types.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>We present a two-stage neural architecture for skill
extraction that frames the task as dense retrieval followed by
binary ranking. Our approach leverages curriculum
learning to maximise the utility of limited training data,
combining synthetic datasets with freely available ESCO skill
definitions. This section details our problem formulation,
data configuration strategy, and the design of both pipeline
stages.</p>
      <sec id="sec-3-1">
        <title>3.1. Problem Statement</title>
        <p>In our case, we aim to extract all relevant skills for a given
job description fragment. For example, given the sentence:
”Be able to lead and motivate people and have
good communication skills.”
The goal is to extract skills such as communication, lead
others, and motivate others from a larger taxonomy of possible
skills (e.g., ESCO). We approach this problem in two stages.</p>
        <sec id="sec-3-1-1">
          <title>Stage 1: Dense Skill Retriever (bi-encoder retriever).</title>
          <p>We first retrieve a small subset of relevant skills from a
large skill taxonomy. This is done by encoding the job
sentence and each skill into dense vectors using a trained
bi-encoder, and computing their cosine similarity. The top-K
most similar skills are returned as candidates.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Stage 2: Binary Skill Ranker (cross-encoder ranker).</title>
          <p>Next, we refine this list using a trained cross-encoder model
that jointly reads the sentence and each candidate skill,
and assigns a relevance score. Skills above a tuned
relevance threshold are predicted as relevant (see Section 3.5
for threshold tuning details).</p>
          <p>Assumption. At inference, the ranker sees only retrieved
candidates; thus, its effectiveness depends on Stage 1
retrieval quality. To create a representative training sample,
we inject the missing gold labels into the training data (no
injection at test time; see Section 3.5 for details).
Evaluation. Retrieval is assessed using RP@K and MRR.
Ranking performance is measured using micro-F1 across all
sentence-skill pairs. The use of each metric is justified in
Section 4.1.</p>
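As a concrete reference for these metrics, the sketch below computes per-query RP@K and MRR. The function names and list/set inputs are illustrative, and RP@K follows the common formulation with min(K, number of gold skills) in the denominator; corpus-level scores are averages of these per-sentence values.

```python
def rp_at_k(retrieved, gold, k=5):
    """R-Precision@K: gold hits in the top-K, divided by min(K, |gold|)."""
    if not gold:
        return 0.0
    hits = sum(1 for s in retrieved[:k] if s in gold)
    return hits / min(k, len(gold))

def mrr(retrieved, gold):
    """Reciprocal rank of the first gold skill in the retrieved list."""
    for rank, s in enumerate(retrieved, start=1):
        if s in gold:
            return 1.0 / rank
    return 0.0
```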
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Overview of Available Datasets</title>
        <p>
          Our experiments leverage both synthetic and real-world
datasets to address the data scarcity challenge in skill
extraction. For synthetic data, we utilise two
complementary resources. First, the DECORTE dataset [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which
contains 138,240 artificially generated examples covering nearly
the entire ESCO taxonomy. Second, we use the SKILLSKAPE
dataset [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], comprising 8,940 multi-skill examples divided
into train, val and test sets, where each sentence can describe
up to nine different skills. While DECORTE provides broad
coverage of the ESCO taxonomy with clear single-skill
associations, SKILLSKAPE better reflects real-world complexity
where multiple skills co-occur within job requirements.
        </p>
        <p>
          When it comes to real-world data, we employ three
manually annotated datasets: HOUSE, containing 663 job
description sentences annotated with ESCO labels, split into
val and test sets; TECH, featuring 796 fully annotated job ad
sentences (val + test); and TECHWOLF with 588 annotated
examples (test). All these datasets were originally sourced
from [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and later annotated in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] (HOUSE, TECH ) and
in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] (TECHWOLF ) using ESCO labels. Additionally, we
incorporate the skill labels, alongside their synonyms and
definitions from ESCO v1.1.0 [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], serving as a valuable
knowledge source for our curriculum learning approach
(ESCO-D).
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Stage-Specific Data Configuration</title>
        <p>Given the limited availability of real-world annotated data
and the different requirements of our two-stage
architecture, we strategically partition the datasets described above
to serve distinct roles in training and evaluation. Table 1
presents the complete data allocation, which we designed
following three key principles: (1) maintaining strict
separation between training and test data for fair evaluation,
(2) maximising the use of real-world examples where they
provide the most benefit and (3) balancing the use of
synthetically generated data to ensure efficient learning.</p>
        <p>
          For Stage 1, we use only one example per skill from
DECORTE despite the availability of ten. This decision is
based on preliminary experiments showing no significant
performance improvement when using additional examples,
provided the augmentation strategy from [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is employed
(see Section 5). However, as highlighted in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], real-life job
descriptions often describe multiple skills within a single
sentence. Consequently, SKILLSKAPE provides job ad
fragments which are both longer and more complex than those
of DECORTE, albeit offering inferior taxonomy coverage.
Therefore, we decided to combine these two datasets in our
training data. We hypothesise that such a configuration
satisfies both taxonomy coverage (i.e., each skill in ESCO
has at least one example) and provides a more informative
learning signal for our model. To take advantage of
existing taxonomies, we further expand our training data with
definitions from
          ESCO-D, which were shown to provide a
strong training signal in skill extraction tasks [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The main
training phase is preceded by pre-training, where both
definitions and skill labels from ESCO-D are used, forming our
curriculum learning strategy (see Section 3.4.1 for details).
        </p>
        <p>For Stage 2, the objective is to ensure a representative
training sample for the ranker. Since SKILLSKAPE consists
of artificially generated data, we decided to incorporate
both validation sets of TECH and HOUSE datasets into
training. Given their relatively small size, we further expand the
training data for Stage 2 by the TECHWOLF dataset. We
acknowledge that such a decision prevents assessing the
performance of the ranking stage on this dataset. However,
such a step was crucial due to the unique nature of real-life
data, where job description fragments can consist of both
single phrases (e.g., ”Python”) as well as longer texts
describing one or multiple competencies. Section 3.5 describes the
exact process of forming training data for this stage.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Stage 1: Bi-encoder for Skill Retrieval</title>
        <p>
          Our bi-encoder is based on all-mpnet-base-v2 (https://huggingface.co/sentence-transformers/all-mpnet-base-v2), a sentence
transformer model pre-trained for semantic similarity tasks.
The model has previously demonstrated its effectiveness
in the job domain for job recommendation [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] and skill
extraction [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] problems.
The bi-encoder processes inputs independently, encoding
job description sentences and skill labels into a shared
embedding space. During inference, we pre-compute
embeddings for all 13,890 ESCO skills, enabling real-time retrieval
via similarity search. For each query sentence, the model
retrieves the top-K skills based on cosine similarity scores.
        </p>
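A minimal sketch of this inference step, assuming the skill embeddings are already pre-computed as a matrix (names and shapes are illustrative, not the authors' released code):

```python
import numpy as np

def top_k_skills(query_emb, skill_embs, skill_labels, k=5):
    """Return the k skill labels most cosine-similar to the query.
    Rows are L2-normalised first, so a dot product equals cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    S = skill_embs / np.linalg.norm(skill_embs, axis=1, keepdims=True)
    sims = S @ q                       # one matrix-vector product
    top = np.argsort(-sims)[:k]        # indices of the k highest similarities
    return [skill_labels[i] for i in top]
```

Because the vectors are normalised once up front, retrieval over all 13,890 pre-computed ESCO skill embeddings reduces to a single matrix-vector product per query.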
        <p>
          The data is organised into pairs of job description
fragments and their assigned skill labels. For multi-label
sentences, we create one query–skill pair per gold label. We
employ the augmentation strategy from [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], where sentences
are randomly concatenated during training. To prevent
augmented sentence pairs from serving as negatives to each
other, we maintain a mask that excludes these pairs (and
associated skill labels) from the negative set. As examined
in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], this augmentation strategy forces the model to learn
robust representations. The model must identify relevant
skills even when unrelated content is present. This mirrors
real job descriptions, where target skills are often embedded
among other skills and irrelevant text. The augmentation
strategy is only applied to job description sentences and not
skill labels at this stage.
        </p>
        <p>
          We employ NT-Xent loss from [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] as a learning objective.
Following the approach used in CLIP [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ], we compute the
loss symmetrically. The loss for a single direction is defined
as:
ℒ_{q→s} = −(1/N) Σ_{i=1}^{N} log [ exp(s_{i,p_i}/τ) / ( exp(s_{i,p_i}/τ) + Σ_{j} m_{i,j} exp(s_{i,j}/τ) ) ],   (1)
where N is the batch size, s_{i,j} = cos(q_i, d_j) represents the
cosine similarity between the i-th query embedding q_i and
the j-th skill embedding d_j, and p_i denotes the index of the
positive skill for query i. The mask m_{i,j} = 0 when skill j
should be excluded from the negatives (i.e., j = p_i or j comes from an augmented
version of query i), and m_{i,j} = 1 otherwise. The temperature
parameter τ controls the sharpness of the distribution. Total
loss is:
        </p>
        <p>ℒ = ½ (ℒ_{q→s} + ℒ_{s→q}),   (2)
where ℒ_{s→q} is computed identically as in (1) but with skill
labels as anchors and queries as positives/negatives. This
bidirectional formulation ensures that both job descriptions
and skills are equally optimised within the shared
embedding space.</p>
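The symmetric loss can be sketched as follows; this is an illustrative NumPy version (not the authors' training code), with the exclusion mask passed in as a 0/1 matrix and the positive pair always kept in the denominator:

```python
import numpy as np

def nt_xent_symmetric(Q, S, mask, tau=0.05):
    """Symmetric NT-Xent loss sketch. Q, S: (N, d) arrays where row i of S
    holds the positive skill embedding for query i. mask[i, j] = 0 drops
    pair (i, j) from the negative set (e.g. augmented copies of query i)."""
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    logits = (Q @ S.T) / tau                      # s_{i,j} / tau

    def one_direction(lg, m):
        n = lg.shape[0]
        pos = np.exp(np.diag(lg))                 # positive-pair term
        negs = (np.exp(lg) * m * (1 - np.eye(n))).sum(axis=1)
        return -np.mean(np.log(pos / (pos + negs)))

    # query->skill plus skill->query, averaged as in the total loss
    return 0.5 * (one_direction(logits, mask) + one_direction(logits.T, mask.T))
```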
        <sec id="sec-3-4-1">
          <title>Experimental Configuration.</title>
          <p>The AdamW optimiser is
used with a cosine learning rate schedule and a base learning
rate of 2e-5, with 5% of training steps for warmup. We adopt
cosine decay as it provided smoother convergence than a
linear schedule in preliminary runs. The model is trained for
a single epoch, with a batch size of 64 (the largest fitting on
a 16GB GPU), and gradients clipped at 1.0 to stabilise
training against large updates. Training is performed in mixed
precision to improve efficiency. The temperature parameter
is set to 0.05, selected via grid search on the SKILLSKAPE
validation set in increments of 0.01 within [0.01, 0.07]. For
tokenization, we set the maximum token length to 128 and
32 for sentences and skill labels, respectively. With the
average example length in SKILLSKAPE validation set of
27.8 words and ESCO skill labels no longer than 13 words,
this ensures complete context coverage while maintaining
computational efficiency.</p>
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.1. Curriculum Learning with Skill Definitions</title>
          <p>Prior to the training procedure described above, we
employ a pre-training phase that leverages definitions and
all available skill labels from ESCO (ESCO-D). Together,
these form our curriculum training strategy, where the
model first learns from simpler skill-definition alignments
before progressing to more complex job description-skill
mappings. During pre-training, we train the bi-encoder on
skill-definition pairs using the same symmetric NT-Xent
loss and augmentation strategy (applied to definitions) as
in the main phase. This ensures consistency between
pretraining and fine-tuning phases while teaching the model to
align skill names with their semantic meanings. Notably, we
reuse ESCO definitions in the main training phase, where
they serve as high-quality reference examples alongside
job descriptions, contributing to improved performance as
shown in our ablation studies (see Section 5).</p>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Stage 2: Cross-encoder for Skill Ranking</title>
        <p>
          At the base of the cross-encoder, we adopt the
ms-marco-MiniLM-L6-v2 model (https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2). It was tuned for ranking and
displays a strong efficiency–effectiveness trade-off via
self-attention distillation [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ].
        </p>
        <p>The training data used for this stage is directly sourced
from the previous dense retrieval stage. Specifically, for
each job description sentence in training data, we retrieve
the top-100 skill candidates. To ensure positives are present,
we inject any missing gold skills by replacing the
lowest-scoring retrieved items. This is unique to the training data,
as at inference, the test sets contain only the originally
retrieved skills. Such a configuration represents a more
realistic setting where a dedicated ranker might not have access
to a complete set of true labels.</p>
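The gold-injection step can be sketched as below; `inject_gold` is a hypothetical helper that evicts the lowest-ranked non-gold candidates, which is one reasonable reading of the procedure described above:

```python
def inject_gold(candidates, gold):
    """candidates: retrieval-ranked skill ids (best first).
    Each gold skill missing from the list replaces the lowest-ranked
    non-gold candidate, so positives are always present for training."""
    out = list(candidates)
    missing = [g for g in gold if g not in out]
    for g in missing:
        # scan from the bottom for a non-gold candidate to evict
        for i in range(len(out) - 1, -1, -1):
            if out[i] not in gold:
                out[i] = g
                break
    return out
```

At inference no injection is applied: the ranker scores exactly the retrieved list.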
        <p>For each sentence, we pair it with all candidate skills
and assign a binary label (1 if the skill appears in the gold
set, 0 otherwise). To improve generalisation, we apply two
lightweight augmentations with probability 0.2: (i) partial
label masking, where one token of a multi-word skill is
replaced by a [MASK] placeholder to discourage
memorisation of exact surface forms; and (ii) sentence word dropout,
where one random token is removed from longer sentences
to add noise.</p>
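A possible implementation of these two augmentations (function name and the token-count cutoff for "longer sentences" are illustrative assumptions):

```python
import random

def augment_pair(sentence, skill, p=0.2, rng=random):
    """Apply, each with probability p:
    (i) partial label masking: one token of a multi-word skill -> [MASK];
    (ii) sentence word dropout: remove one random token from longer sentences."""
    skill_toks = skill.split()
    if len(skill_toks) > 1 and rng.random() < p:
        skill_toks[rng.randrange(len(skill_toks))] = "[MASK]"
    sent_toks = sentence.split()
    if len(sent_toks) > 5 and rng.random() < p:   # assumed length cutoff
        del sent_toks[rng.randrange(len(sent_toks))]
    return " ".join(sent_toks), " ".join(skill_toks)
```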
        <p>
          During training, each job description sentence is paired
with 100 candidate skills, of which at most 10 are relevant,
yielding a highly imbalanced label distribution. To
mitigate the dominance of easy negatives, we replace standard
binary cross-entropy with the focal loss [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. Focal loss
down-weights well-classified examples, forcing the model
to focus on hard positives and hard negatives. Let z_i be the
logit and y_i ∈ {0, 1} the label. The focal loss is:
ℒfocal = −(1/N) Σ_{i} α_i (1 − p_i)^γ [log p_i],
with
p_i = σ(z_i) if y_i = 1, and p_i = 1 − σ(z_i) if y_i = 0,
and α_i = α for positives and 1 − α for negatives,
where γ controls the degree of focussing and α balances
positive vs. negative classes. To further address class imbalance
during training, we employ a balanced batch sampler that
maintains approximately 30% positive examples per batch.
This prevents the model from simply learning to predict all
negatives. We split the
dataset 80:20 into training and validation and train the model
for 5 epochs with AdamW (learning rate 2e-5) and a linear
warm-up of 10% of total steps. Validation tests showed no
consistent benefits beyond 5 epochs. Gradients are clipped
at 1.0 with batch size set to 64.
        </p>
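<p>For concreteness, a framework-free sketch of the focal loss above with the reported α = 0.8 and γ = 3.0; an actual implementation would operate on batched tensors, and the per-example weight α_i = α for positives and 1 − α for negatives is an assumed form of the balancing term.</p>

```python
import math

# Framework-free sketch of the focal loss above (alpha = 0.8, gamma = 3.0
# as reported). The per-example weight alpha_i = alpha for positives and
# 1 - alpha for negatives is an assumed form of the balancing term.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def focal_loss(logits, labels, alpha=0.8, gamma=3.0):
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z) if y == 1 else 1.0 - sigmoid(z)  # p_i from the formula
        a = alpha if y == 1 else 1.0 - alpha            # class-balancing alpha_i
        total += a * (1.0 - p) ** gamma * -math.log(p)  # down-weights easy pairs
    return total / len(logits)
```

<p>With γ = 0 and α = 1 this reduces to standard binary cross-entropy, which is the degenerate case the focal loss is designed to improve upon.</p>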
        <p>At inference, the model outputs a relevance score per
sentence-skill pair for each test set. While training uses
top-100 candidates, we evaluate on top-20 across all methods
for practical reasons: managing API costs for LLM baselines
and maintaining reasonable inference speed. The decision
threshold is selected on a held-out split of the constructed
Stage 2 training pool, tested in increments of 0.05 in the
range [0.1, 0.7]. Based on this tuning, a fixed threshold of
0.2 is used for all reported results. Similarly, we fix α and γ
at 0.8 and 3.0, respectively, testing multiple configurations
[0.5, 0.6, 0.8, 0.9] (α) and [2.0, 2.5, 3.0] (γ). The maximum
tokenization length is set to 128 for each sentence-skill pair.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>While our pipeline can operate as an integrated system,
we evaluate the bi-encoder and cross-encoder components
separately on the datasets described in Section 3.3. This
is done to understand their individual contributions and
identify potential bottlenecks.</p>
      <sec id="sec-4-1">
        <title>4.1. Evaluation Metrics and Benchmarks</title>
        <p>We evaluate each pipeline component with task-appropriate
metrics and baselines.</p>
        <sec id="sec-4-1-1">
          <title>Bi-encoder Evaluation.</title>
          <p>
            Following established practice
in skill extraction tasks [
            <xref ref-type="bibr" rid="ref19 ref2 ref8">19, 2, 8</xref>
            ], we employ R-Precision@K
(RP@K) and Mean Reciprocal Rank (MRR). Since job
description sentences typically contain at most 10 relevant
skills, we report K ∈ {5, 10}. Let R_n denote the number of
gold ESCO skills for sentence n, and let Rel(n, k) ∈ {0, 1}
indicate whether the kth predicted skill is relevant (binary
indicator). RP@K is defined as:
RP@K = (1/N) ∑_{n=1}^{N} [1 / min(K, R_n)] ∑_{k=1}^{K} Rel(n, k).</p>
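<p>For reference, RP@K and MRR as used above can be computed as follows, assuming ranked prediction lists and gold-label sets per sentence:</p>

```python
# A sketch of RP@K as defined above, plus MRR. `preds` holds one ranked list
# of skill IDs per sentence; `gold` holds the matching gold-label sets.

def rp_at_k(preds, gold, k):
    total = 0.0
    for ranked, g in zip(preds, gold):
        rel = sum(1 for s in ranked[:k] if s in g)   # sum of Rel(n, k) over top-K
        total += rel / min(k, len(g))                # 1 / min(K, R_n) normaliser
    return total / len(preds)

def mrr(preds, gold):
    total = 0.0
    for ranked, g in zip(preds, gold):
        for rank, s in enumerate(ranked, start=1):
            if s in g:
                total += 1.0 / rank                  # reciprocal rank of first hit
                break
    return total / len(preds)
```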
          <p>
            As baselines, we compare against: (1) the base
all-mpnet-base-v2 model without fine-tuning (BASE), and (2) a similar
skill extraction method from Decorte et al. [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ].
          </p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Cross-encoder Evaluation.</title>
          <p>
            Our cross-encoder performs
binary classification on retrieved candidates, predicting
whether each ESCO skill is relevant to the given job
description sentence. We evaluate using the micro-F1 score
as it captures both the identification of relevant skills and
the rejection of irrelevant ones [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. For baselines, we
implement LLM-based ranking using GPT-4o-mini and GPT-4.1,
building on similar approaches [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. Each model receives
the query and top-20 retrieved candidates with instructions
to classify each as relevant or irrelevant using single-shot
prompting with temperature set to 0 for deterministic
outputs. The demonstrations for LLM baselines are drawn
from training data, selected to maximise overlap with the
candidate set (see Appendix A for exact configuration).
          </p>
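<p>The micro-F1 evaluation above pools every sentence-skill decision before computing F1; a sketch, assuming per-sentence predicted sets, gold sets, and candidate lists:</p>

```python
# Sketch of the micro-F1 evaluation: one binary decision per retrieved
# sentence-skill pair, pooled over the whole test set before computing F1.

def pairwise_micro_f1(pred_sets, gold_sets, candidate_lists):
    tp = fp = fn = 0
    for pred, gold, cands in zip(pred_sets, gold_sets, candidate_lists):
        for skill in cands:                     # each pair is one decision
            p, g = skill in pred, skill in gold
            tp += int(p and g)
            fp += int(p and not g)
            fn += int(g and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```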
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Bi-encoder Performance</title>
        <p>Table 2 presents our bi-encoder results across four skill
extraction datasets. Our curriculum-based approach
consistently outperforms both baselines, achieving the highest
scores on all metrics.</p>
        <p>
          Compared to Decorte et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we observe improvements
ranging from 1.95 percentage points (pp) (TECHWOLF) to
4.78 pp (TECH) in RP@5. The gains are even more
substantial against the non-fine-tuned baseline, with SKILLSKAPE
experiencing a 32.63 pp improvement in RP@5. The
consistent gains from RP@5 to RP@10 indicate that additional
relevant skills are retrieved when expanding the candidate set.
However, the gap between MRR scores (48.39-72.46%) and
perfect ranking indicates that while most relevant skills are
retrieved, they are not always optimally ordered. This
motivates our cross-encoder stage, which jointly encodes each
sentence–candidate pair and can therefore judge skill
relevance more precisely.
        </p>
        <p>Our curriculum bi-encoder achieves consistent
improvements across all datasets, demonstrating robustness across
the varied job domains represented in the evaluation data. The small
standard deviations (typically &lt;1.5 pp) across three random
seeds indicate stable training despite the additional
complexity of our curriculum setup. The specific contribution
of the pre-training phase is analysed in Section 5.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Zero-shot Performance</title>
          <p>To evaluate our model’s ability to handle emerging skills
not present in training data, we conduct zero-shot
experiments on held-out skills. We fix a held-out skill set H of
100 ESCO skills and exclude H from all training data (both
pre-training and fine-tuning). Specifically, we exclude the
50 most frequent and 50 least frequent skills based on the
SKILLSKAPE test set. This selection ensures we test on
both common skills (that the model might implicitly learn
through co-occurrences) and rarer ones. At inference, the
retriever still searches the full ESCO taxonomy. For each
test set we filter to queries whose gold labels intersect H,
and treat only held-out labels as relevant.</p>
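<p>The filtering protocol can be sketched as follows, with the held-out set and the (sentence, gold-set) query format as illustrative assumptions:</p>

```python
# Sketch of the zero-shot evaluation view: keep only queries whose gold
# labels intersect the held-out skill set, and count only held-out labels
# as relevant. The (sentence, gold_set) query format is an assumption.

def zero_shot_view(queries, held_out):
    view = []
    for sentence, gold in queries:
        relevant = gold & held_out        # restrict relevance to held-out skills
        if relevant:
            view.append((sentence, relevant))
    return view
```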
          <p>Table 3 shows that our model successfully retrieves
held-out skills despite no direct training exposure. Improvements
over the non-fine-tuned all-mpnet-base-v2 range from 11.02
pp (TECH) to 25.88 pp (HOUSE) in RP@5. A similar pattern
can be observed for MRR scores with our model providing
up to 19.33 pp (HOUSE) improvement. These results
demonstrate that our approach creates skill representations that
generalise beyond the training vocabulary.</p>
          <p>
            Notably, zero-shot performance shows higher variance
across seeds (up to 3.29 pp standard deviation) compared
to the complete model (&lt;1.5 pp). This aligns with prior
work showing increased instability in low-data regimes [
            <xref ref-type="bibr" rid="ref36">36</xref>
            ],
where different initialisations lead to different
representation geometries for unseen classes, especially with a small
number of training iterations.
          </p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Cross-Encoder Performance</title>
        <p>Building on our bi-encoder’s strong retrieval performance,
we now evaluate the cross-encoder stage that refines these
candidates through binary relevance classification. As
explained in Section 3.5, we train on top-100 retrieved
candidates to ensure broad coverage and evaluate on top-20 for
practical inference and fair comparison to our LLM
baselines. The results are provided in Table 4.</p>
        <p>Our fine-tuned cross-encoder delivers the best
performance across all three selected benchmarks. These benefits
are more pronounced when compared to the simpler
GPT-4o-mini model, ranging from 6.39 pp (HOUSE) to 30.54 pp
(SKILLSKAPE) increase in F1 scores. However, even when
paired with the much more capable GPT-4.1 model, our ranker
provides improvements for TECH (+8.03 pp) and
SKILLSKAPE (+22.54 pp), with only marginal gains in HOUSE
(+0.53 pp).</p>
        <p>
          Beyond performance advantages, our cross-encoder
offers significant practical benefits for large-scale deployment.
In our setup, GPT-4o-mini costs ≈$0.0001/example and
GPT-4.1 ≈$0.001/example³, while our cross-encoder has a
one-time training cost and near-zero per-example inference cost.
For labour-market pipelines processing millions of
postings, this difference is material. Furthermore, our dedicated
ranker achieves approximately 0.021s per-example inference
time, compared to 1.07s on average for LLM-based solutions,
a ≈50x speed improvement (see Appendix B for a full
breakdown of run-times). While open-source alternatives like
Llama exist [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ], models matching GPT-4’s ranking
performance require high-end GPUs (e.g., A100), whereas our
ranker runs efficiently on an RTX 4070Ti SUPER with 16GB of
available memory.
        </p>
        <p>
          Our results demonstrate the value of dedicated ranking
with our current bi-encoder. Notably, concurrent work [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]
has achieved even stronger retrieval performance, which
presents an exciting opportunity: combining
state-of-the-art retrieval with our cross-encoder could yield substantially
better results. At inference (top-20 skill candidates), the
retrieved list contains, on average, about 78%⁴ of the gold
skills per sentence across the Stage 2 evaluation sets.
Therefore, roughly 22% of gold skills are absent from the ranker’s
candidate set. Improved retrieval would provide our ranker
with more complete candidate sets, likely amplifying its
performance.
        </p>
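<p>The quoted coverage figure (footnote 4) corresponds to macro-averaged Recall@20, which can be computed as:</p>

```python
# Macro-averaged Recall@K: the per-sentence share of gold skills present in
# the top-K retrieved list, averaged over sentences with any gold labels.

def macro_recall_at_k(preds, gold, k=20):
    per_sentence = [
        len(set(ranked[:k]) & g) / len(g)
        for ranked, g in zip(preds, gold) if g   # skip sentences with no gold
    ]
    return sum(per_sentence) / len(per_sentence)
```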
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Ablation Studies</title>
      <p>³Based on June 2025 prices, averaged over all evaluation data and runs.
Total replication costs for our full evaluation are $0.61 (GPT-4o-mini) and
$5.88 (GPT-4.1). ⁴Measured as Recall@20, macro-averaged.</p>
      <p>Our ablations isolate the two key elements of our
curriculum training regimen. Adding definitions to the training
data without a pre-training phase provides inconsistent
results, even decreasing performance in TECH by 1.22 pp.
However, when pre-training is applied to this data with
definitions, we observe consistent improvements across all
datasets, most notably in TECHWOLF (+3.13 pp) and TECH
(+2.22 pp). Comparing our full system to the model without
definitions or pre-training, we achieve gains ranging from
1.02 pp (SKILLSKAPE) to 3.70 pp (TECHWOLF). This
demonstrates that the introduced pre-training phase is essential for
leveraging taxonomic knowledge, as without it, definitions
can have a detrimental effect on retrieval performance. The
modest but consistent gains justify using the full
curriculum training and ESCO definitions as additional reference
examples in our final architecture.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Limitations and Future Work</title>
      <p>Our two-stage architecture requires training separate
models. This increases computational requirements compared
to single-stage retrieval. While the cross-encoder processes
queries more slowly than bi-encoder retrieval due to joint
encoding, it remains efficient at 0.021s per example. Combined
with its more concise outputs (specific skills vs a list of
relevant candidates) and the fact that real-time skill extraction
is rarely critical in labour market analysis, the architecture
is well-suited for practical deployment.</p>
      <p>Data scarcity remains a key challenge. Limited
availability of high-quality annotated job descriptions necessitated
using validation sets for cross-encoder training. While we
maintained evaluation integrity by using separate test sets,
larger training corpora would likely improve performance.</p>
      <p>
        Our current evaluation is limited to English-language job
postings. Generalisation to other languages and industries
requires further investigation. ESCO’s availability in 28
languages presents an opportunity for multilingual extension,
following recent work in multilingual job recommendation
[
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
      </p>
      <p>Finally, our cross-encoder uses direct label encoding
rather than one-hot representations. In theory, this
enables handling of new skills, though cross-encoder
generalisation capability to previously unseen skill taxonomies
requires further empirical validation. Future work should
explore cross-taxonomy transfer and the framework’s
ability to adapt to evolving skill landscapes.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>This paper introduced a two-stage neural architecture for
skill extraction, adapting successful information retrieval
practices to address the unique challenges of matching job
descriptions to large skill taxonomies. By combining
bi-encoder retrieval with cross-encoder ranking, our approach
bridges the gap between efficient candidate generation and
precise skill identification.</p>
      <p>Our experiments validate the effectiveness of the
two-stage approach across multiple dimensions. The bi-encoder
achieves up to 62.02% RP@5 through curriculum learning
with ESCO definitions, while maintaining strong zero-shot
capability compared to the vanilla model on held-out skills.
More importantly, our cross-encoder ranker amplifies these
gains, delivering F1 scores up to 30.54 percentage points
higher than LLM-based alternatives. Notably, our ablations
revealed that in most cases, taxonomic definitions provide
value only through structured pre-training, highlighting
the importance of curriculum design in leveraging existing
resources. Together, these components create a system that
balances practical deployability with strong skill extraction
performance.</p>
      <p>These results showcase how two-stage recommender
architectures can effectively address skill extraction
challenges in HR systems. While previous work has explored
retrieval-based approaches or LLM-based ranking for skill
extraction, ours is the first to train purpose-built neural
architectures for both retrieval and ranking stages within
a unified framework. This modular design can leverage
advances in either component, with better retrievers
yielding richer candidate sets, and stronger rankers delivering
more precise relevance judgments. HR applications
increasingly rely on recommender techniques across recruitment
(job-candidate matching), development (skill gap
identification), and retention (career path recommendations). Our
two-stage approach offers a flexible foundation that can
be integrated into these diverse recommendation pipelines,
where extracted skills often serve as essential features for
downstream tasks.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We would like to thank Franziska Heck for her feedback
on this paper. We also thank Katie Killen, Alistair Lawson,
Dimitra Gkatzia and Matthew Dutton for their support in
this project. This research was supported by the Economic
and Social Research Council (Grant Ref: ES/P000681/1).</p>
    </sec>
    <sec id="sec-9">
      <title>Ethical Considerations</title>
      <p>We evaluate on publicly available datasets. Likewise, our
training data comprises job advertisements and taxonomy
labels publicly available; we do not process personal data.
This publication uses the ESCO classification of the
European Commission. All other datasets are accessible through
CC-BY-4.0 (DECORTE, HOUSE, TECH, TECHWOLF ) and MIT
licences (SKILLSKAPE). Nonetheless, job ads and
standardised taxonomies may encode societal and market biases (e.g.,
occupational stereotypes). Consequently, our system is
intended for labour-market analytics and skill insights rather
than automated hiring decisions. We recommend human
oversight for any downstream HR use. We release code and
configuration files to support reproducibility.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used
Grammarly and ChatGPT in order to: check grammar and spelling,
and paraphrase and reword. After using these tools, the authors
reviewed and edited the content and take full responsibility
for the publication’s content.</p>
    </sec>
    <sec id="sec-appendix-a">
      <title>A. LLM Ranker Configuration</title>
      <p>The created LLM-based rankers are deployed using the
OpenAI API platform⁵, implemented in Python. In total, two
models, GPT-4o-mini and GPT-4.1, are tested. Each model
is provided with the following system prompt:</p>
      <p>System Prompt: You are an expert
skill classifier. Given a sentence and
a list of possible skills, your task
is to select only the skills that are
explicitly or implicitly required. Be
precise and avoid including unrelated or
weakly related skills. Return a JSON {
"relevant_skills": ["skill_1", "skill_2",
...] }. If no skills are relevant, return
{ "relevant_skills": [] }. Do not add any
other keys or text.</p>
      <p>This prompt defines the task and ensures that the output is
constrained to a dictionary format, enabling efficient
parsing and evaluation.</p>
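<p>A defensive parse of this constrained output might look as follows; treating malformed or off-schema replies as an empty prediction is our assumption, not a detail reported in the paper:</p>

```python
import json

# Defensive parsing of the constrained JSON reply; malformed or off-schema
# output is treated as "no relevant skills" (an assumption), and predicted
# labels are restricted to the provided candidate list.

def parse_relevant_skills(raw, candidates):
    try:
        data = json.loads(raw)
        skills = data.get("relevant_skills", [])
    except (json.JSONDecodeError, AttributeError):
        return []                     # non-JSON or non-dict output
    return [s for s in skills if s in candidates]
```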
      <p>At inference time, we provide a single-shot demonstration
example using a designated get_demonstration function.
This improves model performance and ensures fair
comparability with the cross-encoder ranker, which also utilises
training data. The function selects a demonstration example
from a pool of annotated instances based on maximal skill
overlap with the input (see (A)). The candidate lists used
for demonstration are restricted to the top-20 labels prior to
the injection of missing gold labels. This is done to control
API cost and retain consistency with test examples.
Algorithm 1 Get Demonstration Based on Skill Overlap
Given the original job description, a set of candidate skills,
and the retrieved demonstration, the model predicts a list
of truly relevant skills. These are then compared against
gold labels using the same evaluation protocol as the
crossencoder ranker.</p>
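<p>The overlap-based selection in Algorithm 1 can be sketched as follows; returning the first maximal item as a tie-break is an assumption:</p>

```python
# Overlap-based demonstration selection: return the annotated instance whose
# gold-skill set shares the most labels with the input's candidate list.
# Python's max() keeps the first maximal item, an assumed tie-break.

def get_demonstration(candidate_skills, pool):
    """pool: list of (sentence, gold_skill_set) annotated instances."""
    cand = set(candidate_skills)
    return max(pool, key=lambda item: len(item[1] & cand))
```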
    </sec>
    <sec id="sec-11">
      <title>B. Runtime Statistics</title>
      <p>All experiments were run on a single RTX 4070Ti SUPER
GPU with 16GB VRAM. Table 6 reports training time (total)
and average per-example inference time. Times are averaged
over 3 random seeds. From the recoverable entries of Table 6,
Stage 1 training completes in 8.6 minutes, with bi-encoder
inference under 0.002 seconds per example.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>World Economic Forum</string-name>
          ,
          <source>The Future of Jobs Report</source>
          <year>2025</year>
          ,
          Technical Report
          , Amherst, MA, USA,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>J.-J.</given-names><surname>Decorte</surname></string-name>, <string-name><given-names>S.</given-names><surname>Verlinden</surname></string-name>, <string-name><given-names>J.</given-names><surname>Van Hautte</surname></string-name>, <string-name><given-names>J.</given-names><surname>Deleu</surname></string-name>, <string-name><given-names>C.</given-names><surname>Develder</surname></string-name>, <string-name><given-names>T.</given-names><surname>Demeester</surname></string-name>,
          <article-title>Extreme multi-label skill extraction training using large language models</article-title>
          , in: AI4HR &amp; PES, ECML-PKDD 2023 Workshop
          , Proceedings,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Magron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montariol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosselut</surname>
          </string-name>
          ,
          <article-title>JobSkape: A framework for generating synthetic job postings to enhance skill matching</article-title>
          , in: E. Hruschka,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Otani</surname>
          </string-name>
          , T. Mitchell (Eds.),
          <source>Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR</source>
          <year>2024</year>
          ),
          <article-title>Association for Computational Linguistics, St</article-title>
          .
          <source>Julian's, Malta</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>43</fpage>
          -
          <lpage>58</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .nlp4hr-
          <fpage>1</fpage>
          .4/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gugnani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <article-title>Implicit skills extraction using document embedding and its use in job recommendation</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>13286</fpage>
          -
          <lpage>13293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Senger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , R. van der Goot,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          ,
          <article-title>Deep learning-based computational job market analysis: A survey on skill extraction and classification from job postings</article-title>
          , in: E. Hruschka,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Otani</surname>
          </string-name>
          , T. Mitchell (Eds.),
          <source>Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR</source>
          <year>2024</year>
          ),
          <article-title>Association for Computational Linguistics, St</article-title>
          .
          <source>Julian's, Malta</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . URL: https: //aclanthology.org/
          <year>2024</year>
          .nlp4hr-
          <fpage>1</fpage>
          .1/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>A readand-select framework for zero-shot entity linking</article-title>
          , in: H.
          <string-name>
            <surname>Bouamor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Pino</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          Bali (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2023</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Singapore,
          <year>2023</year>
          , pp.
          <fpage>13657</fpage>
          -
          <lpage>13666</lpage>
          . URL: https: //aclanthology.org/
          <year>2023</year>
          .findings-emnlp.
          <volume>912</volume>
          /. doi:
          <volume>10</volume>
          . 18653/v1/
          <year>2023</year>
          .findings- emnlp.912.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Comparing neighbors together makes it easy: Jointly comparing multiple candidates for efficient and effective retrieval</article-title>
          , in: Y.
          <string-name>
            <surname>Al-Onaizan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>Y.-N.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>22255</fpage>
          -
          <lpage>22269</lpage>
          . URL: https://aclanthology. org/
          <year>2024</year>
          .emnlp-main.
          <volume>1242</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2024</year>
          . emnlp- main.1242.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Clavié</surname>
          </string-name>
          , G. Soulié,
          <article-title>Large language models as batteries-included zero-shot esco skills matchers</article-title>
          ,
          <source>in: Proceedings of the 3rd Workshop on Recommender Systems for Human Resources</source>
          (
          <article-title>RecSys in HR 2023), in conjunction with the 16th ACM Conference on Recommender Systems, Association for Computing Machinery</article-title>
          , Singapore,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>D'Oosterlinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Remy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Develder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <article-title>In-context learning for extreme multi-label classification</article-title>
          ,
          <source>arXiv preprint arXiv:2401.12178</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sonniks</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. Plank,</surname>
          </string-name>
          <article-title>SkillSpan: Hard and soft skill extraction from English job postings</article-title>
          , in: M.
          <string-name>
            <surname>Carpuat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>de Marneffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. V.</given-names>
            <surname>Meza Ruiz</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the</source>
          <year>2022</year>
          <article-title>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</article-title>
          , Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>4962</fpage>
          -
          <lpage>4984</lpage>
          . URL: https:// aclanthology.org/
          <year>2022</year>
          .naacl-main.
          <volume>366</volume>
          . doi:
          <volume>10</volume>
          .18653/ v1/
          <year>2022</year>
          .naacl-main.
          <volume>366</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>M.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>van der Goot</surname></string-name>,
          <string-name><given-names>M.-Y.</given-names> <surname>Kan</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Plank</surname></string-name>,
          <article-title>NNOSE: Nearest neighbor occupational skill extraction</article-title>,
          in: Y. Graham, M. Purver (Eds.),
          <source>Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)</source>,
          Association for Computational Linguistics, St. Julian's, Malta,
          <year>2024</year>, pp.
          <fpage>589</fpage>-<lpage>608</lpage>.
          URL: https://aclanthology.org/2024.eacl-long.35.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>K.</given-names> <surname>Nguyen</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Montariol</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Bosselut</surname></string-name>,
          <article-title>Rethinking skill extraction in the job market domain using large language models</article-title>,
          in: E. Hruschka, T. Lake, N. Otani, T. Mitchell (Eds.),
          <source>Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)</source>,
          Association for Computational Linguistics, St. Julian's, Malta,
          <year>2024</year>, pp.
          <fpage>27</fpage>-<lpage>42</lpage>.
          URL: https://aclanthology.org/2024.nlp4hr-1.3.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>A.</given-names> <surname>Herandi</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Hu</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Cai</surname></string-name>,
          <article-title>Skill-LLM: Repurposing general-purpose LLMs for skill extraction</article-title>,
          <source>arXiv preprint arXiv:2410.12052</source>
          (<year>2024</year>).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>I.</given-names> <surname>Konstantinidis</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Maragoudakis</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Magnisalis</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Berberidis</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Peristeras</surname></string-name>,
          <article-title>Knowledge-driven unsupervised skills extraction for graph-based talent matching</article-title>,
          in:
          <source>Proceedings of the 12th Hellenic Conference on Artificial Intelligence</source>,
          <year>2022</year>, pp.
          <fpage>1</fpage>-<lpage>7</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>N.</given-names> <surname>Goyal</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kalra</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Sharma</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Mutharaju</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Sachdeva</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Kumaraguru</surname></string-name>,
          <article-title>JobXMLC: Extreme multi-label classification of job skills with graph neural networks</article-title>,
          in:
          <source>Findings of the Association for Computational Linguistics: EACL 2023</source>,
          <year>2023</year>, pp.
          <fpage>2181</fpage>-<lpage>2191</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name><given-names>A.-S.</given-names> <surname>Gnehm</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Bühlmann</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Buchs</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Clematide</surname></string-name>,
          <article-title>Fine-grained extraction and classification of skill requirements in German-speaking job ads</article-title>,
          in: D. Bamman, D. Hovy, D. Jurgens, K. Keith, B. O'Connor, S. Volkova (Eds.),
          <source>Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)</source>,
          Association for Computational Linguistics, Abu Dhabi, UAE,
          <year>2022</year>, pp.
          <fpage>14</fpage>-<lpage>24</lpage>.
          URL: https://aclanthology.org/2022.nlpcss-1.2/. doi:10.18653/v1/2022.nlpcss-1.2.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name><given-names>D. C.</given-names> <surname>Kavargyris</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Georgiou</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Papaioannou</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Petrakis</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Mittas</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Angelis</surname></string-name>,
          <article-title>ESCOX: A tool for skill and occupation extraction using LLMs from unstructured text</article-title>,
          <source>Software Impacts</source>
          (<year>2025</year>)
          <fpage>100772</fpage>.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name><given-names>M.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>K. N.</given-names> <surname>Jensen</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>van der Goot</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Plank</surname></string-name>,
          <article-title>Skill extraction from job postings using weak supervision</article-title>,
          in:
          <source>Proceedings of RecSys in HR'22: The 2nd Workshop on Recommender Systems for Human Resources, in conjunction with the 16th ACM Conference on Recommender Systems</source>,
          Association for Computing Machinery, Seattle,
          <year>2022</year>.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name><given-names>J.-J.</given-names> <surname>Decorte</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Van Hautte</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Deleu</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Develder</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Demeester</surname></string-name>,
          <article-title>Design of negative sampling strategies for distantly supervised skill extraction</article-title>,
          in: M. Kaya, T. Bogers, D. Graus, S. Mesbah, C. Johnson, F. Gutiérrez (Eds.),
          <source>Proceedings of the 2nd Workshop on Recommender Systems for Human Resources (RecSys-in-HR 2022)</source>,
          volume
          <volume>3218</volume>,
          CEUR, <year>2022</year>, p.
          <fpage>7</fpage>.
          URL: https://ceur-ws.org/Vol-3218/RecSysHR2022-paper_4.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name><given-names>M.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>van der Goot</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Plank</surname></string-name>,
          <article-title>ESCOXLM-R: Multilingual taxonomy-driven pre-training for the job market domain</article-title>,
          in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.),
          <source>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>,
          Association for Computational Linguistics, Toronto, Canada,
          <year>2023</year>, pp.
          <fpage>11871</fpage>-<lpage>11890</lpage>.
          URL: https://aclanthology.org/2023.acl-long.662/. doi:10.18653/v1/2023.acl-long.662.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name><given-names>J.-J.</given-names> <surname>Decorte</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Van Hautte</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Develder</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Demeester</surname></string-name>,
          <article-title>Efficient text encoders for labor market analysis</article-title>,
          <source>arXiv preprint arXiv:2505.24640</source>
          (<year>2025</year>).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name><given-names>P.</given-names> <surname>Soviany</surname></string-name>,
          <string-name><given-names>R. T.</given-names> <surname>Ionescu</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Rota</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Sebe</surname></string-name>,
          <article-title>Curriculum learning: A survey</article-title>,
          <source>International Journal of Computer Vision</source>
          <volume>130</volume>
          (<year>2022</year>)
          <fpage>1526</fpage>-<lpage>1565</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name><given-names>Z.</given-names> <surname>Ma</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Dou</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Xu</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Jiang</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Cao</surname></string-name>,
          <string-name><given-names>J.-R.</given-names> <surname>Wen</surname></string-name>,
          <article-title>Pre-training for ad-hoc retrieval: Hyperlink is also you need</article-title>,
          in:
          <source>Proceedings of the 30th ACM International Conference on Information &amp; Knowledge Management, CIKM '21</source>,
          Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>, pp.
          <fpage>1212</fpage>-<lpage>1221</lpage>.
          URL: https://doi.org/10.1145/3459637.3482286. doi:10.1145/3459637.3482286.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name><given-names>W. X.</given-names> <surname>Zhao</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Ren</surname></string-name>,
          <string-name><given-names>J.-R.</given-names> <surname>Wen</surname></string-name>,
          <article-title>Dense text retrieval based on pretrained language models: A survey</article-title>,
          <source>ACM Transactions on Information Systems</source>
          <volume>42</volume>
          (<year>2024</year>)
          <fpage>1</fpage>-<lpage>60</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name><given-names>R.</given-names> <surname>Nogueira</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Cho</surname></string-name>,
          <article-title>Passage re-ranking with BERT</article-title>,
          <source>arXiv preprint arXiv:1901.04085</source>
          (<year>2019</year>).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name><given-names>H.</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Horrocks</surname></string-name>,
          <article-title>Reveal the unknown: Out-of-knowledge-base mention discovery with entity linking</article-title>,
          in:
          <source>Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM '23</source>,
          Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>, pp.
          <fpage>452</fpage>-<lpage>462</lpage>.
          URL: https://doi.org/10.1145/3583780.3615036. doi:10.1145/3583780.3615036.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name><given-names>L.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Braun</surname></string-name>,
          <article-title>Twente-BMS-NLP at PerspectiveArg 2024: Combining bi-encoder and cross-encoder for argument retrieval</article-title>,
          in: Y. Ajjour, R. Bar-Haim, R. El Baff, Z. Liu, G. Skitalinskaya (Eds.),
          <source>Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024)</source>,
          Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>, pp.
          <fpage>164</fpage>-<lpage>168</lpage>.
          URL: https://aclanthology.org/2024.argmining-1.17/. doi:10.18653/v1/2024.argmining-1.17.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name><given-names>W.</given-names> <surname>Sun</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Yan</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Ma</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Ren</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Yin</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Ren</surname></string-name>,
          <article-title>Is ChatGPT good at search? Investigating large language models as re-ranking agents</article-title>,
          in: H. Bouamor, J. Pino, K. Bali (Eds.),
          <source>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>,
          Association for Computational Linguistics, Singapore,
          <year>2023</year>, pp.
          <fpage>14918</fpage>-<lpage>14937</lpage>.
          URL: https://aclanthology.org/2023.emnlp-main.923/. doi:10.18653/v1/2023.emnlp-main.923.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name><given-names>S.</given-names> <surname>Verma</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Jiang</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Xue</surname></string-name>,
          <article-title>Beyond retrieval: Ensembling cross-encoders and GPT rerankers with LLMs for biomedical QA</article-title>,
          <source>arXiv preprint arXiv:2507.05577</source>
          (<year>2025</year>).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          European Commission and Directorate-General for Employment, Social Affairs and Inclusion,
          <article-title>ESCO handbook - European skills, competences, qualifications and occupations</article-title>,
          <source>Publications Office</source>,
          <year>2017</year>. doi:10.2767/934956.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name><given-names>D.</given-names> <surname>Deniz</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Retyk</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>García-Sardiña</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Fabregat</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Gasco</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Zbib</surname></string-name>,
          <article-title>Combined unsupervised and contrastive learning for multilingual job recommendation</article-title>,
          in:
          <source>Proceedings of the 4th Workshop on Recommender Systems for Human Resources (RecSys-in-HR 2024)</source>,
          <year>2024</year>.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name><given-names>T.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Kornblith</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Norouzi</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Hinton</surname></string-name>,
          <article-title>A simple framework for contrastive learning of visual representations</article-title>,
          in:
          <source>International Conference on Machine Learning</source>,
          PMLR,
          <year>2020</year>, pp.
          <fpage>1597</fpage>-<lpage>1607</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name><given-names>A.</given-names> <surname>Radford</surname></string-name>,
          <string-name><given-names>J. W.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Hallacy</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ramesh</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Goh</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Agarwal</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sastry</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Askell</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Mishkin</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Clark</surname></string-name>,
          et al.,
          <article-title>Learning transferable visual models from natural language supervision</article-title>,
          in:
          <source>International Conference on Machine Learning</source>,
          PMLR,
          <year>2021</year>, pp.
          <fpage>8748</fpage>-<lpage>8763</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name><given-names>W.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Wei</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Bao</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Zhou</surname></string-name>,
          <article-title>MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers</article-title>,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (<year>2020</year>)
          <fpage>5776</fpage>-<lpage>5788</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name><given-names>T.-Y.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Goyal</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Dollár</surname></string-name>,
          <article-title>Focal loss for dense object detection</article-title>,
          in:
          <source>Proceedings of the IEEE International Conference on Computer Vision</source>,
          <year>2017</year>, pp.
          <fpage>2980</fpage>-<lpage>2988</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name><given-names>M.</given-names> <surname>Mosbach</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Andriushchenko</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Klakow</surname></string-name>,
          <article-title>On the stability of fine-tuning BERT: Misconceptions, explanations, and strong baselines</article-title>,
          <source>arXiv preprint arXiv:2006.04884</source>
          (<year>2020</year>).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name><given-names>H.</given-names> <surname>Touvron</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Lavril</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Izacard</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Martinet</surname></string-name>,
          <string-name><given-names>M.-A.</given-names> <surname>Lachaux</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Lacroix</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Rozière</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Goyal</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Hambro</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Azhar</surname></string-name>,
          et al.,
          <article-title>LLaMA: Open and efficient foundation language models</article-title>,
          <source>arXiv preprint arXiv:2302.13971</source>
          (<year>2023</year>).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>