<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Recommender Systems, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Prediction using Resume Representation Learning and Skill-based Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jens-Joris Decorte</string-name>
          <email>R@10</email>
          <email>R@5</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeroen Van Hautte</string-name>
          <email>jeroen@techwolf.ai</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Deleu</string-name>
          <email>johannes.deleu@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris Develde r</string-name>
          <email>chris.develder@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Demeester</string-name>
          <email>thomas.demeester@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Career Path Prediction, Resume Representation Learning</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>( Jeroen Van</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ghent University - imec</institution>
          ,
          <addr-line>9052 Gent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TechWolf</institution>
          ,
          <addr-line>9000 Gent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>ropean Skills</institution>
          ,
          <addr-line>Competences, Qualifications and Occupa-</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career. This task of predicting the next step in a career is known as career path prediction, and has diverse applications such as turnover prevention and internal job mobility. Existing methods to career path prediction rely on large amounts of private career history data to model the interactions between job titles and companies. We propose leveraging the unexplored textual descriptions that are part of work experience sections in resumes. We introduce a structured dataset of 2,164 anonymized career histories, annotated with ESCO occupation labels. Based on this dataset, we present a novel representation learning approach, CareerBERT, specifically designed for work history data. We develop a skill-based model and a text-based model for career path prediction, which achieve 35.24% and 39.61% recall@10 respectively on our dataset. Finally, we show that both approaches are complementary as a hybrid approach RecSys in HR'23: The 3rd Workshop on Recommender Systems for</p>
      </abstract>
      <kwd-group>
        <kwd>prediction task as follows</kwd>
        <kwd>given a career history</kwd>
        <kwd>i</kwd>
        <kwd>e</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>next step in a career is known as career path predictionp.ath prediction algorithms.</p>
      <p>It is well-known that person-job fit has a positive impact both employers and job titles [4, 5]. Relying on only
on both job satisfaction and job performance1[, 2]. Also, sparse features, such as job title and company names,
employment plays a large role in most people’s lives and necessitates large amounts of career trajectories in order
has an important impact on their well-being3[]. Thus, to learn meaningful (graph) representations from them.
providing people with next steps at the right time in However, as such career data constitutes personal
infortheir career that are both inspiring and suited to theimration, most research relies on closed datasets, often
experience is important for job satisfaction, productivityproprietary to a company. Hence, there is a lack of open
and well-being of workers. The task of predicting the datasets for the development and evaluation of career</p>
      <sec id="sec-1-1">
        <title>While it is closely related to job recommendation, career</title>
        <p>path prediction does not recommend specific job ads to</p>
        <p>We believe that the career path prediction task can
benefit from as of yet untapped unstructured data sources,
turnover prevention to internal job mobility.
candidates, but rather aims to predict the next role in ani.e., the free-form textual descriptions of past work
expeindividual’s career. Such a role is typically characterizerdience in resumes. Concretely, we propose a relatively
by a company name, job title and optional attributes such small, anonymous dataset of textual career histories from
as salary or location. Being able to predict next steps inresumes, enriched with structured occupation labels from
individual’s careers has many applications, ranging froma predefined ontology. For the latter we adopt the
EuCommon approaches to career path prediction relytions (ESCO) [6]. In this paper, we define the career path
on large amounts of career history data, and structure
all career transitions into a large graph that contains
nEvelop-O
LGOBE
Human Resources, in conjunction with the 17th ACM Conference on
∗Corresponding author.
1,  2, … ,</p>
        <p>−1 ), predictthe ESCO occupation
label   of the held-out next experience. We believe that
by focusing on the prediction of the next occupation, such
a system can help in recommending relevant next jobs
or providing clarity on internal mobility at employers in
the future. Our main contributions are:
• We create, annotate and publis1ha dataset of 2,164 We propose a new way of learning expressive
representaanonymous career histories across 24 diferent indus- tions of textual career histories calleCdareerBERT
withtries (§ 3). The career histories are structured as a lisotut the need for resume, job pairs. InsteadC, areerBERT
of work experiences described in free-form text. Eachrelies on textual career histories and their corresponding
experience is annotated with corresponding ESCO oc-ESCO occupations labels only.</p>
        <p>cupation.
• We show how the parallel information present in the 2.2. Career Path Prediction
textual career histories and in the occupation ontol-In the field of career path prediction, large scale data from
ogy provides opportunities to train a domain-specific social networks (LinkedIn) has been an important source
text representation model§( 4) that can be used down- of information [11, 5, 4]. An early work on career path
stream for the career path prediction task, under aprediction focused on four distinct career paths -
softconstrained dataset size. ware engineering, sales, consulting, and marketing11[].
• We show how the hybrid approach of combining text- They simplified these paths into four stages of seniority
based and skill-based prediction achieves the strongestand normalized LinkedIn job titles accordingly for the
results (§ 5) for our task, thus demonstrating the value prediction task. While the specific dataset is not publicly
of injecting skill ontology information into the model available, they extracted demographic, psycholinguistic,
(as opposed to using purely text-based models). and topic-related features from social media content to
enhance their predictions. An extended approach that
predicts multiple future job titles and company changes
2. Related Work ahead, rather than just the next step was proposed b5y][.</p>
      </sec>
      <sec id="sec-1-2">
        <title>They utilized a proprietary dataset of 300,000 resumes,</title>
        <p>2.1. Resume Representation Learning allowing them to delve deeper into career trajectory
analysis, but only used job titles and companies as features for
We believe that expressive representations of resumesthe task at hand. Another approach to career path
prediccan benefit many HR-related tasks such as job recommen- tion uses an LSTM to represent both profile context and
dation and career path prediction. Building qualitativecareer path dynamics, leveraging a LinkedIn dataset to
resumes representations is challenging due to the semi-predict both the next company and job title 1[2]. Massive
structured nature of resumes. Resumes tend to contain amounts of resumes (+459k) have been used to predict
similar sections, but within each section, the text is typi- job mobility patterns using a heterogeneous
companycally unstructured. Current works on capturing resumesposition network constructed from the resumes’ career
into more structured representations mostly focus on ex-trajectory data, providing insights into career transitions
tracting only a subset of information present in resumes.and progression 4[]. All aforementioned methods rely
As a result, many approaches focus on just a subset of in-on extensive collections of resumes and overlook the
information present in resumes. The Job2Vec model learns formation embedded within the free-form text that is
job title representations based on a graph of thousandspart of work experience sections. In contrast, our work
of career paths in the IT and Finance7][, but completely leverages this text to enable new methods, that do not
ignores the unstructured description linked to the expe-require massive-scale datasets and interaction graphs,
riences. Another interesting work develops a similarityas the textual content could ofer a richer context for
measure between careersS(imCareers) as a sequence understanding career progression.
alignment metric between sequences of positions 8[].</p>
      </sec>
      <sec id="sec-1-3">
        <title>This work does use the unstructured summaries, but</title>
        <p>only after applying keyword extraction on them. 3. Anonymous Career Path Dataset
Only a minority of works aims to capture the full job
position information and typically relies on matched pairsWe reuse the set of anonymous resumes1[3, 14], gathered
of resume text and job ads. Examples of this are9[] that from Kaggle,2 which contains 2,482 anonymous resumes,
train a siamese adaptation of convolutional neural netb-oth in textual form and as pdf files. These anonymized
work. A more recent work uses contrastive learning ofresumes were originally collected from an online portal,
a sentence-transformer model between correspondingand are based on diferent profiles that applied on the
resume, job ad pairs1[0]. The downside of these methods platform to jobs from 24 diferent industries. In § 3.1,
is efectively the need for a job recommendation dataset, we detail how we transformed these resumes into
strucwhich is hard to get access to, and may contain unex- tured lists of experiences, each with their respective job
pected biases depending on how the data was gathered.title, experience summary, time period and ESCO
occu1https://huggingface.co/datasets/jensjorisdecorte/
anonymous-working-histories</p>
      </sec>
      <sec id="sec-1-4">
        <title>2https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset</title>
        <p>pation counterpart. Then§, 3.2 summarizes the main From that uniform text format, we then easily parse the
characteristics of the obtained dataset. text into a JSON structure combining the title,
description, start and end date. Finally only profiles with 2+
3.1. Dataset Construction experiences are retrained, after which 2,164 career
histories remain. The quality the rewritten text from GPT-3.5
We parse structured career histories from the resumewsas validated on 100 individual resumes. Although some
in free-text form, as written by their authors. Such ca-sentences were rephrased slightly, the rewritten text was
reer history is composed of a sequence oefxperiences found to be accurate overall.
ex1 … ex , each defined as a title and description and the
time period it covered. The length of a career his- Enrich with Occupation Labels: Every experience in
tory may obviously difer across resumes. We supple- our dataset is enriched with its corresponding occupation
ment each individual experiencex with a corresponding out of all 3007 ESCO occupations available. We use a
ESCO occupation labelocc . Next, we detail how we ex- proprietary classifier that is able to accurately classify
tract the title and descriptions from the full-text resumes,each experience based on its title and description. An
as well as the process to obtain ESCO labels. extensive manual validation process on 10% of the dataset
confirmed the accuracy of these labels as only 2.2% of
Extract experience section: Since we observed that labels were found to be suboptimal. These ESCO labels
the original dataset’s text format lacks structure, presum-are stored as part of the final dataset. Note that the
ably due to PDF or HTML parsing artefacts, we prepro3-007 ESCO occupations do not capture all aspects of the
cess the data to restore paragraph segmentation. Conr-oles, as they for example do not reflect diferent seniority
secutive whitespaces were identified as suitable places levels within a role. Rather, they provide a high-level
to insert newlines, which reconstructs a readable forc-ategorisation of jobs based on their performed activities.
mat. Since we are only interested in the professional
experience listed in the resume, we want to skip all of 3.2. Dataset Analysis
the sections on “education”, “certifications”, “projects”,
“skills”, “publications”, “awards”, “personal information”,The industries are relatively balanced across the dataset,
“presentations”, etc. We thus manually inspected the re-with 18 out of 24 industries having between 90 to 108
resumes in the dataset to identify the section titles used, sumes. A detailed breakdown is included inAppendix A.
and extract theexperiences of interest as the region in Figure 1shows the distribution of the number of
experibetween one of the related experience headin3gsand the ences per career history.
earliest subsequent section header. The length of the
thus selected sections on average amounts to 59% of the
original resume length. We successfully processed 2,473
out of all 2,484 resumes, discarding the remaining 11 low
quality resumes.</p>
        <p>Structure working experiences: The obtained work
experience sections list the diferent roles, often
in chronological order. Because the resumes are
anonymized, experiences are annotated with general
“Company Name” and “City, State” placeholders, which
we thus neglect. Each experience contains a job title
(typically on a separate line) and a paragraph describing
the respective responsibilities. Finally, each experience
contains the period in which it was performed, with start
and end date (or “current”). The order in which title,
pe</p>
        <p>Figure 1: Histogram of the number of work experiences per
riod and description are mentioned varies across resumes,</p>
        <p>resume in our dataset.
which makes it hard to uniformly define the separation
(e.g., as a regular expression) between each experience in
the text. Therefore, we rewrite the experience section in The ESCO occupations in our dataset follow a
longa uniform format using theGPT-3.5 API (see Appendix C). tailed distribution, as can be seen in detail from the
log-log plot in Appendix A. The most frequent 300 ESCO
3We found the following headings preceding the experiences of occupations represent a little over 80% of all experiences
interest: “experience”, “professional experience”, “work history”,
“work experience”, “relevant experience”, “relevant professionailn the dataset, while over 60% of ESCO occupations never
experience”, “employment history”, “employment &amp; experience”. appear in the dataset.</p>
        <sec id="sec-1-4-1">
          <title>Career History (chronological left to right)</title>
          <p>Title: Sales Associate
Description: Greeted customers,
determined their needs, maintained
knowledge of sales and ...</p>
        </sec>
        <sec id="sec-1-4-2">
          <title>ESCO: sales assistant</title>
          <p>Title: Collections Specialist
Description: Managed a high-volume
of customer calls, evaluated and
initiated alternative solutions ...
ESCO:
debt collector
MODEL</p>
          <p>Ranked ESCO occupations
1.
3.
.
..
2. retail department manager
✔
financial risk manager
debt collector
...</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Career Path Prediction Models</title>
      <p>4.1. Task Description
We formalize career path prediction on our dataset as
history up until then, as illustrated inFig. 2. Each
caranking the full set of ESCO occupations by how suit- skills as ( occ). Given a career history with ESCO
occuable they are as a next career step, based on the careepration labelsocc1, … , occ , we represent the skills of the
reer history(ex1, … , ex ) corresponds to − 1</p>
      <p>diferent
prediction problems: for each experienceex except the
ifrst one, its corresponding ESCO occupation labelocc
serves as the true label to predict based on the preceding
its attributes and structure to predict next jobs. In the</p>
      <sec id="sec-2-1">
        <title>ESCO ontology, each occupationocc is linked to a set of</title>
        <p>standardized skills, which is partitioned in skills that are
either “essential” or “optional” foorcc. We denote such
unified skill set combining both essential and optional
full career as the union of all related skil⋃ls=1  ( occ ).</p>
        <p>Finally, as a score to rank potential ESCO occupations
occ, we define the skill match  SKILLS of an experience
history against a specific ESCO occupation as the
frac
experiences and any ESCO occupationocc and outputs
a score, after which all ESCO occupations are scored
against the experience histor(yex1, … , ex−1 ), and ranked
 − 1 experiences. More formally, we expect a scoring tion of skills linked to that ESCO occupation that are also
function (( ex1, … , ex−1 ), occ) that takes a sequence of present in the union of skills associated with the work
experience’ ESCO labels, i.e.,
both.
should be the true labelocc . However, applications that
rank recommended jobs to candidates can typically show
from high to low scores. The highest scored ESCO label  SKILLS((ex1, … , ex ), occ) =
more than one recommended job. As such, we use rank- 4.3. Description-based Prediction
based metrics with a focus on top 5 and top 10 ranked Our second model relies on the textual descriptions
occupations, specifically Mean Reciprocal Rank (MRR), present in the career histories. Given a suficiently strong</p>
        <p>To solve the ranking problem, in§ 4.2 we detail ap- possible to predict next roles based on what has been
deproaches that use the information contained within the scribed in previous experiences. Two steps are necessary
ESCO ontology. Next, § 4.3 presents a combination of for this model. First, a strong domain-specific
representarepresentation learning and regression to tackle the probti-on model needs to be developed to accurately represent
lem. Finally, § 4.4 describes a hybrid method combining career histories and ESCO occupations in the same space.
text representation model, we argue that it should be
|⋃=1  ( occ ) ∩  ( occ)|
| ( occ)|
4.2. Skill-based Prediction</p>
      </sec>
      <sec id="sec-2-2">
        <title>We hypothesize that job positions taken strongly rely on</title>
        <p>the skills of the person, and thus intuitively expect that
the career path prediction could benefit from informa- Career History Representation Learning
tion on underlying skills. Such information is inherently a powerful domain-specific representation model for
ca</p>
      </sec>
      <sec id="sec-2-3">
        <title>To learn</title>
        <p>present in ESCO, which captures both skills and job ti- reer histories, we make use of the parallel information
tles. As the inferred ESCO labels for all experiences arethat is contained in our dataset. For each work experience
available, we can make use of the full ESCO ontology, in the dataset, we have two textual descriptions, being
(1) the self-reported title and experience description from
Second, a mapping needs to be learned from the
representation of a career history to the representation of relevant
next ESCO occupations, through which the career path
prediction task can be performed.
doc1
&lt;SEP&gt;
&lt;SEP&gt;
doc2</p>
        <p>CareerBERT-ALL
doc1 doc2
&lt;SEP&gt;
&lt;SEP&gt;
doc1
&lt;SEP&gt;
&lt;SEP&gt;
&lt;SEP&gt;
&lt;SEP&gt;
&lt;SEP&gt;
&lt;SEP&gt;
embed</p>
        <p>embed
( vec1 , vec2 )
&lt;SEP&gt;
&lt;SEP&gt;
embed</p>
        <p>embed
( vec1 , vec2 )
embed</p>
        <p>embed
( vec1 , vec2 )
embed</p>
        <p>embed
( vec1 , vec2 )
embed</p>
        <p>embed
( vec1 , vec2 )
the resume, and(2) the ESCO occupation title as well as cast only the lastESCO occupation intodoc2.
its “description” field in the ESCO ontology. Inspired by
this parallel textual data, we adopt a contrastive learn•- CareerBERT-ALL – given a career history, cast the
ing strategy to finetune a sentence-transformer model sequence of self-reported experiences intdooc1. For
(all-mpnet-base-v2)4 that was pretrained on over 1B En- each ESCO occupation in the sequence, cast it
sepaglish sentence pairs 1[5]. We make use of multiple neg- rately into adoc2 text, generating as many pairs as the
atives ranking loss with in-batch negatives, as proposed length of the sequence.
by [16]. This training procedure only requires positive The CareerBERT-FULL is the typical scenario of
conpairs (doc1, doc2) of corresponding textual documents. trastive learning in which we use two diferent
(texWe format both an experience’s self-reported job titletual) representations of the same underlying information.
and description and those for an ESCO occupation in the However, we suspect that this strategy might be limited
same way, to embed them each with the chosen sentence- in its efectiveness, as properties like the length of the
transformer (where we add the“esco” prefix only for text, or the amount of SEP tokens could already give away
ESCO roles): the correct matching of pairs within a batch, without
con(esco) role: &lt;title&gt; sidering the underlying meaning of the text. To counter
description: &lt;description&gt; this expectation, theCareerBERT-LAST strategy is
included. This strategy uses only the last ESCO label in</p>
        <p>Since we want to represent full career histories anddoc2, thus avoiding the above mentioned risks. However,
not just individual work experiences, multiple work ex- a risk with this strategy is that the representation of the
periences are combined in one document, by concatenat- self-reported career history will focus only on the last
ing the single experience representations (ordering thempart (the last experience). A final strategyC(
areerBERTchronologically from oldest to most recent), separated by ALL) is thus included to counter this expectation. This
the tokenizer’s reservedSEP token, which we denote as strategy is similar toCareerBERT-LAST, but duplicated
(  1, ⋯ ,    ). Now for each career trajectory, wefor each ESCO label in the sequence instead of only the
want to create pairsd(oc1, doc2) of textual representations last one. We hypothesize that, bydoc2 randomly being
of on the one hand the experiences as described in the one of the assigned ESCO labels, the representation of
resumes, and on the other hand the ESCO-ontology coun- the self-reported career needs to be expressiveaolfl its
terparts, to use in the contrastive training. For this, weexperiences.
explore three diferent approaches (visualized inFig. 3): Finally, note that each contiguous subspan of a career
• CareerBERT-FULL – given a career history, cast history is a plausible career trajectory, and for each
histhe sequence of self-reported experiences intdooc1 and tory with experiences, there exist⋅( +12) such spans.
cast the corresponding sequence of ESCO occupations We use this insight the vastly increase the number of
into doc2. career trajectories that can be used in this representation
learning stage.
• CareerBERT-LAST – given a career history, cast
the sequence of self-reported experiences intdooc1 and</p>
      </sec>
      <sec id="sec-2-4">
        <title>Linear Projection As a second stage of the text4https://huggingface.co/sentence-transformers/all-mpnet-base-v2 based career path prediction, a mapping needs to be</title>
        <p>Finally, we combine the above metrics SKILL and  TEXT
because we hypothesize that the signal of skill-based
prediction and description-based prediction are
complementary. Introducing just one hyperparamt er, our hybrid
approach is defined as the weighted sum:</p>
        <p>HYBRID =  ⋅  TEXT + (1 − ) ⋅  SKILL.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Experimental results and Discussion</title>
      <p>4.4. Hybrid Prediction
learned from the career history representation to the5.1. Representation Learning Quality
representation of the next ESCO occupation. Formally,
given a text representation function , we need to An initial validation of theCareerBERT representation
l(e arn a map).pWinhg ilefrmoomre(s(ophisticate d o1p,⋯tio, ns a−r1e))avtoail-tmhoedierlsefeisctpievrefnoersmseidnarsetporbeestetnertinungdcearrsetaenrdhaisntdorcoiemsp.aFroer
able, we take the simple approach of learning a linearthis, we use the industry classification task as proposed
transformation between both vectors, and optimize this in [13]. Each career history in our dataset is linked to one
using the ordinary least squares regression. This proin- 24 total industries. The quality of the representation
jection  then allows us to write down the text-based model, when kept frozen and combined with a simple
scoring function as follows: classification layer, should correlate with performance
on this prediction task. We follow the same setup as1[3]
which is to sample 80% of all histories for training and the
other 20% for validation. This is measured across 10
dif TEXT(( 1,  2, … ,   ),  ) ferent random splits. We use a one-vs-all support-vector
= cosim( (((  1, ⋯ ,    ))) , (  ) ) machine (SVM) for the classification. Table 2 shows the
average accuracy across the 10 random runs, as well as
with cosim(, ) ≜ ‖‖ ⋅⋅ ‖‖ tahneyi rfinesttaunndinagrdisdienvciluadtieodnfso.rTrheefeprreentrcae.inWede ombosdeerlvweitthhaotut</p>
      <sec id="sec-3-1">
        <title>CareerBERT-ALL leads to the highest performance in this case.</title>
        <p>We split our dataset randomly into a train, validation and5.2. Career Path Prediction
test subset (80%/10%/10%), stratified along the industries We include a simple baseline system “reversed history”
to maintain diverse profiles in each. The statistics of each which simply predicts the ESCO occupations present in
subset are shown inTable 1. the input, ranked most to least recent. Our formulation of
skill-based career path prediction has no parameters that</p>
        <p>Career Histories Experiences can be tuned, so we directly report performance on the
TVraaliidnation 1271270 7995172 test set. For the text-based prediction, no
hyperparameTest 227 1050 ter needs to be tuned. Therefore, for eacChareerBERT
strategy, we directly train the linear projection on the
Table 1 combined train and validation set to report performance
Statistics of the train, validation and test subsets of the on the test set. We include the pretrained encoder model
dataset. without any finetuning for comparison. Also, for each
text representation model, we measure rank-based
re</p>
        <p>The diferent CareerBERT models are trained on the sults with and without the linear projection, to estimate
train subset, for a maximum of 2 epochs. During training, the impact of this stage. Finally, for the hybrid prediction
we measure the loss on the validation set every 10% of method, the  parameter needs to be tuned. We perform
an epoch, and keep the best performing checkpoint. We a grid search for values between 0 and 1 with increments
refer toAppendix B for further details about the trainingof 0.1 and measure performance for each value on the
valprocedure. In the rest of this section, we first validate idation set, as shown inFig. 4. As text-based method for
the quality of eachCareerBERT strategy through the this grid search, we decide to use thCeareerBERT-ALL
industry classification task in § 5.1. Then the main task method as it seems to perform favorably. The projection
of career path prediction is evaluated §in5.2. in this case is optimized on just the train set, as to not
Pretrained
CareerBERT-FULL
CareerBERT-LAST
CareerBERT-ALL</p>
        <p>Accuracy (%)
61.82 ±1.70
67.14 ±1.72
66.40 ±1.37
68.94 ±1.70
overfit on the validation set for this grid search. Based simple baseline. The skill-based prediction method
suron this grid search, the value for was set to 0.8 for passes the baseline with close to 9 %-points recall@10.
best results. All results on the test set are compiled in Among the text-based prediction methods, we observe
table Table 3. that CareerBERT-ALL performs strongest. This
validates our assumption that stronger representation
models (as represented on the industry classification task)
indeed lead to stronger results for career path prediction
as well. Adding the linear projection increases
performance in general, although recall@10 seems to go down
a bit in some cases. Finally, we show that skill-based and
text-based prediction are complementary, as the hybrid
approach reaches the overall best results on all metrics.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion and Future Work</title>
      <sec id="sec-4-1">
        <title>We develop and release a new dataset of over 2,164 anony</title>
        <p>mous work histories annotated with ESCO occupations.</p>
      </sec>
      <sec id="sec-4-2">
        <title>The dataset is unique in its focus on the free-form tex</title>
        <p>tual descriptions that come with work experiences in
resumes. Through this dataset, we
formulatedCareerBERT, a novel representation learning technique tailored
for work history texts. We study diferent approaches
to trainCareerBERT and find non-trivial quality
differences. The strongest performance for both industry
classification and career path prediction is obtained using
the CareerBERT-ALL strategy, which is in line with our
expectations when designing the diferent strategies. Our
research yielded two distinct models: a skill-based and
a text-based model for career path prediction. Next to
the textual information, underlying skills and the match
between current skills and skills for future jobs plays
an important role. Combining both text-based and
skillbased predictions turns out to work best due to their
information being complementary.</p>
      </sec>
      <sec id="sec-4-3">
        <title>We left out the period and duration of work experiences</title>
        <p>from our experiments, but this would be interesting to
include in future work. Furthermore, future work might
investigate how more of the structured information in
the ESCO ontology could be leveraged to increase the
performance of career path prediction even more.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <sec id="sec-5-1">
        <title>We thank the anonymous reviewers for their valuable feedback. This project was funded by the Flemish Government, through Flanders Innovation &amp; Entrepreneurship</title>
        <p>Baseline
Reverse history
0.211
26.37
26.49
Skill-based Prediction
Skill-based prediction
0.211
29.04
35.24
Text-based Prediction
Pretrained
Pretrainedproj
CareerBERT-FULL
CareerBERT-FULLproj
CareerBERT-LAST
CareerBERT-LASTproj
CareerBERT-ALL
CareerBERT-ALLproj</p>
      </sec>
      <sec id="sec-5-2">
        <title>We observe that the baseline using reverse history reaches 26.37% recall@5 and only 26.49% recall@10, which reflects the limited information available in this</title>
      </sec>
      <sec id="sec-5-3">
        <title>A logarithmic plot of all ESCO occupation frequencies in the dataset is shown inFig. 5 below.</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>B. CareerBERT Training Details</title>
      <sec id="sec-6-1">
        <title>The contrastive training is implemented using the pop</title>
        <p>ular SBERT implementation [15]. We keep the default
value of 20 for the “scale” hyperparametearlpha. The
positive pairs are randomly shufled into batches of 16.</p>
      </sec>
      <sec id="sec-6-2">
        <title>We use the AdamW optimizer with a learning rate of</title>
      </sec>
      <sec id="sec-6-3">
        <title>2e-5 and a “WarmupLinear” learning rate schedule with a warmup period of 5% of the training data. Automatic mixed precision was used to speed up training. All experiments where performed using an Nvidia T4 GPU.</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>C. GPT-3.5 Prompt For Experience Reformatting</title>
      <sec id="sec-7-1">
        <title>Below, the exact prompt used to rewrite the working</title>
        <p>histories is shown. The prompt makes use of the
conversational interface of the GPT-3.5 model, and consists
of only one user message. The position in which the
original text is inserted is indicated in the prompt with
text.</p>
        <p>User: ## Resume
text
## Task
Rewrite the working history with the following
format:
Role: &lt;role&gt;
Start: &lt;start&gt;
End: &lt;end&gt;
Description: &lt;description&gt;</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>