1. Introduction

Recommender Systems, September

Prediction using Resume Representation Learning and Skill-based Matching

Jens-Joris Decorte

R@10 R@5 0 1 2 3

Jeroen Van Hautte

jeroen@techwolf.ai 0 2 3

Johannes Deleu

johannes.deleu@ugent.be 0 1 3

Chris Develde r

chris.develder@ugent.be 0 1 3

Thomas Demeester

thomas.demeester@ugent.be 0 1 3

Career Path Prediction, Resume Representation Learning

0 ( Jeroen Van 1 Ghent University - imec , 9052 Gent , Belgium 2 TechWolf , 9000 Gent , Belgium 3 ropean Skills , Competences, Qualifications and Occupa-

2023

1 8 22

The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career. This task of predicting the next step in a career is known as career path prediction, and has diverse applications such as turnover prevention and internal job mobility. Existing methods to career path prediction rely on large amounts of private career history data to model the interactions between job titles and companies. We propose leveraging the unexplored textual descriptions that are part of work experience sections in resumes. We introduce a structured dataset of 2,164 anonymized career histories, annotated with ESCO occupation labels. Based on this dataset, we present a novel representation learning approach, CareerBERT, specifically designed for work history data. We develop a skill-based model and a text-based model for career path prediction, which achieve 35.24% and 39.61% recall@10 respectively on our dataset. Finally, we show that both approaches are complementary as a hybrid approach RecSys in HR'23: The 3rd Workshop on Recommender Systems for

prediction task as follows given a career history i e

1. Introduction

next step in a career is known as career path predictionp.ath prediction algorithms.

It is well-known that person-job fit has a positive impact both employers and job titles [4, 5]. Relying on only on both job satisfaction and job performance1[, 2]. Also, sparse features, such as job title and company names, employment plays a large role in most people’s lives and necessitates large amounts of career trajectories in order has an important impact on their well-being3[]. Thus, to learn meaningful (graph) representations from them. providing people with next steps at the right time in However, as such career data constitutes personal infortheir career that are both inspiring and suited to theimration, most research relies on closed datasets, often experience is important for job satisfaction, productivityproprietary to a company. Hence, there is a lack of open and well-being of workers. The task of predicting the datasets for the development and evaluation of career

While it is closely related to job recommendation, career

path prediction does not recommend specific job ads to

We believe that the career path prediction task can benefit from as of yet untapped unstructured data sources, turnover prevention to internal job mobility. candidates, but rather aims to predict the next role in ani.e., the free-form textual descriptions of past work expeindividual’s career. Such a role is typically characterizerdience in resumes. Concretely, we propose a relatively by a company name, job title and optional attributes such small, anonymous dataset of textual career histories from as salary or location. Being able to predict next steps inresumes, enriched with structured occupation labels from individual’s careers has many applications, ranging froma predefined ontology. For the latter we adopt the EuCommon approaches to career path prediction relytions (ESCO) [6]. In this paper, we define the career path on large amounts of career history data, and structure all career transitions into a large graph that contains nEvelop-O LGOBE Human Resources, in conjunction with the 17th ACM Conference on ∗Corresponding author. 1, 2, … ,

−1 ), predictthe ESCO occupation label of the held-out next experience. We believe that by focusing on the prediction of the next occupation, such a system can help in recommending relevant next jobs or providing clarity on internal mobility at employers in the future. Our main contributions are: • We create, annotate and publis1ha dataset of 2,164 We propose a new way of learning expressive representaanonymous career histories across 24 diferent indus- tions of textual career histories calleCdareerBERT withtries (§ 3). The career histories are structured as a lisotut the need for resume, job pairs. InsteadC, areerBERT of work experiences described in free-form text. Eachrelies on textual career histories and their corresponding experience is annotated with corresponding ESCO oc-ESCO occupations labels only.

cupation. • We show how the parallel information present in the 2.2. Career Path Prediction textual career histories and in the occupation ontol-In the field of career path prediction, large scale data from ogy provides opportunities to train a domain-specific social networks (LinkedIn) has been an important source text representation model§( 4) that can be used down- of information [11, 5, 4]. An early work on career path stream for the career path prediction task, under aprediction focused on four distinct career paths - softconstrained dataset size. ware engineering, sales, consulting, and marketing11[]. • We show how the hybrid approach of combining text- They simplified these paths into four stages of seniority based and skill-based prediction achieves the strongestand normalized LinkedIn job titles accordingly for the results (§ 5) for our task, thus demonstrating the value prediction task. While the specific dataset is not publicly of injecting skill ontology information into the model available, they extracted demographic, psycholinguistic, (as opposed to using purely text-based models). and topic-related features from social media content to enhance their predictions. An extended approach that predicts multiple future job titles and company changes 2. Related Work ahead, rather than just the next step was proposed b5y][.

They utilized a proprietary dataset of 300,000 resumes,

2.1. Resume Representation Learning allowing them to delve deeper into career trajectory analysis, but only used job titles and companies as features for We believe that expressive representations of resumesthe task at hand. Another approach to career path prediccan benefit many HR-related tasks such as job recommen- tion uses an LSTM to represent both profile context and dation and career path prediction. Building qualitativecareer path dynamics, leveraging a LinkedIn dataset to resumes representations is challenging due to the semi-predict both the next company and job title 1[2]. Massive structured nature of resumes. Resumes tend to contain amounts of resumes (+459k) have been used to predict similar sections, but within each section, the text is typi- job mobility patterns using a heterogeneous companycally unstructured. Current works on capturing resumesposition network constructed from the resumes’ career into more structured representations mostly focus on ex-trajectory data, providing insights into career transitions tracting only a subset of information present in resumes.and progression 4[]. All aforementioned methods rely As a result, many approaches focus on just a subset of in-on extensive collections of resumes and overlook the information present in resumes. The Job2Vec model learns formation embedded within the free-form text that is job title representations based on a graph of thousandspart of work experience sections. In contrast, our work of career paths in the IT and Finance7][, but completely leverages this text to enable new methods, that do not ignores the unstructured description linked to the expe-require massive-scale datasets and interaction graphs, riences. Another interesting work develops a similarityas the textual content could ofer a richer context for measure between careersS(imCareers) as a sequence understanding career progression. alignment metric between sequences of positions 8[].

This work does use the unstructured summaries, but

only after applying keyword extraction on them. 3. Anonymous Career Path Dataset Only a minority of works aims to capture the full job position information and typically relies on matched pairsWe reuse the set of anonymous resumes1[3, 14], gathered of resume text and job ads. Examples of this are9[] that from Kaggle,2 which contains 2,482 anonymous resumes, train a siamese adaptation of convolutional neural netb-oth in textual form and as pdf files. These anonymized work. A more recent work uses contrastive learning ofresumes were originally collected from an online portal, a sentence-transformer model between correspondingand are based on diferent profiles that applied on the resume, job ad pairs1[0]. The downside of these methods platform to jobs from 24 diferent industries. In § 3.1, is efectively the need for a job recommendation dataset, we detail how we transformed these resumes into strucwhich is hard to get access to, and may contain unex- tured lists of experiences, each with their respective job pected biases depending on how the data was gathered.title, experience summary, time period and ESCO occu1https://huggingface.co/datasets/jensjorisdecorte/ anonymous-working-histories

2https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset

pation counterpart. Then§, 3.2 summarizes the main From that uniform text format, we then easily parse the characteristics of the obtained dataset. text into a JSON structure combining the title, description, start and end date. Finally only profiles with 2+ 3.1. Dataset Construction experiences are retrained, after which 2,164 career histories remain. The quality the rewritten text from GPT-3.5 We parse structured career histories from the resumewsas validated on 100 individual resumes. Although some in free-text form, as written by their authors. Such ca-sentences were rephrased slightly, the rewritten text was reer history is composed of a sequence oefxperiences found to be accurate overall. ex1 … ex , each defined as a title and description and the time period it covered. The length of a career his- Enrich with Occupation Labels: Every experience in tory may obviously difer across resumes. We supple- our dataset is enriched with its corresponding occupation ment each individual experiencex with a corresponding out of all 3007 ESCO occupations available. We use a ESCO occupation labelocc . Next, we detail how we ex- proprietary classifier that is able to accurately classify tract the title and descriptions from the full-text resumes,each experience based on its title and description. An as well as the process to obtain ESCO labels. extensive manual validation process on 10% of the dataset confirmed the accuracy of these labels as only 2.2% of Extract experience section: Since we observed that labels were found to be suboptimal. These ESCO labels the original dataset’s text format lacks structure, presum-are stored as part of the final dataset. Note that the ably due to PDF or HTML parsing artefacts, we prepro3-007 ESCO occupations do not capture all aspects of the cess the data to restore paragraph segmentation. Conr-oles, as they for example do not reflect diferent seniority secutive whitespaces were identified as suitable places levels within a role. Rather, they provide a high-level to insert newlines, which reconstructs a readable forc-ategorisation of jobs based on their performed activities. mat. Since we are only interested in the professional experience listed in the resume, we want to skip all of 3.2. Dataset Analysis the sections on “education”, “certifications”, “projects”, “skills”, “publications”, “awards”, “personal information”,The industries are relatively balanced across the dataset, “presentations”, etc. We thus manually inspected the re-with 18 out of 24 industries having between 90 to 108 resumes in the dataset to identify the section titles used, sumes. A detailed breakdown is included inAppendix A. and extract theexperiences of interest as the region in Figure 1shows the distribution of the number of experibetween one of the related experience headin3gsand the ences per career history. earliest subsequent section header. The length of the thus selected sections on average amounts to 59% of the original resume length. We successfully processed 2,473 out of all 2,484 resumes, discarding the remaining 11 low quality resumes.

Structure working experiences: The obtained work experience sections list the diferent roles, often in chronological order. Because the resumes are anonymized, experiences are annotated with general “Company Name” and “City, State” placeholders, which we thus neglect. Each experience contains a job title (typically on a separate line) and a paragraph describing the respective responsibilities. Finally, each experience contains the period in which it was performed, with start and end date (or “current”). The order in which title, pe

Figure 1: Histogram of the number of work experiences per riod and description are mentioned varies across resumes,

resume in our dataset. which makes it hard to uniformly define the separation (e.g., as a regular expression) between each experience in the text. Therefore, we rewrite the experience section in The ESCO occupations in our dataset follow a longa uniform format using theGPT-3.5 API (see Appendix C). tailed distribution, as can be seen in detail from the log-log plot in Appendix A. The most frequent 300 ESCO 3We found the following headings preceding the experiences of occupations represent a little over 80% of all experiences interest: “experience”, “professional experience”, “work history”, “work experience”, “relevant experience”, “relevant professionailn the dataset, while over 60% of ESCO occupations never experience”, “employment history”, “employment & experience”. appear in the dataset.

Career History (chronological left to right)

Title: Sales Associate Description: Greeted customers, determined their needs, maintained knowledge of sales and ...

ESCO: sales assistant

Title: Collections Specialist Description: Managed a high-volume of customer calls, evaluated and initiated alternative solutions ... ESCO: debt collector MODEL

Ranked ESCO occupations 1. 3. . .. 2. retail department manager ✔ financial risk manager debt collector ...

4. Career Path Prediction Models

4.1. Task Description We formalize career path prediction on our dataset as history up until then, as illustrated inFig. 2. Each caranking the full set of ESCO occupations by how suit- skills as ( occ). Given a career history with ESCO occuable they are as a next career step, based on the careepration labelsocc1, … , occ , we represent the skills of the reer history(ex1, … , ex ) corresponds to − 1

diferent prediction problems: for each experienceex except the ifrst one, its corresponding ESCO occupation labelocc serves as the true label to predict based on the preceding its attributes and structure to predict next jobs. In the

ESCO ontology, each occupationocc is linked to a set of

standardized skills, which is partitioned in skills that are either “essential” or “optional” foorcc. We denote such unified skill set combining both essential and optional full career as the union of all related skil⋃ls=1 ( occ ).

Finally, as a score to rank potential ESCO occupations occ, we define the skill match SKILLS of an experience history against a specific ESCO occupation as the frac experiences and any ESCO occupationocc and outputs a score, after which all ESCO occupations are scored against the experience histor(yex1, … , ex−1 ), and ranked − 1 experiences. More formally, we expect a scoring tion of skills linked to that ESCO occupation that are also function (( ex1, … , ex−1 ), occ) that takes a sequence of present in the union of skills associated with the work experience’ ESCO labels, i.e., both. should be the true labelocc . However, applications that rank recommended jobs to candidates can typically show from high to low scores. The highest scored ESCO label SKILLS((ex1, … , ex ), occ) = more than one recommended job. As such, we use rank- 4.3. Description-based Prediction based metrics with a focus on top 5 and top 10 ranked Our second model relies on the textual descriptions occupations, specifically Mean Reciprocal Rank (MRR), present in the career histories. Given a suficiently strong

To solve the ranking problem, in§ 4.2 we detail ap- possible to predict next roles based on what has been deproaches that use the information contained within the scribed in previous experiences. Two steps are necessary ESCO ontology. Next, § 4.3 presents a combination of for this model. First, a strong domain-specific representarepresentation learning and regression to tackle the probti-on model needs to be developed to accurately represent lem. Finally, § 4.4 describes a hybrid method combining career histories and ESCO occupations in the same space. text representation model, we argue that it should be |⋃=1 ( occ ) ∩ ( occ)| | ( occ)| 4.2. Skill-based Prediction

We hypothesize that job positions taken strongly rely on

the skills of the person, and thus intuitively expect that the career path prediction could benefit from informa- Career History Representation Learning tion on underlying skills. Such information is inherently a powerful domain-specific representation model for ca

To learn

present in ESCO, which captures both skills and job ti- reer histories, we make use of the parallel information tles. As the inferred ESCO labels for all experiences arethat is contained in our dataset. For each work experience available, we can make use of the full ESCO ontology, in the dataset, we have two textual descriptions, being (1) the self-reported title and experience description from Second, a mapping needs to be learned from the representation of a career history to the representation of relevant next ESCO occupations, through which the career path prediction task can be performed. doc1 <SEP> <SEP> doc2

CareerBERT-ALL doc1 doc2 <SEP> <SEP> doc1 <SEP> <SEP> <SEP> <SEP> <SEP> <SEP> embed

embed ( vec1 , vec2 ) <SEP> <SEP> embed

embed ( vec1 , vec2 ) embed

embed ( vec1 , vec2 ) the resume, and(2) the ESCO occupation title as well as cast only the lastESCO occupation intodoc2. its “description” field in the ESCO ontology. Inspired by this parallel textual data, we adopt a contrastive learn•- CareerBERT-ALL – given a career history, cast the ing strategy to finetune a sentence-transformer model sequence of self-reported experiences intdooc1. For (all-mpnet-base-v2)4 that was pretrained on over 1B En- each ESCO occupation in the sequence, cast it sepaglish sentence pairs 1[5]. We make use of multiple neg- rately into adoc2 text, generating as many pairs as the atives ranking loss with in-batch negatives, as proposed length of the sequence. by [16]. This training procedure only requires positive The CareerBERT-FULL is the typical scenario of conpairs (doc1, doc2) of corresponding textual documents. trastive learning in which we use two diferent (texWe format both an experience’s self-reported job titletual) representations of the same underlying information. and description and those for an ESCO occupation in the However, we suspect that this strategy might be limited same way, to embed them each with the chosen sentence- in its efectiveness, as properties like the length of the transformer (where we add the“esco” prefix only for text, or the amount of SEP tokens could already give away ESCO roles): the correct matching of pairs within a batch, without con(esco) role: <title> sidering the underlying meaning of the text. To counter description: <description> this expectation, theCareerBERT-LAST strategy is included. This strategy uses only the last ESCO label in

Since we want to represent full career histories anddoc2, thus avoiding the above mentioned risks. However, not just individual work experiences, multiple work ex- a risk with this strategy is that the representation of the periences are combined in one document, by concatenat- self-reported career history will focus only on the last ing the single experience representations (ordering thempart (the last experience). A final strategyC( areerBERTchronologically from oldest to most recent), separated by ALL) is thus included to counter this expectation. This the tokenizer’s reservedSEP token, which we denote as strategy is similar toCareerBERT-LAST, but duplicated ( 1, ⋯ , ). Now for each career trajectory, wefor each ESCO label in the sequence instead of only the want to create pairsd(oc1, doc2) of textual representations last one. We hypothesize that, bydoc2 randomly being of on the one hand the experiences as described in the one of the assigned ESCO labels, the representation of resumes, and on the other hand the ESCO-ontology coun- the self-reported career needs to be expressiveaolfl its terparts, to use in the contrastive training. For this, weexperiences. explore three diferent approaches (visualized inFig. 3): Finally, note that each contiguous subspan of a career • CareerBERT-FULL – given a career history, cast history is a plausible career trajectory, and for each histhe sequence of self-reported experiences intdooc1 and tory with experiences, there exist⋅( +12) such spans. cast the corresponding sequence of ESCO occupations We use this insight the vastly increase the number of into doc2. career trajectories that can be used in this representation learning stage. • CareerBERT-LAST – given a career history, cast the sequence of self-reported experiences intdooc1 and

Linear Projection As a second stage of the text4https://huggingface.co/sentence-transformers/all-mpnet-base-v2 based career path prediction, a mapping needs to be

Finally, we combine the above metrics SKILL and TEXT because we hypothesize that the signal of skill-based prediction and description-based prediction are complementary. Introducing just one hyperparamt er, our hybrid approach is defined as the weighted sum:

HYBRID = ⋅ TEXT + (1 − ) ⋅ SKILL.

5. Experimental results and Discussion

4.4. Hybrid Prediction learned from the career history representation to the5.1. Representation Learning Quality representation of the next ESCO occupation. Formally, given a text representation function , we need to An initial validation of theCareerBERT representation l(e arn a map).pWinhg ilefrmoomre(s(ophisticate d o1p,⋯tio, ns a−r1e))avtoail-tmhoedierlsefeisctpievrefnoersmseidnarsetporbeestetnertinungdcearrsetaenrdhaisntdorcoiemsp.aFroer able, we take the simple approach of learning a linearthis, we use the industry classification task as proposed transformation between both vectors, and optimize this in [13]. Each career history in our dataset is linked to one using the ordinary least squares regression. This proin- 24 total industries. The quality of the representation jection then allows us to write down the text-based model, when kept frozen and combined with a simple scoring function as follows: classification layer, should correlate with performance on this prediction task. We follow the same setup as1[3] which is to sample 80% of all histories for training and the other 20% for validation. This is measured across 10 dif TEXT(( 1, 2, … , ), ) ferent random splits. We use a one-vs-all support-vector = cosim( ((( 1, ⋯ , ))) , ( ) ) machine (SVM) for the classification. Table 2 shows the average accuracy across the 10 random runs, as well as with cosim(, ) ≜ ‖‖ ⋅⋅ ‖‖ tahneyi rfinesttaunndinagrdisdienvciluadtieodnfso.rTrheefeprreentrcae.inWede ombosdeerlvweitthhaotut

CareerBERT-ALL leads to the highest performance in this case.

We split our dataset randomly into a train, validation and5.2. Career Path Prediction test subset (80%/10%/10%), stratified along the industries We include a simple baseline system “reversed history” to maintain diverse profiles in each. The statistics of each which simply predicts the ESCO occupations present in subset are shown inTable 1. the input, ranked most to least recent. Our formulation of skill-based career path prediction has no parameters that

Career Histories Experiences can be tuned, so we directly report performance on the TVraaliidnation 1271270 7995172 test set. For the text-based prediction, no hyperparameTest 227 1050 ter needs to be tuned. Therefore, for eacChareerBERT strategy, we directly train the linear projection on the Table 1 combined train and validation set to report performance Statistics of the train, validation and test subsets of the on the test set. We include the pretrained encoder model dataset. without any finetuning for comparison. Also, for each text representation model, we measure rank-based re

The diferent CareerBERT models are trained on the sults with and without the linear projection, to estimate train subset, for a maximum of 2 epochs. During training, the impact of this stage. Finally, for the hybrid prediction we measure the loss on the validation set every 10% of method, the parameter needs to be tuned. We perform an epoch, and keep the best performing checkpoint. We a grid search for values between 0 and 1 with increments refer toAppendix B for further details about the trainingof 0.1 and measure performance for each value on the valprocedure. In the rest of this section, we first validate idation set, as shown inFig. 4. As text-based method for the quality of eachCareerBERT strategy through the this grid search, we decide to use thCeareerBERT-ALL industry classification task in § 5.1. Then the main task method as it seems to perform favorably. The projection of career path prediction is evaluated §in5.2. in this case is optimized on just the train set, as to not Pretrained CareerBERT-FULL CareerBERT-LAST CareerBERT-ALL

Accuracy (%) 61.82 ±1.70 67.14 ±1.72 66.40 ±1.37 68.94 ±1.70 overfit on the validation set for this grid search. Based simple baseline. The skill-based prediction method suron this grid search, the value for was set to 0.8 for passes the baseline with close to 9 %-points recall@10. best results. All results on the test set are compiled in Among the text-based prediction methods, we observe table Table 3. that CareerBERT-ALL performs strongest. This validates our assumption that stronger representation models (as represented on the industry classification task) indeed lead to stronger results for career path prediction as well. Adding the linear projection increases performance in general, although recall@10 seems to go down a bit in some cases. Finally, we show that skill-based and text-based prediction are complementary, as the hybrid approach reaches the overall best results on all metrics.

6. Conclusion and Future Work We develop and release a new dataset of over 2,164 anony

mous work histories annotated with ESCO occupations.

The dataset is unique in its focus on the free-form tex

tual descriptions that come with work experiences in resumes. Through this dataset, we formulatedCareerBERT, a novel representation learning technique tailored for work history texts. We study diferent approaches to trainCareerBERT and find non-trivial quality differences. The strongest performance for both industry classification and career path prediction is obtained using the CareerBERT-ALL strategy, which is in line with our expectations when designing the diferent strategies. Our research yielded two distinct models: a skill-based and a text-based model for career path prediction. Next to the textual information, underlying skills and the match between current skills and skills for future jobs plays an important role. Combining both text-based and skillbased predictions turns out to work best due to their information being complementary.

We left out the period and duration of work experiences

from our experiments, but this would be interesting to include in future work. Furthermore, future work might investigate how more of the structured information in the ESCO ontology could be leveraged to increase the performance of career path prediction even more.

Acknowledgments We thank the anonymous reviewers for their valuable feedback. This project was funded by the Flemish Government, through Flanders Innovation & Entrepreneurship

Baseline Reverse history 0.211 26.37 26.49 Skill-based Prediction Skill-based prediction 0.211 29.04 35.24 Text-based Prediction Pretrained Pretrainedproj CareerBERT-FULL CareerBERT-FULLproj CareerBERT-LAST CareerBERT-LASTproj CareerBERT-ALL CareerBERT-ALLproj

We observe that the baseline using reverse history reaches 26.37% recall@5 and only 26.49% recall@10, which reflects the limited information available in this A logarithmic plot of all ESCO occupation frequencies in the dataset is shown inFig. 5 below. B. CareerBERT Training Details The contrastive training is implemented using the pop

ular SBERT implementation [15]. We keep the default value of 20 for the “scale” hyperparametearlpha. The positive pairs are randomly shufled into batches of 16.

We use the AdamW optimizer with a learning rate of 2e-5 and a “WarmupLinear” learning rate schedule with a warmup period of 5% of the training data. Automatic mixed precision was used to speed up training. All experiments where performed using an Nvidia T4 GPU. C. GPT-3.5 Prompt For Experience Reformatting Below, the exact prompt used to rewrite the working

histories is shown. The prompt makes use of the conversational interface of the GPT-3.5 model, and consists of only one user message. The position in which the original text is inserted is indicated in the prompt with text.

User: ## Resume text ## Task Rewrite the working history with the following format: Role: <role> Start: <start> End: <end> Description: <description>