<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Job Posting-Enriched Knowledge Graph for Skills-based Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maurits de Groot∗</string-name>
          <email>maurits.degroot@live.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jelle Schutte</string-name>
          <email>jelle.schutte@randstad.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Graus</string-name>
          <email>david.graus@randstadgroep.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leiden University</institution>
          ,
          <addr-line>Leiden</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Randstad Groep Nederland</institution>
          ,
          <addr-line>Diemen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Randstad</institution>
          ,
          <addr-line>Diemen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <issue>2021</issue>
      <abstract>
        <p>The labor market is constantly evolving. Occupations are changing, being added, or disappearing to fit the needs of today's market. In recent years the pace of this change has accelerated, due to factors such as globalization, digitization, and the shift to working from home. Different factors are relevant when selecting employment, e.g., cultural fit, compensation, and the degree of freedom provided. To successfully fulfill an occupation, the gap between the skills required (by the job) and the skills possessed (by the job seeker) needs to be as small as possible. Decreasing this skill gap improves the fit between a job candidate and an occupation. In this paper we propose a custom-built Skills &amp; Occupation Knowledge Graph (KG) that fits the dynamic nature of the labor market described above, by leveraging existing skills and occupation taxonomies enriched with external job posting data. We leverage this KG and explore several applications for skills-based matching of jobs to job seekers. First, we study link prediction as a means to quantify the relevance of skills to occupations, which can help in prioritizing the learning and development of employees. Next, we study node similarity methods and shortest path algorithms for career pathfinding. Finally, we leverage a term weighting method for identifying which skills are most “distinctive” for different (types of) occupations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computing methodologies → Ontology engineering; • Theory of computation → Graph algorithms analysis; • Information systems → Content analysis and feature selection.</p>
      <p>Keywords: labor market, skill matching, knowledge graphs</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        In recent years the number of people who change jobs has been
increasing [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the average duration of a position has become shorter [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], and
the total working population is growing [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Due to increasing
globalization, the number of possible job candidates per position
is higher. Moreover, candidates have, on average, a higher level of
education than a number of years ago [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As a result, the number of potential job candidates is increasing
rapidly, and the labor market is more competitive than it has ever been [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>∗Work done while on internship at Randstad Groep Nederland.</p>
      <p>
        In addition, with the demand for skills changing over time, having
the right skills for specific occupations is more crucial than ever.
Increasing digitization has made computer skills
more valuable [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The COVID-19 pandemic has resulted in a
double-disruption effect, in which technological adoption is
accelerated while companies lay off employees [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Most aging workers
do not possess the newly required technical skills, which leads to
fewer job opportunities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Technical skills are not the only ones that matter:
good people skills are becoming increasingly important as
well [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The volatility in the labor market causes occupations to change and
to require new skills, and keeping up with
the latest developments is a challenge. To find relevant vacancies
and job postings, individuals can use external services to match
their skills with their desired work. In 2019, employment
agencies were responsible for fulfilling 10% of the available jobs in the
Netherlands [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        As explained above, in recent years the labor market has become
more competitive, and requirements more dynamic. As a result of
this, there is a rising interest in skill-based matching of candidates
to jobs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], as the desired profiles for a given occupation are no
longer static and unambiguous.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Problem Statement</title>
      <p>To facilitate matching candidates to job postings, it is important to
know which skills are relevant, in demand, and in supply. This creates
a need for a flexible data representation of skills. Such a
representation should facilitate various tasks, such as a skills similarity
metric that quantifies the likeness between skills,
skills-to-occupation similarity metrics that help people navigate the labor
market and find new occupations, and an understanding of which skills
relate to which occupations, to inform which skills are needed for
desired occupations. Moreover, since relations between skills and
occupations are not static, the representation needs robust and accurate updating methods
to ensure the information does not become outdated.</p>
      <p>In this paper we address the task of skills and occupation graph
construction, which we describe in Section 2, and apply this data
representation to the following set of use cases: link prediction
for identifying novel skills-occupation relations in Section 3,
skills-based occupational similarity for career pathfinding in Section 4,
and identifying distinctive skills per occupational group for learning
&amp; development in Section 5.</p>
    </sec>
    <sec id="sec-4">
      <title>KNOWLEDGE GRAPH CONSTRUCTION</title>
      <p>Our Skills &amp; Occupational KG is based on existing structured data;
more specifically, we combine the ISCO (occupations) and ESCO
(skills) taxonomies (bottom row in Figure 1). Next, we enrich this
existing data with information from noisy, unstructured job
postings (top row in Figure 1) to ensure our KG represents the current
state of the labor market.</p>
    </sec>
    <sec id="sec-5">
      <title>Occupations (ISCO) and skills (ESCO)</title>
      <p>
        The first step involves constructing a shared Skills &amp; Occupational
Knowledge Graph by combining the existing ISCO and ESCO
taxonomies.
      </p>
      <p>
        2.1.1 ISCO (occupations). The International Standard
Classification of Occupations (ISCO) is organized as a taxonomy of
occupational groups with four granularity levels across ten different major
groups. An occupation is defined as “a set of jobs whose main tasks
and duties are characterized by a high degree of similarity”, where
a job is defined as “a set of tasks and duties performed, or meant to
be performed, by one person, including for an employer or in
self-employment.” [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] Take, for example, the occupation “computer
programmer,” which is defined by the level 4 ISCO code 2132. The
occupation belongs to the level 3 group “computing
professionals” (ISCO code 213), which in turn belongs to the level 2 group
“computing, engineering and science professionals” (ISCO code 21),
which, finally, falls in the level 1 group “professionals” (ISCO code
2).
      </p>
      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Group Number</th>
              <th>Major Group Name</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>Managers</td></tr>
            <tr><td>2</td><td>Professionals</td></tr>
            <tr><td>3</td><td>Technicians and associate professionals</td></tr>
            <tr><td>4</td><td>Clerical support workers</td></tr>
            <tr><td>5</td><td>Service and sales workers</td></tr>
            <tr><td>6</td><td>Skilled agricultural, forestry and fishery workers</td></tr>
            <tr><td>7</td><td>Craft and related trades workers</td></tr>
            <tr><td>8</td><td>Plant and machine operators, and assemblers</td></tr>
            <tr><td>9</td><td>Elementary occupations</td></tr>
            <tr><td>0</td><td>Armed forces occupations</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-3">
        <title>ESCO (skills)</title>
        <p>
          We define our initial high-level occupation
groups by using the ISCO standard. For skills, we turn to the
European Skills, Competences, Qualifications and Occupations (ESCO)
taxonomy [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. ESCO defines a skill as
“the ability to apply knowledge and use know-how to
complete tasks and solve problems.”
        </p>
        <p>ESCO covers 13,485 skills, connected to 2,942 occupations
(in 27 languages).</p>
        <p>We link our ISCO occupations to ESCO by using the direct
links that ESCO defines between ISCO level 4 groups (the most
fine-grained, lowest level of the taxonomy) and ESCO concepts.
These links between ESCO and ISCO are not (necessarily)
1-to-1, as multiple ESCO occupations can be linked to a single (level
4) ISCO group.</p>
        <p>In Figure 2 we illustrate this connection between ISCO and ESCO.
ESCO occupations are shown in blue, with ISCO occupation groups
in purple. In addition to the ESCO occupations shown in the image,
ESCO also defines skills (not shown); e.g., the ESCO occupation
“Cattle breeder” has skills linked to it such as “feed livestock”
and “assist animal birth.”</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>KG enrichment through job posting data</title>
      <p>Now that we have our high-level KG structure based on ISCO and
ESCO, which defines occupations and skills as nodes, and edges as
links between ESCO and ISCO objects, we turn to job posting data
to account for the dynamic nature of associations between skills and
occupations, as described in Section 1. To make sure our KG reflects
the current status of the labor market, we use information from
job postings to enrich the structure of our KG. More specifically,
we create additional edges by identifying and extracting ESCO
skills for each job posting’s ISCO occupation group, and assign
weights to edges by relying on co-occurrence statistics of skills and
occupations.</p>
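      <p>The enrichment step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each job posting has already been reduced to an (ISCO code, matched ESCO skills) pair, counts a skill at most once per posting, and uses the share of an occupation's postings that mention a skill as the edge weight; the paper only states that weights rely on co-occurrence statistics. The function name cooccurrence_edges and the example postings are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def cooccurrence_edges(postings):
    """Weight (occupation, skill) edges by co-occurrence in job postings.

    `postings` is an iterable of (isco_code, skills) tuples, one per posting.
    Assumed weighting: the fraction of an occupation's postings that mention
    the skill (each skill counted at most once per posting).
    """
    pair_counts = Counter()     # postings mentioning (occupation, skill)
    posting_counts = Counter()  # postings per occupation
    for isco_code, skills in postings:
        posting_counts[isco_code] += 1
        for skill in set(skills):
            pair_counts[(isco_code, skill)] += 1
    return {
        (isco_code, skill): count / posting_counts[isco_code]
        for (isco_code, skill), count in pair_counts.items()
    }


# Hypothetical postings: two for ISCO 2132, one for ISCO 6121.
postings = [
    ("2132", ["programming", "sql", "programming"]),
    ("2132", ["programming"]),
    ("6121", ["feed livestock"]),
]
edges = cooccurrence_edges(postings)
print(edges[("2132", "programming")])  # 1.0
print(edges[("2132", "sql")])          # 0.5
```
]]></preformat>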
      <p>
        This second step of our process revolves around extracting skills
from job postings. We describe our job posting dataset in Section
2.2.1, our approach for skill extraction in Section 2.2.2, and how we
match extracted skills to ESCO skills in Section 2.2.3.
      </p>
      <p>
        2.2.1 Vacancy data. Our vacancy dataset consists of a sample of
600,000 Dutch vacancies collected by Jobdigger [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]; each job
posting is labeled with a level 4 ISCO code. Our sample was chosen
by selecting a uniform distribution of ISCO level 1 occupations, to
make sure our set covers the entire breadth of the labor market.
Prior to sampling our set at ISCO level 1, the initial dataset was
cleaned by discarding low-quality and noisy job postings, such as
postings that represented multiple occupations, or job postings that
contained a low number of sentences. Here, we treat vacancy data
as a proxy for demand in the job market. By doing so, internal
promotions, career paths, and informal channels are not taken
into account.
      </p>
      <p>
        2.2.2 Skill Extraction. For skill extraction we rely on the
industry-standard Textkernel Extract [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] parser. For each vacancy text,
Textkernel Extract returns a JSON object with the corresponding skills,
each represented by the surface form identified in the job posting (skill
mention), a unique identifier representing the skill (skill id), and
finally, a confidence score that quantifies the likelihood that the
extracted skill is correct.
      </p>
      <p>
        2.2.3 Skill Matching. Given the skills extracted by Textkernel, we
match them to the skill nodes in our KG by relying on the surface
forms of the skills (skill mentions). More specifically, we compute the
Jaccard similarity between the character n-grams of the normalized skill
mention and those of the normalized ESCO skill names. We set the similarity
threshold to 0.66, which we determined empirically to be optimal
using a smaller subset of our 39,758,827 Textkernel-skill-to-ESCO-skill
mappings. The high-level process is shown in Figure 3.
      </p>
      <p>[Figure 3 panels: “Skill from Job Posting,” “Normalized candidate skill,” “n-gram of candidate skill.”]
      </p>
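      <p>A minimal sketch of the matching step in Section 2.2.3, for illustration only. The paper does not specify the n-gram length or the normalization, so this example assumes character trigrams (n = 3) and a normalization that lowercases the text, replaces punctuation with spaces, and collapses whitespace; it also assumes a mention is accepted when its best Jaccard similarity reaches the 0.66 threshold. The function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re


def normalize(text):
    # Assumed normalization: lowercase, replace punctuation with spaces,
    # collapse runs of whitespace.
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def char_ngrams(text, n=3):
    # Set of character n-grams (n = 3 is an assumed default);
    # strings shorter than n map to themselves.
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    # Intersection size divided by union size; two empty sets count as identical.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_skill(mention, esco_skills, n=3, threshold=0.66):
    """Return (best ESCO skill or None, similarity) for one skill mention."""
    query = char_ngrams(normalize(mention), n)
    best_skill, best_score = None, 0.0
    for skill in esco_skills:
        score = jaccard(query, char_ngrams(normalize(skill), n))
        if score > best_score:
            best_skill, best_score = skill, score
    if best_score >= threshold:
        return best_skill, best_score
    return None, best_score


esco = ["feed livestock", "assist animal birth"]
print(match_skill("Feed Livestock!", esco))  # ('feed livestock', 1.0)
print(match_skill("python", esco))           # (None, 0.0)
```
]]></preformat>
      <p>In practice, comparing every mention against all 13,485 ESCO skills is quadratic in cost; an inverted index from n-grams to skills would restrict the comparison to candidates sharing at least one n-gram.</p>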
    </sec>
    <sec id="sec-7">
      <title>Final Skills &amp; Occupational Knowledge Graph</title>
    </sec>
    <sec id="sec-8">
      <p>Our final KG, resulting from the process shown in Figure 1 and
described in the previous section, consists of 1,220 nodes, of which
983 represent (ESCO) skills, and 237 (ISCO) occupations. These
nodes are connected through 3, 910 edges, with an average node
degree of 6.4.</p>
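      <p>As a quick consistency check on these numbers: in an undirected graph each edge contributes to the degree of both of its endpoints, so the average degree is 2|E|/|V| = 2 × 3,910 / 1,220 ≈ 6.41, in line with the reported 6.4. The sketch below computes the same statistics from an edge list; the function name is hypothetical, and nodes without any edge are not counted because they never appear in the list.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def degree_stats(edges):
    """Return (number of nodes, number of edges, average degree)."""
    degree = Counter()
    for u, v in edges:
        # Each undirected edge adds one to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1
    return len(degree), len(edges), sum(degree.values()) / len(degree)


# Reported KG size: 1,220 nodes and 3,910 edges.
print(round(2 * 3910 / 1220, 2))  # 6.41
print(degree_stats([("cook", "food safety"), ("baker", "food safety")]))
```
]]></preformat>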
      <p>This KG is a subset of the full ESCO (13,485 skills) and ISCO
(436 occupations) taxonomies. There are several reasons why our
KG is a subset and does not span the entirety of the ISCO and ESCO
taxonomies.</p>
      <p>First, it is conceivable that not all ISCO occupations are in
current demand; e.g., we found no vacancies for ISCO
occupation code 8111, “mining-plant operators,” which is not
surprising, as there are currently no mines in operation in the Netherlands.
Second, we are likely dealing with coverage issues, from (i) the
likely incomplete coverage of the Textkernel Extract method we
use for skill extraction, and (ii) our skill matching methodology
further reducing the number of identified skills. As the focus of this
paper is on downstream applications, we consider improved matching out
of scope, and rely on our naive but solid character n-gram-based
method.</p>
    </sec>
    <sec id="sec-9">
      <title>KG COMPLETION USING LINK PREDICTION</title>
    </sec>
    <sec id="sec-10">
      <p>One of the challenges of modeling skills and occupations is the
dynamic nature of the labor market. In this section we explore
our first downstream application of our data-driven, dynamically
constructed Skills &amp; Occupation Knowledge Graph: matching
occupations to skills. We focus on discovering novel connections
between skills and occupations through leveraging the structure of
our knowledge graph enriched with job posting data.</p>
      <p>More specifically, in this section we compare link prediction
algorithms that quantify the relatedness between a skill node and an
occupation node, in order to discover novel connections between skills
and occupations that are not present in our initial KG. We describe our two
link prediction methods in the following sections: the first,
Preferential Attachment (PA), in Section 3.2.1, and the second, Node2Vec (N2V),
in Section 3.2.2.</p>
    </sec>
    <sec id="sec-11">
      <title>Experimental setup</title>
      <p>We employ link prediction to estimate the relatedness between skill
and occupation nodes. To evaluate and reliably compare different
methods, we first split our KG into training, test, and validation sets.
More specifically, we sample 55% of all edges for training the link
prediction algorithms (where applicable), leaving 30% for
testing and 15% for validation. For each existing pair of occupation
and skill nodes, which we consider a positive sample in our training,
test, and validation sets, we randomly generate a negative sample
(i.e., a pair of skill and occupation nodes that is not connected in our
KG). An overview of the number of edges in each set is shown in
Table 2.</p>
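      <p>The split described above can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes edges are stored as (occupation, skill) tuples, shuffles them with a fixed seed, and draws one random unconnected (occupation, skill) pair per positive edge, never reusing a negative pair. The function name split_edges is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def split_edges(edges, occupations, skills, fractions=(0.55, 0.30, 0.15), seed=0):
    """Split positive edges into train/test/validation and add negatives.

    Assumes edges are (occupation, skill) tuples and that enough non-edges
    exist. Returns {split_name: (positive_pairs, negative_pairs)}, with one
    random non-edge per positive edge.
    """
    rng = random.Random(seed)
    positives = list(edges)
    rng.shuffle(positives)
    existing = set(positives)

    n_train = round(fractions[0] * len(positives))
    n_test = round(fractions[1] * len(positives))
    splits = {
        "train": positives[:n_train],
        "test": positives[n_train:n_train + n_test],
        "validation": positives[n_train + n_test:],  # the remaining ~15%
    }

    used = set()  # avoid drawing the same negative pair twice
    result = {}
    for name, pos in splits.items():
        neg = []
        while len(neg) < len(pos):
            pair = (rng.choice(occupations), rng.choice(skills))
            if pair not in existing and pair not in used:
                used.add(pair)
                neg.append(pair)
        result[name] = (pos, neg)
    return result


occupations = [f"occ{i}" for i in range(10)]
skills = [f"skill{j}" for j in range(10)]
edges = [(occupations[i], skills[i]) for i in range(10)]
edges += [(occupations[i], skills[(i + 1) % 10]) for i in range(10)]
sets = split_edges(edges, occupations, skills)
print({name: len(pos) for name, (pos, neg) in sets.items()})
# {'train': 11, 'test': 6, 'validation': 3}
```
]]></preformat>
      <p>When scoring validation and test pairs, the link prediction methods should see a graph built from the training edges only; otherwise held-out edges leak into node degrees (for PA) or random walks (for N2V).</p>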
      <p>[Table 2: number of positive and negative edges among the training, validation, and test edges, and in total.]</p>
      <sec id="sec-11-3">
        <p>
          3.2.1 Method 1: Preferential Attachment (PA). The first link
prediction method is preferential attachment [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. This method takes a
pair of nodes, x and y, and calculates a closeness score C(x, y)
between them:
          <disp-formula>
            <tex-math>C(x, y) = |\Gamma(x)| \times |\Gamma(y)|,</tex-math>
          </disp-formula>
where Γ(x) denotes the set of neighbors of x.
        </p>
        <p>A higher score corresponds to a higher probability that the nodes
are connected. The intuition is that if both nodes have
a large number of neighbors, they may function as hubs, and
in many graphs, hubs have a higher chance of
being connected.</p>
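        <p>Preferential attachment and the max-normalization described in the next paragraph can be sketched in a few lines. This is an illustrative sketch, assuming the KG is held as a dictionary mapping each node to the set of its neighbors; storing the scores per unordered node pair is equivalent to filling the upper triangle of the symmetric matrix. The function name is hypothetical; networkx also provides a preferential_attachment function that yields unnormalized scores for given node pairs.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations


def preferential_attachment(adjacency):
    """Max-normalized PA score |N(x)| * |N(y)| for every unordered node pair."""
    scores = {
        (x, y): len(adjacency[x]) * len(adjacency[y])
        for x, y in combinations(sorted(adjacency), 2)
    }
    top = max(scores.values(), default=0) or 1  # avoid division by zero
    return {pair: score / top for pair, score in scores.items()}


# Tiny hypothetical graph (neighbor sets are kept symmetric).
adjacency = {
    "cook": {"knife skills", "food safety"},
    "baker": {"food safety"},
    "knife skills": {"cook"},
    "food safety": {"cook", "baker"},
}
scores = preferential_attachment(adjacency)
print(scores[("cook", "food safety")])    # 1.0
print(scores[("baker", "knife skills")])  # 0.25
```
]]></preformat>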
        <p>
          To compute all scores, we represent our KG as a matrix in which
each node is represented by both a row and a column. Note that this
matrix is symmetric, since the value at row x and column y equals
the value at row y and column x. In the cell where two
nodes intersect, we store their preferential attachment score. We normalize this
matrix by dividing each score by the maximum Closeness score, to
ensure that each value is between 0 and 1. We treat the resulting
normalized Closeness score as the probability that the corresponding
nodes are related.
        </p>
        <p>
          3.2.2 Method 2: Node2Vec (N2V). The second link prediction method
we use is the Node2Vec algorithm [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. This algorithm supports
a number of configurations; for this paper we use the following
parameters:
          <list list-type="bullet">
            <list-item><p>dimensions = 1024</p></list-item>
            <list-item><p>walk length = 4</p></list-item>
            <list-item><p>number of walks = 2500</p></list-item>
            <list-item><p>p (return parameter) = 1</p></list-item>
            <list-item><p>q (in-out parameter) = 1</p></list-item>
          </list>
        </p>
        <p>These parameters were selected after a grid search over a large
number of possible parameter combinations.</p>
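        <p>With p = q = 1, the second-order bias of Node2Vec is uniform, so the random walks reduce to first-order walks in which each step picks a neighbor of the current node; on an unweighted graph that choice is uniform. The sketch below generates such walks and is illustrative only: it ignores edge weights, the function name is hypothetical, and the second Node2Vec stage, training skip-gram embeddings (here 1024-dimensional) on the walks, is not shown.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def uniform_walks(adjacency, num_walks, walk_length, seed=0):
    """Node2Vec-style random walks for p = q = 1 on an unweighted graph.

    Assumption: edge weights are ignored. Starts `num_walks` walks from
    every node; each step moves to a uniformly chosen neighbor.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a random order each round
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(adjacency[walk[-1]])
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


adjacency = {
    "cook": {"knife skills", "food safety"},
    "baker": {"food safety"},
    "knife skills": {"cook"},
    "food safety": {"cook", "baker"},
}
walks = uniform_walks(adjacency, num_walks=2, walk_length=4)
print(len(walks))  # 8: two walks from each of the four nodes
```
]]></preformat>
        <p>Under these assumptions, the configuration listed above would correspond to uniform_walks(adjacency, num_walks=2500, walk_length=4).</p>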
      </sec>
    </sec>
    <sec id="sec-12">
      <title>Results</title>
      <p>
        When the numbers of positive and negative edges in the test set
are equal, PA outperforms the more complex N2V method, with an
F1-score for the positive class of 0.78 against 0.65. In most realistic
situations, however, we may want to explore how a node can be
linked to any other node, making the number of comparisons, or
edges to predict, 1-to-(N-1); i.e., for each node we compare every other
node (excluding itself). To approximate this real-world performance,
the ratio of negative to positive edges should reflect these more
realistic proportions. To do so, we compute the F1-score at increasing
ratios of negative-to-positive edges, ranging from 1 (as shown in
Table 3) to 7. Results are shown in Figure 4. The figure shows that
up to a ratio of 3:1, N2V is on par with PA, but as the ratio increases,
N2V outperforms PA, suggesting that N2V is better suited for most
real-world situations.
      </p>
      <p>[Figure 4: y-axis shows the F1-score of the positive class.]</p>
      <p>
        Now that we have established that N2V is more suitable for our
task, we employ this algorithm to predict the relationships
between occupations and skills. In doing so, we need to keep in mind
that the graph we use as input is imperfect in terms of
correctness and completeness [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
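      <p>The effect of the negative-to-positive ratio on the F1-score of the positive class can be illustrated with a small sketch. It assumes each method outputs a score per edge, treats scores above 0.5 as positive predictions (as in the analysis of Figure 5), and uses the first ratio × (number of positives) negatives; the scores below are made up. Because every added negative can only add a false positive or a true negative, precision, and with it F1, can only stay equal or drop as the ratio grows.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def f1_positive(y_true, y_pred):
    """F1-score of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_at_ratio(pos_scores, neg_scores, ratio, threshold=0.5):
    """F1 of the positive class using `ratio` negatives per positive edge.

    Assumption: a score above `threshold` counts as a predicted edge.
    """
    negatives = neg_scores[: ratio * len(pos_scores)]
    scores = list(pos_scores) + list(negatives)
    y_true = [1] * len(pos_scores) + [0] * len(negatives)
    y_pred = [1 if s > threshold else 0 for s in scores]
    return f1_positive(y_true, y_pred)


# Made-up scores: 3 of 4 positives and 1 in 4 negatives exceed 0.5.
pos_scores = [0.9, 0.8, 0.7, 0.4]
neg_scores = [0.6, 0.2, 0.1, 0.3] * 7
for ratio in range(1, 8):
    print(ratio, round(f1_at_ratio(pos_scores, neg_scores, ratio), 3))
```
]]></preformat>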
      <p>Looking at the false positives of the algorithm, we can identify skills that,
according to our dataset, are incorrectly linked to occupations. For KG
completion, we aim to identify those skills
that are not linked to occupations, but should be. Table 4 shows a
random sample of false positives: it reinforces our intuition that
link prediction can be employed for KG completion, as some of the
predicted edges make sense; e.g., the skill “preparing materials for
dental procedures” is shown as a relevant skill for the occupation
“dentist.” By consulting domain experts, such skills can be efficiently
added to enrich the current graph.</p>
      <p>To further explore these intuitions, in Figure 5 we show the edges
to skill nodes predicted by N2V for the node representing ISCO
code 2611: “Lawyers.” The y-axis shows skill edges, and the x-axis
shows the link prediction probabilities, for all predictions with a
probability &gt; 0.5 (i.e., positive predictions by the method). The green
bars denote true positives (i.e., correctly predicted edges between
the skill and occupation), and blue bars depict false positives (skills
that are predicted to have an edge with the occupation, but whose edge does not
exist in our KG). The figure shows “education law” and
“investigation research methods” as newly identified skills for lawyers, found
neither in the original ESCO taxonomy nor in co-occurrences in job
postings.</p>
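      <p>The selection behind Figure 5 can be sketched as follows: keep the skills whose predicted probability for one occupation exceeds 0.5, then split them by whether the edge already exists in the KG. The false positives form the candidate list for domain experts. The function name and the probabilities in the example are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def split_predictions(probabilities, known_skills, threshold=0.5):
    """Split positive predictions for one occupation into TP and FP lists.

    `probabilities` maps skill -> predicted link probability;
    `known_skills` is the set of skills already linked in the KG.
    Both lists are sorted by decreasing probability.
    """
    positives = sorted(
        ((skill, p) for skill, p in probabilities.items() if p > threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    true_pos = [(s, p) for s, p in positives if s in known_skills]
    false_pos = [(s, p) for s, p in positives if s not in known_skills]
    return true_pos, false_pos


# Hypothetical N2V output for ISCO 2611 "Lawyers".
probabilities = {"legal research": 0.93, "education law": 0.71, "feed livestock": 0.08}
true_pos, false_pos = split_predictions(probabilities, known_skills={"legal research"})
print(false_pos)  # [('education law', 0.71)]: candidate for expert review
```
]]></preformat>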
    </sec>
    <sec id="sec-13">
      <title>CAREER PATHFINDING USING SHORTEST PATH ALGORITHMS</title>
    </sec>
    <sec id="sec-14">
      <p>
        According to recent data (2019), 1.1 million people in the Netherlands
switched occupations [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. When transitioning from one job to
another, the gap between the two jobs cannot be too large. This gap
can be considered too large if the skills required for one differ too
much from those required for the other. Consequently, occupations that share a large
number of skills should be easier to transfer between. In this
section we focus on leveraging skills to better inform transitions
between occupations. More specifically, we aim to leverage the KG
structure for matching occupations with occupations, to identify
how an individual can change jobs in the most efficient way.
      </p>
    </sec>
    <sec id="sec-15">
      <title>4.1 Skills-based Occupation Similarity</title>
      <p>To determine the feasibility of an occupation transfer, we propose to
model the distance between occupations with the Jaccard distance. We
compute Jaccard distances between occupations by representing
each occupation as the set of its required skills (which we extract
from our KG), and computing the overlap between two sets of skills.
See Figure 6 for an illustration.</p>
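      <p>The occupation distance can be sketched as follows, assuming each occupation is given as the set of skills it is linked to in the KG; the function names and skill sets are hypothetical. The Jaccard distance is one minus the size of the intersection divided by the size of the union, so it is 0 for identical skill sets and 1 for disjoint ones.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def jaccard_distance(a, b):
    """1 - (size of intersection) / (size of union) for two skill sets."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1 - len(a & b) / len(union)


def occupation_distances(occupation_skills):
    """Jaccard distance for every unordered pair of occupations."""
    names = sorted(occupation_skills)
    return {
        (x, y): jaccard_distance(occupation_skills[x], occupation_skills[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical skill sets.
occupation_skills = {
    "cook": {"prepare meals", "food safety", "knife skills"},
    "baker": {"food safety", "knife skills", "bake goods"},
    "lawyer": {"legal research"},
}
distances = occupation_distances(occupation_skills)
print(distances[("baker", "cook")])   # 0.5: 2 shared out of 4 distinct skills
print(distances[("cook", "lawyer")])  # 1.0: no shared skills
```
]]></preformat>
      <p>The same value of 1/2 arises whenever half of the combined skills are shared (1 of 2, 2 of 4, 3 of 6, and so on), which is one way to see why the distance distribution discussed below shows spikes at common fractions.</p>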
      <p>In our KG a total of 120,952 links can be made between pairs of
skills and pairs of occupations. Of these pairs, 89.3% are between
skills and 10.7% between occupations. To gain insight into the overall
similarity of skills and occupations, we study the distribution of
Jaccard distances in Figure 7.</p>
      <p>Looking at the distribution of Jaccard distances, one can see that
on average, skills are more similar to one another than occupations are.
This becomes apparent when looking at the mean values of both
distributions: for occupations the mean is 0.96, and for skills around
0.88. Over 99% of occupation pairs have a Jaccard distance between
0.8 and 1, meaning that occupations require distinct skill sets. Both
distributions are skewed to the left, meaning that the mean (average
of the observations) is left of the mode (most observed value).</p>
      <p>In the distribution we see a number of spikes, which can be
explained by the prevalence of some fractions over others; e.g., if
half of the combined neighbors of two nodes are shared, the Jaccard distance will be 1/2,
which can be achieved in a number of different ways. Other spikes
occur at additional common fractions such as 2/3 and 3/4.</p>
      <p>In Table 5 we show a description of the distance distributions.
For both skills and occupations the minimum distance is 0, meaning
that two skills are connected to exactly the same set of occupations,
or that two occupations share every skill. An example is
“Food service counter attendants” and “Hotel receptionists,” which share
the same skill set and thus have a Jaccard distance of 0. Examples of skills with
a distance of 0 are “Lop trees” and “Pruning techniques.”</p>
      <table-wrap>
        <label>Table 5</label>
        <table>
          <thead>
            <tr><th>Statistic</th><th>Skill</th></tr>
          </thead>
          <tbody>
            <tr><td>count</td><td>107959</td></tr>
            <tr><td>mean</td><td>0.825</td></tr>
            <tr><td>std</td><td>0.163</td></tr>
            <tr><td>min</td><td>0.000</td></tr>
            <tr><td>25%</td><td>0.800</td></tr>
            <tr><td>50%</td><td>0.875</td></tr>
            <tr><td>75%</td><td>0.923</td></tr>
            <tr><td>max</td><td>0.985</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The highest distance found in the dataset is 0.993, which corresponds
to the occupations “Electronics engineers” and “Policy
administration professionals.” They share at least one skill but, apart from
the shared skill, are completely different. The common skill in this
example is “perform project management.”</p>
    </sec>
    <sec id="sec-16">
      <title>Career Pathfinding using Dijkstra’s algorithm</title>
      <p>With the distances between each pair of occupations and between skills,
we can proceed to identify the most efficient transition between
every pair of occupations. We do so by assigning the Jaccard
distance scores as edge weights between nodes in our graph, which
enables computational methods for finding the most efficient path
between a start node (the current occupation) and an end node (the
desired occupation). We show an example of such a transition in
Figure 8: here we set a threshold for the maximum possible distance
at 0.8. This threshold was determined to be optimal based on visual
inspection and comparison of several different cutoff points. If two occupations
are further apart than 0.8, we consider the step too large.</p>
      <p>[Figure 8: example graph with nodes W, X, Y, and Z and edge weights 0.2, 0.4, 0.5, 0.6, and 0.9.]</p>
      <p>
        In this example, we cannot transition directly between the start and
end nodes, because the corresponding occupations are not similar enough (0.9 &gt; 0.8).
      </p>
      <p>
        4.2.1 Method. Finding the most efficient path in an undirected
weighted graph can be done by applying shortest path algorithms.
For this paper we turn to Dijkstra’s algorithm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], because of its
proven speed and the widespread availability of implementations.
According to Dijkstra’s algorithm, the shortest allowed path between
the start and end nodes in Figure 8 runs via an intermediate node.
      </p>
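      <p>A minimal sketch of this procedure, assuming pairwise Jaccard distances are given as a dictionary keyed by occupation pairs: edges longer than the 0.8 threshold are dropped, and Dijkstra's algorithm is run with a binary heap from Python's heapq module. The function name is hypothetical, and the example graph reuses the node labels of Figure 8 with a hypothetical assignment of edge weights.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import heapq


def career_path(distances, start, goal, max_step=0.8):
    """Cheapest chain of occupations from start to goal (Dijkstra).

    `distances` maps (occupation_a, occupation_b) -> Jaccard distance.
    Steps longer than `max_step` are considered too large and are ignored.
    Returns (total distance, path), or (inf, []) if no allowed path exists.
    """
    graph = {}
    for (a, b), d in distances.items():
        if d <= max_step:  # keep only feasible transitions
            graph.setdefault(a, []).append((b, d))
            graph.setdefault(b, []).append((a, d))

    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, d in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (cost + d, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical distances; the direct X-Y step (0.9) exceeds the threshold.
distances = {("X", "Y"): 0.9, ("X", "W"): 0.4, ("W", "Y"): 0.5,
             ("X", "Z"): 0.2, ("Z", "Y"): 0.6}
cost, path = career_path(distances, "X", "Y")
print(path, round(cost, 2))  # ['X', 'Z', 'Y'] 0.8
```
]]></preformat>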
      <p>We show a real-world example in Figure 9. Due to the COVID-19
pandemic, many people have lost their jobs, especially
individuals who work in restaurants. Using the described model,
we can calculate which occupation has the smallest distance to
the occupation “cook.” Dijkstra’s algorithm yields “bakers,
pastry-cooks and confectionery makers” as the most feasible transition.</p>
      <p>Beyond fine-grained analysis of occupations and skills, gaining
macro-level insights is an important task for monitoring and
understanding the labor market. The ISCO taxonomy provides multiple
levels of granularity, which allows us to aggregate the information
contained in our KG at different levels, too. In this section we
explore a method for identifying the most relevant skills for occupations
(ISCO level 4) and for aggregations of occupations (ISCO levels 1-3). More
specifically, we match skills to occupations at an aggregated level.</p>
      <p>As we have seen in the previous section, different occupations may
share skills. Several skills, such as teamwork, are commonly required
for a large number of occupations, and can be considered generic
or sector-independent skills. On the other hand, there are highly
specialized skills that are only required for specific occupations
or occupation groups. Whether a skill is specifically or generically
important can be quantified in different ways. For a skill to be
specific to an occupation or occupational group, we define two
criteria:
        <list list-type="bullet">
          <list-item><p>A skill needs to be frequently required within its context (occupation or occupation group).</p></list-item>
          <list-item><p>A skill needs to be characteristic of its context.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-17">
      <title>Method</title>
      <p>
        The two criteria described above fit naturally with the Term
Frequency–Inverse Document Frequency (TF-IDF) weighting scheme
for terms [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. We choose this statistic because it directly models the
desired criteria: TF-IDF is used to assign weights to words in a corpus of documents,
where a word is deemed more important if it (i) is observed
frequently within a document but (ii) is not observed frequently across different
documents in the corpus:
        <disp-formula>
          <label>(2)</label>
          <tex-math>\text{tf-idf}(t, d) = \mathrm{tf}_{t,d} \times \log \frac{N}{\mathrm{df}_t + 1},</tex-math>
        </disp-formula>
where tf<sub>t,d</sub> denotes the term frequency of t in d, df<sub>t</sub> denotes the
number of documents containing t, and N denotes the total number
of documents in the corpus.
      </p>
      <p>We “transplant” this TF-IDF weighting scheme from terms in
documents to skills associated with occupations. TF-IDF consists of
two parts. Term Frequency (TF) is the frequency of a word (skill)
in a given document (how often it is observed with an occupation). Inverse
Document Frequency (IDF) discounts highly common
terms: it is high when a word (skill) appears in a small
number of documents (is observed with a low number of occupations).
Common terms (skills) thus yield a lower IDF score.</p>
      <p>For our TF-IDF-based model, we consider skills identified in job
postings as terms, and model each document as the collection of job
postings belonging to an ISCO group. The skill counts, which
model term frequency, correspond to the number of times a skill is
found in the job postings associated with a certain ISCO code.</p>
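      <p>Equation (2) applied to skills can be sketched as follows. This is an illustrative sketch under stated assumptions: each ISCO group's job postings form one document, tf is taken as the skill's share of all skill mentions in that group (the paper reports TFs as percentages, e.g., 9% for Microsoft Office among Managers), and idf uses the smoothed form log(N / (df + 1)) from Equation (2). The function name and the postings are hypothetical. Note that with this smoothing, a skill found in all but one group receives a weight of zero, and one found in every group receives a negative weight.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def skill_tfidf(postings):
    """TF-IDF weight of each skill per ISCO group.

    `postings` is an iterable of (isco_group, skills) tuples, one per job
    posting; all postings of one group together form a single "document".
    Assumed tf: count / total skill mentions in the group.
    Returns {isco_group: {skill: weight}}.
    """
    counts = {}
    for group, skills in postings:
        counts.setdefault(group, Counter()).update(skills)

    n_docs = len(counts)
    doc_freq = Counter()  # number of groups in which each skill occurs
    for skill_counts in counts.values():
        doc_freq.update(skill_counts.keys())

    weights = {}
    for group, skill_counts in counts.items():
        total = sum(skill_counts.values())
        weights[group] = {
            skill: (count / total) * math.log(n_docs / (doc_freq[skill] + 1))
            for skill, count in skill_counts.items()
        }
    return weights


# Hypothetical postings for three ISCO level 1 groups.
postings = [
    ("1", ["microsoft office", "leadership"]),
    ("1", ["leadership"]),
    ("4", ["microsoft office", "data entry"]),
    ("7", ["welding"]),
]
weights = skill_tfidf(postings)
top = max(weights["1"], key=weights["1"].get)
print(top)  # leadership
```
]]></preformat>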
    </sec>
    <sec id="sec-18">
      <title>Results and analysis</title>
      <p>5.2.1 Level 1 ISCO groups. The resulting score provides us with
skills that are common for a given occupation (group) but
uncommon in all other occupations (groups). Table 6 shows the top 5
skills for the level 1 ISCO groups.</p>
      <p>In this table, Microsoft Office appears in both the Managers and
Clerical support workers groups. For this skill to score high in
multiple contexts (occupation groups), its frequencies need to be
substantial in both, to compensate for the IDF component of
the metric. In the Managers group, Microsoft Office has a TF of 9%,
and in Clerical support workers a TF of 5%.</p>
      <p>5.2.2 Multiple ISCO levels. ISCO level 1 helps us to understand
which skills are relevant at the least granular level; to deepen our
understanding, we look at how the relevant skills develop across multiple layers of
ISCO group 2 in Figure 10. Here, we show the 3 most relevant skills
for several ISCO levels of the “Professionals” ISCO group.</p>
      <p>We notice the following. First, communication-related skills
appear in multiple forms across occupation groups. The terms
communication, communication sciences, communication studies, ICT
communication protocols, manage online communications, and
communication disorders seem to be closely related. Because these skills
are defined as distinct skills, each skill receives its own ranking,
so the same underlying concept can appear multiple times.</p>
      <p>Next, “Nursing professionals” and “Nursing and midwifery
professionals” share the same set of relevant skills, which are highly
similar to those of their parent group “Health professionals”. The
skills that appear in these subgroups are also the most frequent skills
in the parent group.</p>
      <p>Finally, the further down the figure we go, the more specialized
the skills appear to be: specialized skills, such as “dental
studies,” are more commonly observed in level 4 ISCO groups. A
possible explanation is that specialized skills only appear in
specialized occupations.</p>
      <table-wrap>
        <caption>
          <p>Figure 10: The 3 most relevant skills for several ISCO groups at multiple levels of the “Professionals” ISCO group (the number of digits in the ISCO code indicates the level).</p>
        </caption>
        <table>
          <thead>
            <tr><th>ISCO code</th><th>Occupation group</th><th>Most relevant skills</th></tr>
          </thead>
          <tbody>
            <tr><td>2</td><td>Professionals</td><td>Network marketing; Manage online communications; Communication</td></tr>
            <tr><td>22</td><td>Health professionals</td><td>Coordinate care; Have computer literacy; Citizen involvement in healthcare</td></tr>
            <tr><td>222</td><td>Nursing and midwifery professionals</td><td>Coordinate care; Have computer literacy; Solve problems in healthcare</td></tr>
            <tr><td>2221</td><td>Nursing professionals</td><td>Coordinate care; Have computer literacy; Solve problems in healthcare</td></tr>
            <tr><td>226</td><td>Other health professionals</td><td>Radiopharmaceuticals; Work analytically; Analytical chemistry</td></tr>
            <tr><td>2261</td><td>Dentists</td><td>Dental studies; Lead the dental team; Handle payments in dentistry</td></tr>
            <tr><td>2262</td><td>Pharmacists</td><td>Radiopharmaceuticals; Work analytically; Analytical chemistry</td></tr>
            <tr><td>23</td><td>Teaching professionals</td><td>Communication sciences; Communication studies; ICT communications protocols</td></tr>
            <tr><td>235</td><td>Other teaching professionals</td><td>Communication; Communication disorders; Microsoft Visio</td></tr>
            <tr><td>2352</td><td>Special needs teachers</td><td>Education law; Communication disorders; Pedagogy</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-19">
      <title>CONCLUSION</title>
      <p>In recent years the labor market has changed drastically, mostly
due to increased globalization, a growing working population, and
jobs disappearing due to digitalization. The COVID-19 pandemic has
accelerated this change. This paper explores algorithmic, data-driven
methods for improving the fit between job seekers and vacancies by
modeling skills and occupation data in a knowledge graph (KG).
Modeling and leveraging relationships between occupations and skills
can provide insights for job seekers with existing skill sets.</p>
      <p>We construct our KG from the existing ISCO and ESCO taxonomies
for occupations and skills, and then enrich it with job posting data.</p>
      <p>We explore the final KG through three different applications.</p>
      <p>First, in Section 3 we study link prediction methods for
quantifying the relatedness between skills and occupations. We compare
and evaluate two different link prediction methods and find that
“Node2Vec” performs best. Besides quantifying the relatedness between
occupations and skills, e.g., for ranking skills for an occupation or
as edge weights in our KG, we explore Node2Vec for identifying
skill-to-occupation links that are not present in the original KG.</p>
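      <p>The core of Node2Vec [<xref ref-type="bibr" rid="ref12">12</xref>] is a second-order biased random walk in which a return parameter p and an in-out parameter q control how the walk explores the graph; the walks are then fed to a skip-gram model to learn node embeddings, which can in turn be used to score candidate skill-occupation links. The Python sketch below shows only the walk step on an unweighted toy graph; skip-gram training and link scoring are omitted, and the function name, graph, and parameter values are hypothetical rather than the configuration used in Section 3.</p>
      <preformat>
```python
import random
from typing import Optional


def node2vec_walk(graph: dict[str, set[str]], start: str, length: int,
                  p: float = 1.0, q: float = 1.0,
                  rng: Optional[random.Random] = None) -> list[str]:
    """Generate one node2vec-style biased random walk on an unweighted graph.

    Having moved from node t to node v, the unnormalized weight of stepping
    from v to a neighbor x is 1/p if x == t (return), 1 if x is also a
    neighbor of t, and 1/q otherwise (move outward). The walk contains at
    most max(length, 1) nodes.
    """
    rng = rng or random.Random()
    walk = [start]
    for _ in range(length - 1):
        current = walk[-1]
        neighbors = sorted(graph[current])  # sorted so a seeded rng is reproducible
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # no previous node yet: uniform step
            continue
        previous = walk[-2]
        weights = [
            1.0 / p if x == previous else (1.0 if x in graph[previous] else 1.0 / q)
            for x in neighbors
        ]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical bipartite toy KG: occupations linked to the skills they require.
kg = {
    "nurse": {"coordinate care", "have computer literacy"},
    "dentist": {"coordinate care", "lead the dental team"},
    "coordinate care": {"nurse", "dentist"},
    "have computer literacy": {"nurse"},
    "lead the dental team": {"dentist"},
}
walk = node2vec_walk(kg, "nurse", length=6, p=4.0, q=0.5, rng=random.Random(7))
print(walk)
```
      </preformat>
      <p>Because this toy KG is bipartite (occupations only link to skills), a candidate next node is never adjacent to the previous one, so only p and q shape the walk; in a graph that also contains skill-skill or occupation-occupation edges, the weight-1 case applies as well.</p>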
      <p>Next, in Section 4 we explore our KG for finding efficient job
transitions. For an individual searching for a job, knowing which
occupations are a feasible next step can help the search process. In
this application we explore shortest path algorithms for identifying
potential career paths, using a skills-based Jaccard similarity metric
to model the distance between occupations. Furthermore, we show
examples of job transitions and study properties of our KG by
analyzing the distribution of distances between skills and
occupations.</p>
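      <p>To make the job-transition idea concrete, the Python sketch below computes a skills-based Jaccard similarity between occupations, converts it into an edge weight (assumed here to be 1 minus the similarity), and runs Dijkstra's algorithm [<xref ref-type="bibr" rid="ref7">7</xref>] to find the cheapest chain of transitions. The occupations, skill sets, the minimum-similarity threshold for creating an edge, and the helper names are hypothetical and do not reproduce the exact setup of Section 4.</p>
      <preformat>
```python
import heapq
import itertools


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared skills divided by all distinct skills."""
    union = a | b
    return len(a.intersection(b)) / len(union) if union else 0.0


def transition_graph(skills: dict[str, set[str]],
                     min_sim: float = 0.2) -> dict[str, dict[str, float]]:
    """Link occupations whose skill overlap reaches min_sim; weight = 1 - Jaccard."""
    graph = {occ: {} for occ in skills}
    for u, v in itertools.combinations(skills, 2):
        sim = jaccard(skills[u], skills[v])
        if sim >= min_sim:
            graph[u][v] = graph[v][u] = 1.0 - sim
    return graph


def shortest_path(graph: dict[str, dict[str, float]], source: str, target: str):
    """Dijkstra's algorithm; returns (total distance, list of occupations)."""
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in visited:
            continue  # stale queue entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or dist[v] > nd:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(queue, (nd, v))
    if target not in dist:
        return float("inf"), []  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical occupations and skill sets.
skills = {
    "cashier": {"handle payments", "customer service", "use a till"},
    "receptionist": {"customer service", "answer phones", "schedule appointments"},
    "medical secretary": {"schedule appointments", "answer phones", "medical terminology"},
    "dental assistant": {"medical terminology", "assist the dentist", "schedule appointments"},
}
graph = transition_graph(skills)
distance, path = shortest_path(graph, "cashier", "dental assistant")
print(path, round(distance, 3))  # ['cashier', 'receptionist', 'dental assistant'] 1.6
```
      </preformat>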
      <p>Finally, in Section 5 we study a method to determine which skills
are most relevant to different levels of aggregated occupations,
using the ISCO taxonomy. The relevance of a skill to an ISCO group
is calculated by combining the frequency with which the skill is
required for that group with the uniqueness of the skill across the
ISCO taxonomy, where uniqueness is high if a skill occurs more often
in one group than in the other groups. The metric that reflects this
intuition is TF-IDF. In doing so, we construct a bird's-eye view of
the labor market.</p>
      <p>The findings from the three sections described above are all
variations on the same theme: finding or enabling the perfect fit
between a job seeker and a vacancy by leveraging skills.</p>
    </sec>
    <sec id="sec-20">
      <title>DISCUSSION &amp; FUTURE RESEARCH</title>
      <p>
        In this paper we present different KG-driven applications for
skills-based job matching. In principle, the methods presented are
data-agnostic, as long as similar concepts (occupations and skills) and
data (job postings with identified occupations and skills) are
available. More specifically, we leverage the ISCO and ESCO
taxonomies, which are available in a large number of languages and
are considered freely available standards. Other frameworks
could be used as well: whereas ESCO is widely used in Europe, the
O*NET framework [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] is often referred to as the de facto standard
in the United States.
      </p>
      <p>
        The outcome of any research depends heavily on the
available data. In this research, the data is preprocessed
in a number of steps, one of which is the skill matching step
described in Section 2.2.3. We opted for a naive character n-gram
based method for matching surface forms found in a job posting
with skill names in ESCO. More refined methods could be
employed, e.g., by considering additional representations of the skill
in the job posting (contextual words, occupation) and, at the same
time, additional context on the side of the KG (e.g., skill descriptions,
associated occupations). In general, this matching problem
can be considered an entity linking task, which is out of
scope for this application paper. A flawed knowledge graph
resulting from sub-optimal preprocessing does not invalidate the
methods used: whichever approach is used to create a knowledge
graph, the outcome will never be perfect [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
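      <p>As an illustration of what a naive character n-gram matcher can look like, the Python sketch below represents each string by its set of character trigrams and links a surface form from a job posting to the ESCO skill label with the highest Jaccard overlap, provided that overlap reaches a threshold. The trigram size, the space padding, the Jaccard overlap, the 0.5 threshold, and the function names are assumptions for illustration, not the exact procedure of Section 2.2.3.</p>
      <preformat>
```python
from typing import Optional


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams of a lowercased, whitespace-normalized string."""
    padded = f" {' '.join(text.lower().split())} "  # pad so word boundaries count
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def match_skill(surface_form: str, skill_labels: list[str],
                threshold: float = 0.5) -> Optional[str]:
    """Return the skill label whose n-gram set best overlaps the surface form."""
    query = char_ngrams(surface_form)
    best_label, best_score = None, 0.0
    for label in skill_labels:
        grams = char_ngrams(label)
        union = query | grams
        score = len(query.intersection(grams)) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    # Reject weak matches instead of forcing a link to some skill.
    return best_label if best_score >= threshold else None


# Hypothetical ESCO-like labels and job posting surface forms.
esco_labels = ["Microsoft Office", "communication", "coordinate care"]
print(match_skill("Microsft Office", esco_labels))   # Microsoft Office
print(match_skill("forklift driving", esco_labels))  # None
```
      </preformat>
      <p>A typo such as “Microsft Office” still shares most of its trigrams with the correct label, which is what makes this kind of matching tolerant to spelling variation; it cannot, however, resolve abbreviations or synonyms, which is where the entity linking approaches mentioned above would help.</p>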
      <p>
        Finally, two out of three applications of our KG are not validated
empirically: for both our shortest path finder (Section 4) and the
identification of the most relevant skills per ISCO group (Section 5), we focused
on the analysis and interpretation of results, omitting a more formal
evaluation methodology. For future research, it would be interesting
to benchmark the current approach against different career path prediction
models. Validating whether, e.g., the discovered paths between
occupations are indeed the shortest ones in practice requires additional
data, which was unfortunately not available at the time of
writing. Such data could be acquired, e.g., by collecting
historic career paths; however, collecting and composing such data
was deemed out of scope for this work. The same holds for the
method for quantifying the relevance of skills per ISCO group: these
aggregated insights were difficult to validate. We could imagine
involving human expert annotators to annotate which skills they
deem (most) relevant to a certain ISCO group. However, similarly
to the above, collecting and analyzing such data did not fit within the
scope of our present work. In summary, our paper revolves around
algorithmic methods that aim to help both job seekers and
recruiters find a better match between individuals and occupations;
we consider studies with actual end users out of scope [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-21">
      <title>ACKNOWLEDGMENTS</title>
      <p>Special thanks to the thesis supervisors of the project, Niels van
Weeren and Prof. Aske Plaat, as well as everybody at Randstad
involved in this project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>İ. Semih</given-names>
            <surname>Akçomak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lex</given-names>
            <surname>Borghans</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bas</given-names>
            <surname>ter Weel</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Measuring and Interpreting Trends in the Division of Labour in the Netherlands</article-title>
          .
          <source>De Economist</source>
          <volume>159</volume>
          ,
          <issue>4</issue>
          (Dec 2011),
          <fpage>435</fpage>
          -
          <lpage>482</lpage>
          . https://doi.org/10.1007/s10645-011-9168-3
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Pol</given-names>
            <surname>Antràs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luis</given-names>
            <surname>Garicano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Esteban</given-names>
            <surname>Rossi-Hansberg</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Offshoring in a Knowledge Economy</article-title>
          .
          <source>Working Paper 11094. National Bureau of Economic Research</source>
          . https://doi.org/10.3386/w11094 http://www.nber.org/papers/w11094
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Antoine</given-names>
            <surname>Bordes</surname>
          </string-name>
          and
          <string-name>
            <given-names>Evgeniy</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Constructing and Mining Web-Scale Knowledge Graphs: KDD 2014 Tutorial</article-title>
          .
          <source>In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          (New York, New York, USA) (KDD '14).
          <publisher-name>Association for Computing Machinery</publisher-name>
          , New York, NY, USA,
          <fpage>1967</fpage>
          . https://doi.org/10.1145/2623330.2630803
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Lex</given-names>
            <surname>Borghans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bas</given-names>
            <surname>ter Weel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bruce A.</given-names>
            <surname>Weinberg</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>People Skills and the Labor-Market Outcomes of Underrepresented Groups</article-title>
          .
          <source>ILR Review</source>
          <volume>67</volume>
          ,
          <issue>2</issue>
          (2014),
          <fpage>287</fpage>
          -
          <lpage>334</lpage>
          . https://doi.org/10.1177/001979391406700202
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Nicole</given-names>
            <surname>Bosch</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bas</given-names>
            <surname>Weel</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Labour-Market Outcomes of Older Workers in the Netherlands: Measuring Job Prospects Using the Occupational Age Structure</article-title>
          .
          <source>De Economist</source>
          <volume>161</volume>
          (June 2013). https://doi.org/10.1007/s10645-013-9202-8
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] Centraal Bureau voor de Statistiek. [n.d.].
          <source>De arbeidsmarkt in cijfers</source>
          . https://www.cbs.nl/-/media/_pdf/2020/18/dearbeidsmarktincijfers2019.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Edsger W.</given-names>
            <surname>Dijkstra</surname>
          </string-name>
          .
          <year>1959</year>
          .
          <article-title>A note on two problems in connexion with graphs</article-title>
          .
          <source>Numerische Mathematik</source>
          <volume>1</volume>
          ,
          <issue>1</issue>
          (
          <year>1959</year>
          ),
          <fpage>269</fpage>
          -
          <lpage>271</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] European Commission. [n.d.].
          <source>ESCO handbook</source>
          . https://ec.europa.eu/esco/portal/document/en/0a89839c-098d-4e34-846c-54cbd5684d24
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          Eurostat. [n.d.].
          <article-title>Labour market transitions - annual data</article-title>
          . https://ec.europa.eu/eurostat/web/lfs/data/database
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] World Economic Forum.
          <year>2020</year>
          .
          <source>The Future of Jobs Report 2020</source>
          . World Economic Forum, Geneva, Switzerland.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          Burning Glass. [n.d.].
          <article-title>Vacancy data</article-title>
          . https://www.jobdigger.nl/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Grover</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>node2vec: Scalable feature learning for networks</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          ,
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Francisco</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          , Sven Charleer, Robin De Croon, Nyi Nyi Htun, Gerd Goetschalckx, and
          <string-name>
            <given-names>Katrien</given-names>
            <surname>Verbert</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Explaining and Exploring Job Recommendations: A User-Driven Approach for Interacting with Knowledge-Based Job Recommender Systems</article-title>
          .
          <source>In Proceedings of the 13th ACM Conference on Recommender Systems</source>
          (Copenhagen, Denmark) (RecSys '19).
          <publisher-name>Association for Computing Machinery</publisher-name>
          , New York, NY, USA,
          <fpage>60</fpage>
          -
          <lpage>68</lpage>
          . https://doi.org/10.1145/3298689.3347001
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] International Labour Office. [n.d.].
          <source>International Standard Classification of Occupations</source>
          . https://www.ilo.org/public/english/bureau/stat/isco/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>David</given-names>
            <surname>Liben-Nowell</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jon</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>The link-prediction problem for social networks</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>58</volume>
          ,
          <issue>7</issue>
          (
          <year>2007</year>
          ),
          <fpage>1019</fpage>
          -
          <lpage>1031</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] OECD. [n.d.].
          <article-title>Employment by job tenure intervals - average tenure</article-title>
          . https://stats.oecd.org/Index.aspx?DataSetCode=TENURE_AVE
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] OECD. [n.d.].
          <article-title>FTPT employment based on national definitions</article-title>
          . https://stats.oecd.org/Index.aspx?DataSetCode=FTPTN_D
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          O*NET. [n.d.].
          <article-title>O*NET OnLine</article-title>
          . https://www.onetonline.org/
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Knowledge graph refinement: A survey of approaches and evaluation methods</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          ,
          <issue>3</issue>
          (
          <year>2017</year>
          ),
          <fpage>489</fpage>
          -
          <lpage>508</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Gang</given-names>
            <surname>Peng</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Do computer skills affect worker employment? An empirical study from CPS surveys</article-title>
          .
          <source>Computers in Human Behavior</source>
          <volume>74</volume>
          (2017),
          <fpage>26</fpage>
          -
          <lpage>34</lpage>
          . https://doi.org/10.1016/j.chb.2017.04.013 http://www.sciencedirect.com/science/article/pii/S0747563217302510
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Prabhakar</given-names>
            <surname>Raghavan</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Introduction to Information Retrieval</article-title>
          . Vol.
          <volume>39</volume>
          . Cambridge University Press, Cambridge.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          Textkernel. [n.d.]. Extract. https://www.textkernel.com/nl/solution/extract/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>