<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Recommender Systems, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Predictions using Neural Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roan Schellingerhout</string-name>
          <email>roan.schellingerhout@maastrichtuniversity.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Medentsiy</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maarten Marx</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Advanced Computing Sciences</institution>
          ,
          <addr-line>Paul-Henri Spaaklaan 1, 6229 EN, Maastricht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IRLab Informatics Institute</institution>
          ,
          <addr-line>Science Park 904, 1098 XH, Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Randstad Groep Nederland</institution>
          ,
          <addr-line>Diemermere 25, 1112 TC, Diemen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>Career path prediction aims to determine a potential employee's next job, based on the jobs they have had until now. While good performance on this task has been achieved in recent years, the models making career predictions often function as black boxes. By integrating components of explainable artificial intelligence (XAI), this paper aims to make these predictions explainable and understandable. To study the effects of explainability on performance, three non-explainable baselines were compared to three similar, but explainable, alternatives. Furthermore, user testing was performed with recruiters in order to determine the sensibility of the explanations generated by the models. Results show that the explainable alternatives perform on par with their non-explainable counterparts. In addition, recruiters judged the explainable models to provide understandable and useful explanations.</p>
      </abstract>
      <kwd-group>
        <kwd>Career path prediction</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Sequence classification</kwd>
        <kwd>Neural networks</kwd>
        <kwd>User testing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the rise of the modern gig economy, it has become
more difficult for job seekers to find stable positions of
employment [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In addition, due to the average
education level of the workforce having increased
considerably in recent years, potential employees are faced
with more competition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This has made it significantly more difficult for job
seekers, and employment agencies alike, to find positions
that fit their needs. To address this, increasing use is
being made of computer-aided HR matchmaking (e.g.,
machine learning) to find suitable positions for
individuals, and capable employees for companies [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This task is called career path prediction, which
aims to predict a person’s next position of employment,
given their career up until this point.
      </p>
      <sec id="sec-1-1">
        <title>Previous research on automated career path prediction</title>
        <p>
          tends to share a common flaw: a lack of explainability
[
          <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4, 5, 6, 7</xref>
          ]. While deep learning tends to deliver good
performance, these models often function as a black box.
        </p>
        <p>Although good results that are dificult to interpret are
acceptable in many use cases, choosing a new career is
such an impactful event in a person’s life that it is
unrealistic to expect users to blindly trust the models. This
is why explainability is such a crucial requirement for
nEvelop-O</p>
        <p>predictions explainable impact performance?
• RQ3: Which explainable model is the most useful</p>
        <p>for recommending jobs to candidates?
This paper is structured as follows: first, an overview
2.1. Career path predictions
of the current state of the art in terms of model per- another multivariate sequence classification task (gold
formance and explainability is given. Then, Randstad’s price forecasting), outperforming every alternative
archidataset is described in detail. Afterwards, the methods tecture tested.
used to answer the research questions are explained. Sub- While the aforementioned models make up the current
sequently, the research questions are answered, after state of the art for career path predictions, they all share
which their answers are discussed. a common flaw: they function as black boxes. As a result,
their outputs are hard to interpret for both recruiters and
job seekers. Considering the impact a career change can
2. Related Work have on an individual’s life, this can make the models
dificult to use in real-world scenarios.</p>
        <p>
          The goal of career path prediction is to determine what
position of employment is a logical next step given a job
seeker’s career [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Considering the number of different career
opportunities and factors which have an influence on the
career steps (e.g., previous job experiences, educational
background, interests of a job seeker), the career
prediction problem is incredibly difficult to model by hand.
        </p>
        <p>
          In recent years a lot of progress has been achieved
within the field of career path prediction. The first
notable paper to use machine learning for career path
prediction was that by Liu et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In this paper, Liu et al. scraped individuals’ social
media profiles to generate a dataset, after which they
predicted when an employee would be ready to move to a
higher-paying position within their current field (e.g.
moving from junior software developer to senior
software developer). Meng et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] then extended this task by not just considering
within-field switches, but general job mobility. Their
custom LSTM, the hierarchical career-path-aware neural
network (HCPNN), was thus tasked to predict
individuals’ next employer, regardless of their current field of
employment. The HCPNN has shown impressive results,
outperforming every model that preceded it.
        </p>
        <p>
          Similarly, He et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] attempted to predict individuals’ next job based on
features they extracted from their resumes. Unlike Meng
et al., they made use of a convolutional neural network
(CNN) for the predictions. With this CNN they tried to
implement a multi-purpose model that could not only
predict talents’ next job position, but also their salary
and the size of the company they would be working at.
Out of those three tasks, their CNN proved to perform
the best on career path prediction.
        </p>
        <p>
          At their core, Meng et al.’s LSTM and He et al.’s CNN
are simply feature extractors which feed their output into
a dense layer. While both perform well on their own, it
is common to combine these two architectures within
the field of sequence classification [
          <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
          ]. Although such an architecture has not yet been used
for career path prediction specifically, it has been shown
to perform exceedingly well on other multivariate
sequence classification problems [
          <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
          ]. Especially the CNN-LSTM of Livieris et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] has shown good results on another multivariate
sequence classification task (gold price forecasting),
outperforming every alternative architecture tested.
        </p>
        <p>
          While the aforementioned models make up the current
state of the art for career path prediction, they all share
a common flaw: they function as black boxes. As a result,
their outputs are hard to interpret for both recruiters and
job seekers. Considering the impact a career change can
have on an individual’s life, this can make the models
difficult to use in real-world scenarios.
        </p>
      </sec>
      <sec id="sec-1-1-2">
        <title>2.2. Explainability in deep learning</title>
        <p>
          Explainability and performance are often considered
inverses of each other in the field of AI. A simple, easy
to explain model is likely to perform mediocre at best,
while a complex, difficult to explain model is more likely
to perform well [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. A common example of this inverse relationship can
be seen in the difference between decision trees and
random forests: random forests are based on decision
trees, but with a higher degree of complexity, which
strongly increases performance at the cost of
explainability.
        </p>
        <p>
          However, with the increasing interest in explainable AI,
more and more solutions have been brought up that can
make even the most complex deep learning models
explainable to a degree [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Most commonly, this explainability takes the shape
of visualizations of the networks’ behaviour. Saliency
maps and attention distributions are capable of
visualizing the importance of different variables, usually
through some type of colour scheme indicating higher or
lower feature importance. Initially, Springenberg et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] used guided backpropagation to visualize the features
learned by convolutional layers. Extending past guided
backpropagation, Selvaraju et al. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] created Grad-CAM, which could not only visualize
general learned features, but also determine which
features were important for a specific predicted class. Since
these post-hoc interpretability techniques merely look at
the behaviour of the model, they do not alter its
performance. However, it is often necessary to make
alterations to a model’s architecture to allow good
explanations to be generated (e.g., they only work on
convolutional layers, and preferably only on the final
convolutional layer of a model) [
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ]. As a result, such techniques either do not change
performance at all, or decrease it slightly. In contrast,
while both aforementioned methods were created for
computer vision, Vaswani et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] proposed ‘attention mechanisms’ for natural language
processing. These attention mechanisms cause the
models to predict the importance of each feature per time
step (or the importance of a given time step in general),
which can then be visualized. As a result, Vaswani et al.
made it possible for different model architectures to
become explainable, while simultaneously improving their
performance.
        </p>
      </sec>
      <sec id="sec-1-1-3">
        <title>2.3. Explainability in sequence classification</title>
        <p>
          Sequence classification brings an additional factor into
the mix: the temporal dimension. Simply visualizing
which features garner the most attention thus becomes
insufficient in this scenario. While a given variable might
be highly important to the network initially, it could
become less relevant as time progresses. Thus, to make
explainable sequence classifications, not only should
there be an explanation of which variables contributed
the most to the final prediction, but also at what moment
their values were most decisive [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Nonetheless, saliency maps are still useful in this
scenario, as a multivariate sequence can be treated as a
2-dimensional image of shape (features × length).
However, these saliency maps do not necessarily reach the
level of finesse required to generate understandable
explanations for sequences. As a result, saliency maps are
often combined with attention mechanisms. By
combining saliency maps with attention distributions, it is
possible to improve the quality of the explanations [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
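        <p>To make this combination concrete, the following minimal sketch
computes a gradient-based saliency map for one multivariate sequence
and re-weights it with a per-time-step attention distribution. The model
interface and the elementwise combination rule are illustrative
assumptions, not the exact procedure of any of the cited works.</p>
        <preformat>import torch

# Assumed interface: `model` maps a (features x length) sequence to class
# logits and also returns a per-time-step attention distribution.
def combined_explanation(model, sequence, target_class):
    sequence = sequence.clone().requires_grad_(True)
    logits, attention = model(sequence)      # attention: (length,)
    logits[target_class].backward()          # gradients w.r.t. the input

    # Saliency map: absolute input gradient, shaped like the 2D "image"
    # view of the sequence (features x length).
    saliency = sequence.grad.abs()

    # Re-weight each time step's saliency column by its attention weight,
    # so features only light up at moments the model attended to.
    return saliency * attention.unsqueeze(0)</preformat>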
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Description of the Data</title>
      <sec id="sec-2-1">
        <title>3.2. Data imbalance</title>
        <sec id="sec-2-1-1">
          <title>Randstad’s dataset consists of over two million jobs relating to more than 500 thousand individuals. These jobs span over multiple decades, going back as far as the early twentieth century. Although Randstad is a multinational</title>
        </sec>
        <sec id="sec-2-1-2">
          <title>The data on which the models were trained, configured,</title>
          <p>and tested, was provided by Randstad NV (Randstad). There is a huge imbalance in work experience and
edDue to the nature of Randstad’s operations, they have an ucation levels of candidates present in the data. The
exhaustive data lake consisting of temporal employee- imbalance in work experience occurs in job positions,
related data. which are represented by ISCO job types and job
functions (see Figure 1a and 1b respectively), and the number
3.1. Overview of the datasets of positions candidates have had (see Figure 2). We
addressed the skew in the number of jobs a candidate had
by limiting the job history to the 25 most recent jobs.</p>
          <p>The imbalance in education levels (see Figure 3) is less</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>1https://www.ilo.org/public/english/bureau/stat/isco/isco08/</title>
          <p>The unstructured data is represented by curriculum vitaes
(CVs), which are user-generated documents.
impactful, as the education level of candidates is merely jobs were zero-padded to prevent mismatched sequence
a predictor, unlike the ISCO job types and job functions, lengths. This section outlines how candidates’ careers
both of which could be used as the actual labels to be were converted into sequences, as well as how those
predicted. To construct the final dataset we sequences were fed into diferent models.</p>
          <p>Lastly, an overview of the models used is given. The
• limited the job history of candidates to the 25 used models can be split into three separate categories:
most recent jobs; non-neural baselines, non-explainable neural models3,
• dropped candidates with fewer than two jobs in and explainable neural models. 80% of the data was used
the dataset, due to the inability to convert their as a training set, 10% of the data was used as a validation
careers to a sequence; set, on which the optimal hyperparameters were
deter• balanced class labels distribution through mined, and the last 10% of the data was used as a test
weighted sampling during training. set to evaluate model performance on unseen data. We
used weighted sampling during training to address the
imbalance within the class labels distribution.</p>
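        <p>As an illustration of the weighted sampling step, a PyTorch
sampler can draw candidates inversely proportional to the frequency of
their next-job label. The toy tensors below are placeholders for
Randstad’s actual features and labels.</p>
        <preformat>import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Placeholder data: 1000 candidates, 25 time steps, 8 features, 5 classes.
features = torch.randn(1000, 25, 8)
labels = torch.randint(0, 5, (1000,))

# Inverse-frequency weights: rare ISCO job types are drawn more often,
# so every class is seen roughly equally often during training.
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)
loader = DataLoader(TensorDataset(features, labels),
                    batch_size=128, sampler=sampler)</preformat>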
        <p>This resulted in our final dataset consisting of the
careers of 113724 candidates, each being limited to the 25
most recent jobs they had. For each job, the (normalized)
time spent working there, the ISCO function level of the
job, the highest education enjoyed up until then, the
company for which the candidate worked, the specific job
function ID, the ISCO job type, and the most recent CV
were stored. Normalization was done through
Z-transformation in order to maintain a common scale
for all features. Additionally, the zip code, obtained
certificates, mastered languages, skills, and driving licenses
of candidates were stored as static variables, since they
rarely changed in between jobs.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Methodology</title>
      <p>In order to make career path predictions, candidates’
profiles were turned into sequences which could be fed
into diferent (deep learning) models. For each
candidate we used the last 25 jobs along with profile-specific
features as input for the models, after which the models
would predict their next job in the form of its ISCO job
type. Candidate profiles that consisted of fewer than 25</p>
      <sec id="sec-3-1">
        <title>4.1. Data prepocessing</title>
        <p>
          Due to the availability of temporal data, candidates’
career paths were turned into sequences. For these
sequences, each job held by a candidate was considered
to be one time step. The order of the time steps was
determined by the date at which the candidate started
the position. As a result, every career was turned into
a sequence, in which each time step was a candidate’s
current job, combined with their location and the skills,
certificates, languages, and education they had achieved
at the time of starting the position. To also include
candidates’ curriculum vitaes (CVs) at each time step, the
most recent CV uploaded by a candidate at each time
step was converted to numerical features using averaged
Word2Vec [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] embeddings and combined with the other
features.
        </p>
        <p>
          Candidates’ career paths were turned into sequences
2Normalization was done through Z-transformation in order to main- 3The neural models were created in PyTorch and trained on an
tain a common scale for all features. NVIDIA tesla K80 GPU [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
x as follows:
        </p>
        <p>x = [x(1), ..., x(T)],
 ℎ
x(t) = [xj(ot)b; xstructured; x(CtV)]</p>
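        <p>A minimal sketch of how one element x(t) can be assembled is
given below. The helper names and the feature pipeline are simplified
stand-ins, and a gensim-style Word2Vec keyed-vector object is assumed.</p>
        <preformat>import numpy as np

def embed_cv(cv_text, w2v):
    # Averaged Word2Vec: the mean of the embeddings of all known words,
    # yielding one fixed-size vector per CV (300 dimensions in this paper).
    words = [w for w in cv_text.split() if w in w2v]
    if not words:
        return np.zeros(w2v.vector_size)
    return np.mean([w2v[w] for w in words], axis=0)

def build_timestep(job_features, structured_features, cv_text, w2v):
    # x(t) = [x_job(t); x_structured(t); x_CV(t)], as in Equation (2).
    return np.concatenate([job_features, structured_features,
                           embed_cv(cv_text, w2v)])

def build_sequence(timesteps, dim, T=25):
    # x = [x(1), ..., x(T)], zero-padded at the front to length T = 25.
    x = np.zeros((T, dim))
    kept = timesteps[-T:]                 # keep the 25 most recent jobs
    x[T - len(kept):] = kept
    return x</preformat>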
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Baselines and Models</title>
        <p>Considering the fact that careers do not necessarily
follow a logical trend, they can be rather difficult to
model properly. For example, a person might have a job
for a while not because they want to, but because they
are forced to do so in order to support themselves. A
person going from a position as a software engineer to a
store clerk does not constitute a logical progression, but
can obviously occur in the real world whenever someone
gets laid off and needs to work a temporary job while
they search for new alternatives. This makes career path
prediction a notoriously difficult problem for deep
learning models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. To evaluate the added value of using such models,
and to allow for better contextualization, baselines were
set with three non-deep learning (but coincidentally
highly explainable) models. The first one is a simple
majority class baseline, which always predicts the most
common job in the dataset. The second baseline is the
majority switch, which always predicts the most common
job following the current job of the candidate. The last
simple baseline is more sophisticated: k-nearest
neighbors based on the dynamic time warping distance
between candidates that had the same previous job
(KNN-DTW). This baseline uses dynamic time warping [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] to determine which candidates have had the most
similar careers, and then uses k-nearest neighbors to make
a prediction based on these similarities. For each
candidate, all candidates that had the same previous job were
compared in terms of DTW distance (i.e., the numerical
distance between the sequences); the k-nearest neighbors
algorithm was then used to find the closest neighbors for
each candidate, based on whom a prediction would be
made.</p>
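        <p>A sketch of the KNN-DTW baseline is shown below; the DTW
implementation is the standard dynamic program, while the numeric
encoding of careers and the variable names are simplifying assumptions.</p>
        <preformat>import numpy as np
from collections import Counter

def dtw_distance(a, b):
    # Classic dynamic-time-warping distance between two career encodings.
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

def knn_dtw_predict(candidate, neighbours, k=100):
    # `neighbours` holds (career_sequence, next_job) pairs for all other
    # candidates whose previous job matches the query candidate's.
    dists = sorted((dtw_distance(candidate, seq), nxt)
                   for seq, nxt in neighbours)
    top = [nxt for _, nxt in dists[:k]]
    # Majority vote among the k most similar careers.
    return Counter(top).most_common(1)[0][0]</preformat>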
        <sec id="sec-3-2-1">
          <title>4.2.1. RQ1 - State of the art</title>
          <p>To study the impact of explainability mechanisms on
model performance, three state-of-the-art models, each
with a unique architecture (Section 2.1), were trained
and tested on Randstad’s dataset. The performance of
these models will function as a non-explainable baseline,
with which the performance of the explainable
alternatives can be compared. The following models were used:</p>
          <p>LSTM : The LSTM-based model used in this paper is
based on the HCPNN by Meng et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. While the original HCPNN combines
candidate-specific data with company-specific data, its modular
architecture allows for the removal of some of the
model’s components. As a result, the HCPNN was
implemented using only candidate-specific features.
This results in a model that takes embedded position
features, feeds them into an LSTM, runs the LSTM’s
output through an attention layer, and combines that
output with a candidate’s embedded static features,
after which a fully-connected layer makes a prediction.</p>
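          <p>The reduced, candidate-only variant of the HCPNN described
above can be sketched in PyTorch as follows. Layer sizes are
placeholders, and the additive attention used here may differ from Meng
et al.’s exact formulation.</p>
          <preformat>import torch
import torch.nn as nn

class CandidateLSTM(nn.Module):
    # Sketch: embedded positions -> LSTM -> attention over time steps ->
    # concatenation with static features -> fully-connected prediction.
    def __init__(self, pos_dim, static_dim, hidden, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(pos_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.out = nn.Linear(hidden + static_dim, n_classes)

    def forward(self, positions, static):
        h, _ = self.lstm(positions)                   # (B, T, hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # (B, T, 1)
        context = (weights * h).sum(dim=1)            # (B, hidden)
        return self.out(torch.cat([context, static], dim=1))</preformat>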
          <p>CNN : The CNN-based model used in this paper is
that of He et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This architecture feeds the input data into a 2D
convolutional layer, followed by a pooling layer. The
output is then flattened and run through a drop-out
layer. Lastly, a fully-connected layer is used to make
the final prediction.</p>
          <p>CNN-LSTM : The CNN-LSTM-based model used in
this paper is based on the model created by
Livieris et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. It uses two sequential 2D convolutional layers,
followed by a pooling layer. The pooled features
then get fed into an LSTM, after which a
fully-connected layer is responsible for the final
predictions of the model.</p>
          <p>To evaluate performance, accuracy@k (k ∈ {1, 5, 10})
was used, which shows how often the correct answer
was within the top k predictions given by the model [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. Considering the fact that candidates might not be
interested in a specific job type (e.g. no open vacancies,
not interesting enough, it pays too little), recruiters are
expected to provide multiple recommendations for the
candidate, allowing them to choose between and consider
multiple options. As a result, the models provide multiple
predictions, which can be evaluated using accuracy@k.</p>
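          <p>Accuracy@k can be computed directly from the models’ logits,
as in this minimal sketch:</p>
          <preformat>import torch

def accuracy_at_k(logits, targets, k):
    # logits: (N, n_classes); targets: (N,). A prediction counts as
    # correct if the true class is among the k highest-scoring classes.
    topk = logits.topk(k, dim=1).indices             # (N, k)
    hits = (topk == targets.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

# Evaluated at the cut-offs used in this paper: k in (1, 5, 10).</preformat>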
        <sec id="sec-3-1-1">
          <title>Although the explainable models’ architectures difer slightly from the aforementioned state-of-the-art models to allow for improved explainability, they are largely identical.</title>
          <p>4.2.1. RQ1 - State of the art</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>To study the impact of explainability mechanisms on</title>
          <p>
            model performance, three state-of-the-art models, each
with a unique architecture (Section 2.1), were trained and
Explainable LSTM : The explainable LSTM -based
model (eLSTM) used in this paper is based on
the spatiotemporal attention LSTM (STA-LSTM)
by Ding et al. [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ]. This architecture starts of by
determining spatial attention; it runs each individ- catenated to the last hidden state of the LSTM,
ual time step through a linear layer, after which and fed into fully-connected layer, which makes
the Hadamard product between the linear layer’s the final prediction.
output and the features per time step is taken to
determine the importance of each feature at each 4.2.3. RQ3 - Real-world utility
time step. The output hereof is then fed into an
LSTM, after which the temporal attention is cal- To measure the adequacy of the explanations generated
culated. This is done by flattening the output of by the models, user testing was performed. Potential
the LSTM and running it through another linear users of the models (e.g. Randstad’s recruiters), were
layer. This calculates a normalized importance of tasked to determine which variables were most relevant
each time step, based on that step’s hidden values. for a prediction made by the system. Six recruiters were
The dot product between the linear layer’s output split into three groups based on their recruiting expertise
and the LSTM’s hidden output is then calculated, (finance, customer support, health care), and shown three
which is fed into a fully-connected layer to make separate predictions within that industry (one per model).
the final predictions. For each predictions, they were tasked to distribute 100
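          <p>A sketch of the eLSTM’s two attention stages, following the
description above; the exact layer shapes and normalizations in Ding
et al.’s STA-LSTM may differ.</p>
          <preformat>import torch
import torch.nn as nn

class ELSTM(nn.Module):
    # Sketch: per-time-step spatial (feature) attention, an LSTM, then
    # temporal attention over the flattened hidden states.
    def __init__(self, n_features, seq_len, hidden, n_classes):
        super().__init__()
        self.spatial = nn.Linear(n_features, n_features)
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.temporal = nn.Linear(seq_len * hidden, seq_len)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):                              # x: (B, T, F)
        spatial_w = torch.softmax(self.spatial(x), dim=2)
        h, _ = self.lstm(spatial_w * x)                # Hadamard product
        flat = h.flatten(start_dim=1)                  # (B, T * hidden)
        temporal_w = torch.softmax(self.temporal(flat), dim=1)
        # Weighted sum (dot product) of the hidden states over time.
        context = torch.einsum('bt,bth->bh', temporal_w, h)
        # Both attention maps double as the model's explanations.
        return self.out(context), spatial_w, temporal_w</preformat>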
          <p>Explainable CNN : The explainable CNN-based
model (eCNN) used in this paper is based on the
explainable convolutional neural network for
multivariate time series classification (XCM) by
Fauvel et al. [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. It makes use of two stages which run in parallel.
The first stage (top) uses a 2D convolutional layer
with kernel size (window × 1) that generates M1
feature maps. A (1 × 1) 2D convolutional layer is
then used to summarize those M1 feature maps into
a single feature map. The other stage (bottom),
running independently, uses a 1D convolutional layer
with kernel size (window × features) and also
generates M1 feature maps, which are summarized by
a (1 × 1) 1D convolutional layer. The two feature
maps generated by the two stages are then
concatenated in the feature dimension, after which a 1D
convolutional layer with kernel size (window ×
(features + 1)) generates M2 feature maps. These
feature maps are then run through a pooling layer,
which is also responsible for the predictions. M1,
M2, and window are three separate hyperparameters
for this model.</p>
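          <p>The two parallel stages of the eCNN can be sketched as follows.
The tensor layout, activation functions, and the final linear head are
assumptions based on the description above, not a faithful reproduction
of XCM.</p>
          <preformat>import torch
import torch.nn as nn

class ECNN(nn.Module):
    # Sketch: a 2D stage over (features x time) and a 1D stage over time
    # run in parallel; their summarized maps are concatenated and merged.
    def __init__(self, n_features, window, m1, m2, n_classes):
        super().__init__()
        pad = window // 2
        self.conv2d = nn.Conv2d(1, m1, (1, window), padding=(0, pad))
        self.squeeze2d = nn.Conv2d(m1, 1, 1)           # (1 x 1) summary
        self.conv1d = nn.Conv1d(n_features, m1, window, padding=pad)
        self.squeeze1d = nn.Conv1d(m1, 1, 1)           # (1 x 1) summary
        self.merge = nn.Conv1d(n_features + 1, m2, window, padding=pad)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.out = nn.Linear(m2, n_classes)

    def forward(self, x):                              # x: (B, F, T)
        top = self.squeeze2d(torch.relu(self.conv2d(x.unsqueeze(1))))
        top = top.squeeze(1)                           # (B, F, T)
        bottom = self.squeeze1d(torch.relu(self.conv1d(x)))  # (B, 1, T)
        merged = torch.cat([top, bottom], dim=1)       # (B, F + 1, T)
        z = self.pool(torch.relu(self.merge(merged))).flatten(1)
        return self.out(z)</preformat>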
          <p>Explainable CNN-LSTM : The explainable
CNN-LSTM-based model (eCNN-LSTM) used in this
paper is based on that of Schockaert et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. This model runs the input data through a 2D
convolutional layer with kernel size (length × 1),
whose output gets concatenated to the original
sequential data. This combined output gets fed into an
LSTM. All but the last hidden state of the LSTM get
passed through a temporal attention mechanism. This
temporal attention mechanism runs each hidden state
through a fully-connected layer which attributes it a
given amount of attention. These attention values are
then normalized, after which the dot product of the
attention vector and the hidden states is calculated to
create a context vector. This context vector is then
concatenated to the last hidden state of the LSTM, and
fed into a fully-connected layer, which makes the final
prediction.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>4.2.3. RQ3 - Real-world utility</title>
          <p>To measure the adequacy of the explanations
generated by the models, user testing was performed.
Potential users of the models (e.g. Randstad’s recruiters)
were tasked to determine which variables were most
relevant for a prediction made by the system. Six
recruiters were split into three groups based on their
recruiting expertise (finance, customer support, health
care), and shown three separate predictions within that
industry (one per model). For each prediction, they were
tasked to distribute 100 ‘relevance points’ over all of the
features used by the models (previous jobs, education,
skills, etc.), after which their distribution was compared
to that of the models. In order to determine the
sensibility of the models’ explanations, the Pearson correlation,
root mean squared error (RMSE), and mean absolute
error (MAE) of each model’s distribution compared to
the recruiters’ distributions were calculated. Furthermore,
the recruiters were presented with the explanations
generated by each model, and tasked to judge each part of
the explanations (spatial/feature attention, temporal
attention, and spatiotemporal attention), as well as the
general usefulness of the explanations for finding a suitable
position for a candidate. By averaging the scores given
by the recruiters, the real-world utility of each
explanation was determined.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-results">
      <title>5. Results</title>
      <p>While similar architectures were used for the
explainable and non-explainable models, different
hyperparameter configurations led to different performance for
each architecture. The results shown in Table 1 only
indicate the performance given by the best hyperparameter
configuration found for each model. For a full overview
of hyperparameter configurations and their related
performance see Appendix B.</p>
      <p>[Table 1: accuracy@1, accuracy@5, and accuracy@10 of
the majority switch baseline and of each non-explainable
and explainable model under its best hyperparameter
configuration.]</p>
      <sec id="sec-results-1">
        <title>5.1. RQ1 - State of the art</title>
        <p>To better convey the performance gained by using
deep learning models, the score of each model will be
directly compared to that of the best-performing
baseline. Of the three simple baselines, the majority switch
baseline performed the best, reaching 19.1% accuracy@1,
46.6% accuracy@5, and 61.3% accuracy@10. KNN-DTW
performed worse initially, but converged to the majority
switch baseline as the number of neighbors (k)
approached infinity. With low values of k, e.g. 5, it failed
to break even 10% accuracy@1. However, using a higher
value for k, e.g. 100, greatly improved this score,
reaching 18.1% accuracy@1, 46.4% accuracy@5, and 58.1%
accuracy@10, showing a sub-linear performance gain as
k increased. The majority class baseline performed
significantly worse, only reaching 10.5% accuracy@1, 36.8%
accuracy@5, and 49.1% accuracy@10. As a result, the
performance of the deep learning models was compared
against the scores achieved by the majority switch
baseline.</p>
      </sec>
      <sec id="sec-results-2">
        <title>5.2. RQ2 - Explainable models</title>
        <p>Out of all the models, the CNN-LSTMs performed the
best. Unlike what was hypothesized, the explainable
models were not inferior to their non-explainable
counterparts (Table 1). In fact, the eLSTM provides a higher
accuracy than the non-explainable LSTM by a slight
margin, although this difference falls within the confidence
intervals of the scores, and is therefore not significant
(p &gt; .05). The explainable CNN took a slight (but
statistically significant) hit in performance in exchange for
the increase in explainability, especially suffering at
higher values of k.</p>
      </sec>
      <sec id="sec-3-2">
        <title>5.3. RQ3 - Real-world utility</title>
        <sec id="sec-3-2-1">
          <title>Each explainable model is able to generate three sepa</title>
          <p>rate explanations for a prediction: (i) the weight of each
feature, (ii) the weight of each time step, and (iii) a time Models
step/feature interaction map (spatiotemporal attention). Recruiters
The way in which these explanations are generated dif- 15 10 5 Feature im0portance 5 10 15
fers per model, but the final visualizations are the same,
regardless of the method used to generate them (Figure 10, Figure 4: Average distribution of feature importance of the
11, and 12 in Appendix E). three explainable models compared to that of Randstad’s
reIn order to verify the integrity of these explanations, cruiters (N = 18).</p>
        <p>To measure the sensibility of each model’s
explanations, three metrics were calculated for each of them:
RMSE, MAE, and Pearson correlation. This was done
by calculating the difference between the average score
that recruiters gave to each feature and the attention put
towards that feature by the models (RMSE and MAE),
as well as the correlation between the models’ values
and the recruiters’ values (Pearson correlation). The
results can be seen in Table 2.</p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <caption>
            <p>The Pearson correlation (higher is better), RMSE, and
MAE (lower is better) of each model compared to the scores
given by the recruiters (N = 6). For each feature, both the
models and the recruiters gave a score; the metrics are
calculated based on those two scores.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Model</th><th>Pearson’s r ⇑</th><th>RMSE ⇓</th><th>MAE ⇓</th></tr>
            </thead>
            <tbody>
              <tr><td>eLSTM</td><td>0.142</td><td>4.661</td><td>4.094</td></tr>
              <tr><td>eCNN-LSTM</td><td>0.436</td><td>6.014</td><td>4.847</td></tr>
              <tr><td>eCNN</td><td>0.152</td><td>5.594</td><td>4.518</td></tr>
            </tbody>
          </table>
        </table-wrap>
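        <p>The three agreement metrics can be computed from the two
relevance distributions as in this minimal sketch:</p>
        <preformat>import numpy as np

def explanation_agreement(model_points, recruiter_points):
    # Both inputs: one relevance score per feature (100 points in total).
    m = np.asarray(model_points, dtype=float)
    r = np.asarray(recruiter_points, dtype=float)
    pearson = np.corrcoef(m, r)[0, 1]
    rmse = np.sqrt(np.mean((m - r) ** 2))
    mae = np.mean(np.abs(m - r))
    return pearson, rmse, mae</preformat>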
        <p>Additionally, the recruiters were asked how sensible
they found the models’ explanations, as well as how
useful they considered the models (including their
explanations) for helping candidates find a new job. The
averaged scores for each model are shown in Table 3.</p>
        <p>[Table 3: Average recruiter grades for the feature,
temporal, and spatiotemporal explanations and for general
usability, per model (eLSTM, eCNN, eCNN-LSTM).]</p>
        <p>In general, the recruiters showed a preference for the
feature explanations, and to a lesser extent the
spatiotemporal explanations. The temporal explanations were
considered the least sensible, failing to reach a sufficient
grade (i.e., above a 5.5/10 on average). While the eCNN
was judged to deliver the worst explanations, receiving
barely a 5/10 on average, the eCNN-LSTM’s and eLSTM’s
explanations were considered sufficient by the recruiters.</p>
        <p>Out of these two, the eCNN-LSTM was determined to
provide the best explanations, scoring the highest
average rating in each category. Regardless of the
insufficient grades reached by some explanations/models, all
three models were considered generally useful for
recommending a job to a candidate.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Discussion and conclusion</title>
      <sec id="sec-4-1">
        <title>6.1. Interpretation of the results</title>
        <sec id="sec-4-1-1">
          <title>6.1.1. State of the art performance</title>
          <p>Although career path prediction is a notoriously
difficult problem in deep learning, the state-of-the-art
models used on Randstad’s dataset ended up performing
commendably. All three models ended up achieving
significantly (p &lt; .05) higher scores than the majority
switch baseline, which already performed well. However,
this improvement is relatively small for the CNN and
LSTM.</p>
          <p>This marginal increase over the baseline is largely in
line with the results found in previous research. Meng
et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] found that the HCPNN outperformed non-neural
baselines by about 20% on their dataset, improving from
6.0% to 7.3% accuracy@1. Although this is a larger
improvement than that of the HCPNN compared to the
majority switch baseline presented in this paper (14.6%
increase in accuracy@1), this result can still be
considered a confirmation of Meng et al.’s findings. The
smaller relative improvement could in part be caused by
the fact that Randstad’s dataset includes data that has
been manually input by candidates themselves. This data,
as opposed to that input by Randstad’s recruiters, has
not been verified, and could therefore include errors, a
substantial amount of missing values, etc. While these
data points could have been removed from the dataset
to improve performance, a conscious decision was made
not to. Removing all data entered by candidates
themselves would get rid of more than half the dataset, in
exchange for a relatively minor improvement in
performance (in the neighborhood of 5-10%, absolute).
Additionally, in real-world use, providing candidates with the
ability to enter their own career into Randstad’s system
and instantly being able to receive job recommendations
is very valuable.</p>
          <p>As opposed to the CNN and LSTM, the CNN-LSTM
showed a major improvement over the baseline. This
is in accordance with the results found by Livieris et al.
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], who showed that their CNN-LSTM significantly
outperformed a bare LSTM baseline. Considering the
fact that both the convolutional layers and LSTM layers
are used as feature extractors, this result is expected. By
combining the two layer types, the model is able to learn
more abstract representations of the data, allowing it to
generalize better [
          <xref ref-type="bibr" rid="ref28 ref29 ref30">28, 29, 30</xref>
          ].</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>6.1.2. Explainability’s impact on performance</title>
          <p>Though it was initially expected that the inclusion
of explainability mechanisms would impact model
performance to a degree [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], the experiments have shown that this is not the
case. While for Grad-CAM (CNN) this result might seem
obvious, considering this technique does not alter the
model, but merely looks at the model’s gradients, this is
still surprising. Despite the fact that the technique itself
is not intrusive, the model’s architecture still needed to
be altered in order to create sensible explanations (e.g.
the eCNN’s parallel design), as shown by Fauvel et al. [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. Regardless of this architectural change, however, the
explainable model still performed on par with its
counterpart. Similarly, the explainable CNN-LSTM, which
uses not only guided backpropagation, but also an
attention mechanism, showed roughly equal performance to
the non-explainable CNN-LSTM.</p>
          <p>For the LSTM, the addition of explainability even
improved the model’s performance (in terms of
accuracy@1), although this improvement was not
statistically significant. Thus, the experiments show that
explainability mechanisms can be used in deep learning
models for career path prediction without hindering the
models’ predictive powers. For the most part, this is in
line with the results of previous research on the topic [
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ]. However, the fact that the attention mechanisms used
in the eCNN-LSTM and eLSTM did not improve model
accuracy in a statistically significant manner is at odds
with the results found by Schockaert et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] and Ding et al. [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. This is likely caused by the differences between
their datasets and the one provided by Randstad. For
example, the majority of candidates in Randstad’s
dataset only had one job on record. In such a scenario,
temporal attention adds no value, as all attention will be
directed towards that single time step.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>6.1.3. Real-world utility</title>
          <p>User testing showed that recruiters consider the
explainable models usable in a real-world scenario.
Although they were quite critical, giving mostly sufficient
(but not outstanding) grades, they determined that each
model type would at least be helpful to a degree in
finding a job for a candidate. The individual explanation
types tended to score lower than the models as a whole,
indicating that the current implementation of the models’
explanations (i.e. the visualizations in Appendix E) might
require some tuning or extra clarification in order to be
used efficiently by recruiters. Regardless, the recruiters
did indicate that they considered the current
implementation useful as is. Considering the environment for user
testing is quite bare-bones (Appendix D), this is a
positive indication for the actual usability of the models’
explanations. Thus, to allow further capitalization on
the explanations, a more user-friendly interface (e.g.
interactive explanations, clear textual descriptions of the
data) could be used. In doing so, the models might also
become usable by candidates themselves. Considering
the inference time of the models (less than a second),
candidates could enter their careers into Randstad’s
system, and instantly be provided a list of job
recommendations, accompanied by explanations. However, more
research will need to be done to determine if this is
preferable for candidates over having recruiters interpret
the models’ predictions.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>6.2. Potential biases</title>
        <p>While the models performed commendably, and the
explanations were determined to be satisfactory, it is
important to consider the impact of biases in the training
data on the predictions. Although protected features,
such as gender, race, and age were removed from the
dataset, correlation between such features and input
features may still have caused discrimination [31]. For
example, while age was not explicitly present in the data,
the models could still roughly determine a candidate’s
age based on their total number of days worked across
all jobs (a person with a few hundred total days worked
is likely to be in their twenties, while someone with over
ten thousand days worked is probably nearing
retirement). The models’ ability to ‘retrieve’ such protected
features may have negatively affected the
recommendations for specific candidates. Future research could look
into the extent to which this occurs, as well as methods
to alleviate this effect.</p>
      </sec>
      <sec id="sec-4-4">
        <title>6.3. Limitations and expansion</title>
        <p>Due to the lack of a publicly available dataset,
determining state-of-the-art performance is complicated for
career path prediction. Even within Randstad’s own
dataset, performance could be increased by simply
filtering out data entered by candidates. To advance the field
of career path prediction, future research should focus
on creating a general dataset that can be used to directly
compare model performance within the field (in the
same vein as ImageNet for image classification,
https://www.image-net.org/, and TREC for text retrieval,
https://trec.nist.gov/data.html). This benchmarking
dataset should consist of relatively clean, GDPR-compliant,
exhaustive career data of a large variety of candidates.
Using this dataset, future research will be able to better
gauge the performance of different architectures used
for career path prediction (e.g. LSTMs, CNNs, temporal
graphs) and draw direct comparisons between models.
Thus, having a clear and definite state of the art will
most certainly advance the field as a whole.</p>
        <p>Another limitation posed in this paper is the lack of
hardware resources. The NVIDIA Tesla K80 used to
train the models fell short when training the CNN-based
models. Because of the low CUDA core count of 2496,
and the limited 12 gigabytes of VRAM, the convolutional
models had to be limited in terms of kernel size, output
channels, embedding sizes, epochs, and batch sizes to
decrease VRAM usage and keep training time reasonable.</p>
        <p>Consequently, not all possible hyperparameter
configurations could be tested, possibly leaving better model
configurations unexplored.</p>
        <p>Furthermore, the small sample size used for user
testing is an important limitation to acknowledge. Because
the participating recruiters were on payroll, it was
difficult to get their managers’ approval, as well as to schedule
a moment to perform the tests. Subsequently, the results
gathered by the user testing are subject to high variance
and are therefore difficult to use as conclusive evidence.</p>
        <p>Increasing the sample size by also performing user
testing on candidates themselves would have helped solve
this issue and might have provided additional insights.</p>
        <p>Also, improving the clarity of the UI used for user testing
and the models’ explanations could have led to lower
variance, making the results more conclusive.</p>
        <p>Additionally, while only including career switches in
the training data strongly improved the models’ usability,
it also hinders individuals who are looking for new work
within their current field from receiving
recommendations. To account for such candidates, future work could
expand upon the current pipeline by including a
recommendation on whether a candidate should stay within
their current field, or pursue a position with a different
function. For individuals who get recommended to stay
within their profession, the models could, for example, be
altered to recommend a next employer within the field.</p>
      </sec>
      <sec id="sec-4-2">
        <title>6.4. Conclusion</title>
        <p>In the span of this paper, it was shown that career path
predictions made by deep learning models can be made
explainable to a high degree. While different types of
explanations made by the models can differ in terms
of how understandable they are to humans, all of them
turned out to be useful for recruiters nonetheless. Due to
the fact that these explainability mechanisms do not lead
to a decrease in performance, they form a good addition
to existing career path prediction models. This goes
especially for CNN-LSTMs, as those perform the best
as explainable and non-explainable models, while also
providing the best explanations according to recruiters.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>7. Appendix</title>
      <sec id="sec-5-1">
        <title>All code used in the experiments can be found on https: //github.com/Roan-Schellingerhout/MSc_thesis.</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>A. Encoding and indexing</title>
      <p>64 feature maps. The 3D max-pooling used a
(64 × 1 × 1) kernel with (1 × 1 × 1) stride.</p>
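      <p>The indexing scheme can be sketched as follows: per candidate,
only the time steps at which an attribute set changes are stored, and
the most recent version is looked up during batch construction. The
dictionary layout below is an illustrative assumption.</p>
      <preformat>import bisect

# Per candidate: the time steps at which an attribute changed, plus the
# attribute values from that step onward. Duplicates across the 25 time
# steps are never materialized, which keeps memory usage low.
index = {
    "candidate_42": {
        "steps": [0, 3, 7],                   # sorted change points
        "skills": [["forklift"],
                   ["forklift", "python"],
                   ["forklift", "python", "sql"]],
    },
}

def attributes_at(candidate, t, field):
    # Retrieve the attribute version that was most recent at time step t
    # (a slight time cost, traded for a large memory saving).
    entry = index[candidate]
    pos = bisect.bisect_right(entry["steps"], t) - 1
    return entry[field][pos]

# attributes_at("candidate_42", 5, "skills") -> ["forklift", "python"]</preformat>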
      <sec id="sec-6-1">
        <title>CNN-LSTM : The CNN-LSTM used a batch size of 128</title>
        <p>and reached optimal performance after 20 epochs.</p>
        <p>The first 2D convolutional layer used a ( 1 × 1)
kernel, with a (1 × 1) stride and half padding,
and generated 32 feature maps. The second 2D
convolutional layer made use of the same kernel
size, stride, and padding, but generated 64 feature
maps. The following 3D average-pooling layer
used a (64 × 1 × 1) kernel and a (1 × 1 × 1) stride.</p>
        <p>Lastly, the model used a single LSTM layer with
hidden size 1000.</p>
        <p>With over 100 thousand careers, each spanning 25 time
steps, and over 1000 features per time step (embedding
values for skills, certificates, previous jobs, previous
companies, addresses, and spoken languages, as well as 300
w2v dimensions per CV), feeding the data into deep
learning models as is, turned out to be infeasible. Making use
of sparse vectors to lower memory usage also was
impossible, due to the incompatibility between CUDA and The optimal hyperparameters found for the
explainsparse vectors/matrices [32]. However, considering the able models are as follows:
large amount of duplicate data (a candidate’s
skills/certificates/CVs do not change at every time step, and can eLSTM : The explainable LSTM used a batch size of 128
therefore often be repeated), use was made of indices and reached optimal performance after 5 epochs.
in order to lower memory usage, at the cost of a slight It used a single LSTM layer with hidden size 1000.
time complexity increase. For each candidate, a loca- eCNN : The explainable CNN used a batch size of 128
tion within each index was created that contained their and reached optimal performance after 2 epochs.
unique attributes, and the time steps from which those The top part used a 2D convolutional layer with
attributes became the most recent ones. By then retriev- a (5 × 1) kernel (thus,    = 5 ), a (1 ×
ing the relevant attributes for each candidate in a batch 1) stride, half padding, and generated 8 feature
during training, the required memory usage was lowered maps (thus,  1 = 8 ). For the bottom part, the 1D
drastically. convolutional layer used a (5 ×    ) kernel,
a (1 × 1) stride, half padding, and also generated
B. Hyperparameters 8 feature maps. The final 1D convolutional layer
used a kernel size of (5 × (   + 1) ), a (1 ×
1) stride, half padding, and generated 32 feature
maps (thus,  2 = 32 ). These 32 feature maps
were then ran through an 3D average-pooling
layer with kernel size (32 × 1 × 1) and a (1 × 1 × 1)
stride.</p>
      </sec>
      <sec id="sec-6-2">
        <title>All hyperparameter tuning results can be found on</title>
        <p>GitHub. For each configuration, the models were ran
for 3 epochs. Based on the results after those 3 epochs,
the best performing configuration was ran for 20 epochs
to find the optimal number of epochs. Not every intended
hyperparameter configuration could be tested due to
hardware/time constraints. For example, the CNN-based
models needed to be limited to small kernels and output
channels to prevent running out of VRAM. Additionally,
the eCNN was only trained for a total of 3 epochs, due
to time constraints (as each epoch took nearly 8 hours).</p>
        <p>All models were optimized using the Adam optimizer
[33] (learning rate = 1 ∗ 10−3) with cross-entropy loss.</p>
        <p>The hyperparameters used for the results of the non- C. Recruiter vs. model
explainable models in Table 1 were the following:
eCNN-LSTM : The explainable CNN-LSTM used a
batch size of 2048 and reached optimal
performance after 15 epochs. Its 2D convolutional layer
used a kernel of size ( ℎ × 1) and half
padding, and was followed by a single LSTM with
hidden size 1000.
distributions</p>
      </sec>
      <sec id="sec-6-3">
        <title>LSTM : The HCPNN used a batch size of 512 and reached optimal performance after 18 epochs. It used a single LSTM layer with hidden size 1000.</title>
      </sec>
      <sec id="sec-6-4">
        <title>The distributions of feature importance on which Table 2</title>
        <p>is based can be seen in Figures 5a, 5b, and 5c. Each model
distribution is based on the average feature importance
CNN : The CNN used a batch size of 128 and reached determined by the models across the three categories
optimal performance after 11 epochs. The 2D (finance, health care, and customer support). For the
convolutional layer consisted of a (5 × 5) kernel, recruiter distribution, the average is taken over the three
with (1 × 1) padding and stride, and generated industries, as well as all recruiters within those industries
(as a result,  = 6 for all recruiter distributions).
23 25
Sequence length (log )
27</p>
        <p>29</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>D. User testing</title>
      <p>User testing was conducted using a web environment
accessible by the recruiters. The web app was hosted
using Amazon EC2 in combination with Docker, and built
using Flask, jQuery, Jinja, and AJAX. The recruiters were
tasked to enter their e-mail address (to allow follow-up
questions if needed) and select their expertise (finance,
health care, customer support). Afterwards, they were
shown the three predictions for their industry (one per
model) and the accompanying explanations to judge, as
described in Section 4.2.3.</p>
    </sec>
    <sec id="sec-8">
      <title>E. Explanation examples</title>
      <sec id="sec-8-1">
        <title>The explanations provided by the three diferent models</title>
        <p>for the same candidate can be found in Figures 10, 11,
and 12. The correct label for this candidate was Survey
and market research interviewer.</p>
        <p>3
Time step
4</p>
        <p>5
3
Time step
1
3
Time step
1
3
Time step</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Parigi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>The gig economy</article-title>
          ,
          <source>XRDS: Crossroads, The ACM Magazine for Students</source>
          <volume>23</volume>
          (
          <year>2016</year>
          )
          <fpage>38</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hanson</surname>
          </string-name>
          ,
          <article-title>Educational attainment statistics [2022]: Levels by demographic</article-title>
          ,
          <year>2021</year>
          . URL: https://educationdata.org/education-attainment-statistics.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kotschenreuther</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <article-title>Data-driven HR: résumé analysis based on natural language processing and machine learning</article-title>
          ,
          <source>arXiv preprint arXiv:1606.05611</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>A hierarchical career-path-aware neural network for job mobility prediction</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rosenblum</surname>
          </string-name>
          ,
          <article-title>Fortune teller: predicting your career path</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>30</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kokkodis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Ipeirotis</surname>
          </string-name>
          ,
          <article-title>Demand-aware career path recommendations: A reinforcement learning approach</article-title>
          ,
          <source>Management Science</source>
          <volume>67</volume>
          (
          <year>2021</year>
          )
          <fpage>4362</fpage>
          -
          <lpage>4383</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Career trajectory prediction based on cnn</article-title>
          ,
          <source>in: 2019 IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gunning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stefik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stumpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Xai-explainable artificial intelligence</article-title>
          ,
          <source>Science Robotics</source>
          <volume>4</volume>
          (
          <year>2019</year>
          )
          <elocation-id>eaay7120</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <collab>Statista Research Department</collab>
          ,
          <article-title>Staffing industry: Leading companies worldwide</article-title>
          ,
          <year>2022</year>
          . URL: https://www.statista.com/statistics/257876/staffing-companies-worldwide-by-revenue/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kristjanpoller</surname>
          </string-name>
          ,
          <article-title>Gold volatility prediction using a cnn-lstm approach</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>157</volume>
          (
          <year>2020</year>
          )
          <fpage>113481</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A cnn-lstm-based model to forecast stock prices</article-title>
          ,
          <source>Complexity</source>
          <volume>2020</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Berton</surname>
          </string-name>
          ,
          <article-title>Energy forecasting model based on cnn-lstm-ae for many time series with unequal lengths</article-title>
          ,
          <source>Engineering Applications of Artificial Intelligence</source>
          <volume>113</volume>
          (
          <year>2022</year>
          )
          <fpage>104998</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-B.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>Predicting residential energy consumption using cnn-lstm neural networks</article-title>
          ,
          <source>Energy</source>
          <volume>182</volume>
          (
          <year>2019</year>
          )
          <fpage>72</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Livieris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pintelas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pintelas</surname>
          </string-name>
          ,
          <article-title>A cnn-lstm model for gold price time-series forecasting</article-title>
          ,
          <source>Neural computing and applications</source>
          <volume>32</volume>
          (
          <year>2020</year>
          )
          <fpage>17351</fpage>
          -
          <lpage>17360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <article-title>Evolving cnn-lstm models for time series prediction using enhanced grey wolf optimizer</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>161519</fpage>
          -
          <lpage>161541</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Choo</surname>
          </string-name>
          , S. Liu,
          <article-title>Visual analytics for explainable deep learning</article-title>
          ,
          <source>IEEE computer graphics and applications</source>
          <volume>38</volume>
          (
          <year>2018</year>
          )
          <fpage>84</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Springenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          ,
          <article-title>Striving for simplicity: The all convolutional net</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6806</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Selvaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cogswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vedantam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <article-title>Grad-cam: Visual explanations from deep networks via gradient-based localization</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>618</fpage>
          -
          <lpage>626</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rojat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Puget</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Filliat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Del</given-names>
            <surname>Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Díaz-Rodríguez</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence (xai) on time series data: A survey</article-title>
          ,
          <source>arXiv preprint arXiv:2104.00950</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>C.</given-names>
            <surname>Schockaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Leperlier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moawad</surname>
          </string-name>
          ,
          <article-title>Attention mechanism for multivariate time series recurrent model interpretability applied to the ironmaking industry</article-title>
          ,
          <source>arXiv preprint arXiv:2007.12617</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          , et al.,
          <article-title>Pytorch: An imperative style, high-performance deep learning library</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          ,
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Berndt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clifford</surname>
          </string-name>
          ,
          <article-title>Using dynamic time warping to find patterns in time series</article-title>
          ,
          <source>in: KDD workshop</source>
          , volume
          <volume>10</volume>
          , Seattle, WA, USA,
          <year>1994</year>
          , pp.
          <fpage>359</fpage>
          -
          <lpage>370</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Powers</surname>
          </string-name>
          ,
          <article-title>Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation</article-title>
          ,
          <source>arXiv preprint arXiv:2010.16061</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <article-title>Interpretable spatio-temporal attention lstm model for flood forecasting</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>403</volume>
          (
          <year>2020</year>
          )
          <fpage>348</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Fauvel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Masson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>É.</given-names>
            <surname>Fromont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Termier</surname>
          </string-name>
          ,
          <article-title>Xcm: An explainable convolutional neural network for multivariate time series classification</article-title>
          ,
          <source>Mathematics</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>3137</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>R.</given-names>
            <surname>Eldan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Shamir</surname>
          </string-name>
          ,
          <article-title>The power of depth for feedforward neural networks</article-title>
          ,
          <source>in: Conference on learning theory, PMLR</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>907</fpage>
          -
          <lpage>940</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A.</given-names>
            <surname>Subasi</surname>
          </string-name>
          , Chapter 5
          <article-title>- other classification examples</article-title>
          , in: A.
          <string-name>
            <surname>Subasi</surname>
          </string-name>
          (Ed.),
          <source>Practical Machine Learning for Data Analysis Using Python</source>
          , Academic Press,
          <year>2020</year>
          , pp.
          <fpage>323</fpage>
          -
          <lpage>390</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/B9780128213797000059. doi:https://doi.org/10.1016/B978-0-12-821379-7.00005-9
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <source>0.05 0.10 Gradient 0.15 0.20 0.05 0.10 0.15 0.20 0.25 0.30 0</source>
          .
          <fpage>35</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>