<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Healthy Friction: Determining Stakeholder Requirements of Job Recommendation Explanations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roan Schellingerhout</string-name>
          <email>roan.schellingerhout@maastrichtuniversity.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Barile</string-name>
          <email>f.barile@maastrichtuniversity.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nava Tintarev</string-name>
          <email>n.tintarev@maastrichtuniversity.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Explainable User Interfaces, Explainable Recommender Systems, User Studies, Job recommendation</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Furthermore</institution>
          ,
          <addr-line>based on our thematic analysis, we find that</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Maastricht University</institution>
          ,
          <addr-line>Paul-henri Spaaklaan 1, 6229 EN Maastricht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The increased use of information retrieval in recruitment, primarily through job recommender systems (JRSs), can have a large impact on job seekers, recruiters, and companies. As a result, such systems have been determined to be high-risk in recent legislation. This requires JRSs to be trustworthy and transparent, allowing stakeholders to understand why specific recommendations were made. To fulfill this requirement, the stakeholders' exact preferences and needs need to be determined. To do so, we evaluated an explainable job recommender system using a realistic, task-based, mixed-design user study (N = 30) in which stakeholders had to make decisions based on the model's explanations. This mixed-methods evaluation consisted of two objective metrics - correctness and efficiency - along with three subjective metrics - trust, transparency, and usefulness. These metrics were evaluated twice per participant, once using real explanations and once using random explanations. The study included a qualitative analysis following a think-aloud protocol while performing tasks adapted to each stakeholder group. We find that providing stakeholders with real explanations does not significantly improve decision-making speed and accuracy. Our results showed a non-significant trend for the real explanations to outperform the random ones on perceived trust, usefulness, and transparency of the system for all stakeholder types. Furthermore, based on our thematic analysis, we find that stakeholders benefit more from interacting with explanations as decision support capable of providing healthy friction, rather than as previously-assumed persuasive tools.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable User Interfaces</kwd>
        <kwd>Explainable Recommender Systems</kwd>
        <kwd>User Studies</kwd>
        <kwd>Job recommendation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recommender systems have found their way into many
aspects of daily life. Even highly impactful decisions, such as
the job that one applies to, and vice versa, the applicants
a company considers, are often influenced by so-called job
recommender systems. When it comes to such impactful
scenarios, blindly relying on algorithms to make the correct
decision can be risky and may lead to unintended
consequences. As a result, there is a growing demand for
explainable artificial intelligence (XAI) within the field of job
recommendation, which aims to provide transparent and
interpretable insights into the decision-making process of
such systems [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ].
      </p>
      <p>
        Most research on XAI, however, focuses on assisting
developers and other users with prior knowledge of AI [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ],
with the amount of user-centered research staying rather
limited [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]. Considering that the overwhelming
majority of users of recommender systems tend to be lay users,
such in-depth, technical, and often complicated
explanations offer little value. Therefore, it is crucial to design
explanations in such a way that they are accessible not
just to AI experts, but also to everyday users with different
levels of expertise. Within job recommender systems, the
everyday users are threefold: candidates - those looking
for a job; recruiters - those whose job it is to match
candidates to vacancies; and company representatives - those who
are responsible for hiring in companies. Considering these
stakeholders all perform different tasks, their explanation
requirements also tend to differ, making tailor-made solutions
and compromises necessary. Previous work investigated
the preferences related to three explanation types (textual,
bar chart, and graph-based) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. All stakeholders agreed that the explanations could
be improved by showing a more direct relation to the
CV/vacancy for which recommendations were made, e.g., by
reiterating exact phrases in the explanation, and showing
how those were incorporated. How different features
contributed to the recommendation (positively/negatively, and
to what extent) should also be made explicit, to minimize
the risk of misinterpretations. Therefore, we determine that
a focus on decision support over persuasion is preferable.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work and Hypotheses</title>
      <p>
        As determined by the European Union in the AI Act [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the
usage of AI in recruitment can be considered a high-risk
scenario. Due to the large impact that career choices can have
on individuals’ lives, as well as the fact that recruitment
often deals with large amounts of sensitive data, job
recommender systems require a more tailored approach
compared to less impactful recommender systems (e.g., music
recommenders) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However, current state-of-the-art
approaches often fail to make use of such tailored approaches,
causing aspects such as explainability to be largely ignored
in current literature [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this section, we provide an
overview of current work on explainability in job
recommendation, both for experts and lay users. Additionally,
we formulate hypotheses for our research questions based
on existing literature.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Explainable job recommendation</title>
        <p>
          While a number of previous works have incorporated
explainability within their JRSs, the explanations often have
limited expressive value or were not the main focus of the
system [
          <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
          ]. Even when explainability has been
included, authors usually fail to consider all stakeholders,
tailoring the explanations to only one group (e.g., developers or
users only) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Furthermore, explanations are often solely
evaluated anecdotally, leaving their quality up for debate
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. One could argue that easy-to-understand
explainability should be at the core of the models’ design in a high-risk,
multi-stakeholder domain such as recruitment. Previous
research, however, often does not explicitly consider the
understandability of their explanations: while their models
can technically explain some part of their predictions, the
explanations tend to be unintuitive and/or limited, either
staying too vague [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ] or being hard to understand [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
for the intended users. Furthermore, baselines are rarely
used for evaluating the effect of explanations, leaving their
actual added benefit up for debate. It has been shown that
lay users may positively evaluate explanations, even when
they do not properly understand them [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. By comparing
two explanations (in our study, a random baseline vs a real
explanation) in the same environment, it is possible to
determine whether explanations actually add value for the
users. Although explanations should always be available
[
          <xref ref-type="bibr" rid="ref10 ref17">10, 17</xref>
          ], they will not necessarily always be useful [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
Explanations have been shown to be mostly used whenever
the user finds themselves in contention (e.g., whenever they
disagree with the recommendations or do not find any of
the items suitable) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. As a result, these difficult choices
are likely to be alleviated by proper explanations, allowing
the users to make the correct decision more quickly and
often. Thus, whenever the explanation helps the user make
a decision, it will also improve their view of the system as
a whole. This leads to the following hypotheses for SQ1
(To what extent do the explanations assist the stakeholders in
their decision-making process?):
        </p>
        <p>H1a: When provided with the real explanations,
participants will be able to find matches more quickly, and
make the correct decision more often, compared to
when they are provided the random explanations.</p>
        <p>H1b: Participants will respond more positively to a
recommendation environment that includes the real
explanations than to a similar one that includes
random explanations. This will improve metrics such
as perceived trust, transparency, and usefulness.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Explanations for lay users</title>
        <p>
          When dealing with users with limited AI knowledge (e.g.,
recruiters, job seekers, and most company representatives),
having clear, straightforward explanations is crucial [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
While explainability methods such as feature attribution
maps (usually in the form of bar charts) can be sufficient
for AI experts to get a better understanding of a model, this
is not necessarily the case for lay users. Although such
‘technical’ explanations often look intuitive, they can be
deceptive by giving the users a false sense of understanding
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. In another study, specifically on job
recommendations, textual explanations were found to be preferred by
the majority of lay users [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], but those take additional time
to read and understand, limiting their real-world usability.
On the other hand, visual explanations tend to include more
detail, allowing more experienced users (such as company
representatives) to get more value from them. As a result,
hybrid combinations of explanation interfaces can be used
to make the explanations feel accessible, while still being
sufficiently comprehensive. Even when using hybrid
combinations, however, unique characteristics of each stakeholder
type still play a role. We expect the same preferences to
be indicated in our study, as we conduct our experiment
in a similar (but more realistic) setting. This leads us to
formulate the following hypothesis for SQ2 (What is the
impact of different explanation components (textual, bar chart,
and graph-based) on the stakeholders’ understanding of the
explanation?):
        </p>
        <p>H2: Candidates and recruiters will mainly use textual
explanations to understand the recommendations, while
company representatives prefer graph-based
explanations. The bar chart will be considered useful as
a supportive tool, but will be insufficient as a sole
explanation by all stakeholders.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Implementing lay user explanations</title>
        <p>
          When trying to explain recommendations to lay users,
multiple design factors need to be considered. Not just the way in
which explanations are presented, but also how they are
generated, should be carefully taken into account beforehand.
Previous works have shown that model-agnostic
explainability methods, such as LIME [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] and SHAP [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], can fall short
when trying to support non-expert users [
          <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
          ]. Common
feature attribution methods, such as LIME, tend to provide
limited expressiveness (i.e., they only show the extent to
which different features contributed to the prediction, but do
not include any sort of interaction or higher-level relations
between the features). On the contrary, model-intrinsic
methods, such as attention [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] and integrated gradients
[
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], can be quite intuitive, even to people with less
expertise [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Additionally, such methods lend themselves to
the use of graph-based models, which can use knowledge
graphs to incorporate additional expressiveness by actually
          <table-wrap id="tab1">
            <label>Table 1</label>
            <caption>
              <p>Participant descriptives per stakeholder type.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th/>
                  <th>Candidates</th>
                  <th>Recruiters</th>
                  <th>Company representatives</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td>Age</td>
                  <td>M = 26.6, SD = 10.08, min = 22, max = 56</td>
                  <td>M = 47.7, SD = 11.52, min = 32, max = 69</td>
                  <td>M = 38.0, SD = 9.79, min = 24, max = 52</td>
                </tr>
                <tr>
                  <td>Gender</td>
                  <td>6M, 3F, 1X</td>
                  <td>5M, 5F</td>
                  <td>4M, 6F</td>
                </tr>
                <tr>
                  <td>Industry</td>
                  <td>IT, insurance, sociology, finance, law, etc.</td>
                  <td>HR, IT, accounting, education, finance, etc.</td>
                  <td>HR, IT, service industry, marketing, etc.</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
including such high-level relations between features [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
However, importance weights like attention and integrated
gradients are limited to the range of [0, 1], even when a
feature contributes “negatively” to a prediction (e.g., a feature
that had a very strong, negative impact on the prediction,
could still have an attention value of 0.9). This is likely to
be considered confusing by some users, as such features
should intuitively be given a negative score [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. For
recommendations specifically, items are often presented in a
list. Therefore, the need can arise to know why a specific
item was rated higher than others. For example, when users
consider the second-highest rated item to be the best match,
they might be interested to see what ultimately caused the
model to put another item over their preferred choice [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ].
As a result, we formulate the following hypotheses for SQ3
(How can the explanations be improved to better assist the
different stakeholders?):
        </p>
        <p>H3a: The explanations frame the model’s reasoning in a
‘positive’ way, which could be perceived as
confusing, lowering the usefulness and transparency of -
and trust in - the model. All stakeholders will
therefore benefit from additionally including negative
attention weights.</p>
        <p>H3b: There are no comparative (i.e., list-wise) explanations
available for the list of recommendations. An
additional explanation that explains why
recommendation X was ranked higher than Y and Z will therefore
be desirable for all stakeholder types.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. User studies for evaluation</title>
        <p>
          While some offline evaluation metrics exist for
explanations [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], it is common practice to evaluate explanations
in a realistic setting using individuals from the group that
is supposed to use the explanations. The subjectivity of
user preference generally requires an approach that allows
participants to freely express themselves, as forcing
participants to choose from pre-determined answers is likely to
lack depth during the formative phase of a system.
Therefore, studies evaluating user experience and preference of
AI systems often use a combination of think-aloud
protocols and (semi-)structured interviews. For instance, Degen
et al. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] conducted interviews with 11 energy engineers
to design an explainable system for such highly expert users.
Furthermore, Nelson et al. [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] used semi-structured
interviews to get insights from 48 patients on the use of AI for
skin cancer screening. Similarly, Zhu et al. [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] combined
a think-aloud protocol (i.e., having participants say their
thoughts out loud during the experiment) with post-test
interviews to evaluate the user experience of an AI-based
financial advisory system with 24 users with strongly diverse
demographics - both in terms of personal characteristics
and level of expertise.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Contributions</title>
        <p>This paper aims to evaluate explanations for job
recommendations in a realistic setting with a sufficient sample
of stakeholders. This is done with the aim of addressing
the lack of multi-stakeholder focus and evaluation that is
present in previous research. Firstly, we create a novel
explainable job recommender system that can generate unique
explanations for different stakeholders tailored to their
specific needs. Furthermore, we evaluate these explanations
through a user study in which the participants have to
complete a task that mimics their day-to-day activities.
Specifically, we compare a combination of different objective and
subjective metrics across two settings, one in which
participants see genuine explanations, and one in which they see
randomized explanations, in an attempt to determine the
explanations’ real-world impact.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>
        To answer our research question, we performed a
preregistered user study. We created an online environment
that allows the different stakeholders to perform tasks that
are similar to their day-to-day tasks, e.g., looking for
suitable candidates, finding interesting vacancies, or matching
candidates to vacancies. An overview of the environment
can be seen in Fig. 1. A total of 10 participants from each
stakeholder type were asked to use the environment, for a
total sample of 30 individuals. Previous works utilizing
similar approaches showed that, due to the quality and amount
of data collected through qualitative user studies, such a
sample size is sufficient [
        <xref ref-type="bibr" rid="ref30 ref31 ref32 ref33">30, 31, 32, 33</xref>
        ]. The participants
were recruited from a wide range of backgrounds (e.g., area
of expertise, age, gender identity, etc.) to mitigate possible
biases. Participants were gathered in two ways:
through personal and professional networks, as well as in
collaboration with Randstad N.V. (Randstad), the world’s
largest recruitment agency. They were asked to
participate en masse over e-mail and were provided an
information letter including details on what the research would
entail. Our final sample consisted of 14 women, 15 men,
and 1 non-binary person with an average age of 37.4 years
(SD = 13.486, min = 22, max = 69). The participants had widely
varying backgrounds (e.g., IT, HR, finance, sociology, law,
marketing), leading to levels of expertise w.r.t. AI
ranging from no knowledge whatsoever to a Master’s degree
in a related field. A full overview of the descriptives per
stakeholder type can be seen in Table 1. Both qualitative
and quantitative data were gathered during the experiment
through a previously validated (semi-structured) interview
guide [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This allowed participants to freely speak their
minds, while also allowing for statistical analyses to be
performed.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Procedure</title>
        <p>Participants were given the choice to conduct the
experiment online or in person. All but one participant preferred
to participate online; as a result, 29 out of the 30 interviews
were done in a video call wherein the participant shared
their screen to allow for active monitoring of their actions.
After being sent a link to the online environment, agreeing
to a consent form, and filling in their demographics, each
participant was asked to perform a task related to their
specific stakeholder type: (i) for candidates, this translated to
finding the most interesting vacancy from a given list of
recommendations; (ii) company representatives were tasked
to find the most suitable candidate for an existing vacancy
based on a list of recommendations; (iii) recruiters, on the
other hand, had to attempt to find the best match between
a vacancy and a list of possible candidates.</p>
        <p>These recommendations were based on a third-party
dataset (Section 3.2) and were therefore not related to the
participants themselves. Considering the dataset we used
did not contain written CVs (but only semi-structured lists
of work history and skills) or any personal data, we
synthetically generated CVs based on skills and work histories
using ChatGPT-3.5-turbo. During the experiment,
participants had to roleplay as an imaginary candidate/company
on whose behalf they were asked to make a decision. The
synthetic data was not used during the training process and
was only used to allow participants to more easily take on
the identity of the entity on whose behalf they were judging,
as a written CV is more user-friendly and realistic than an
unstructured list of work experience and skills. Still, the
information within the CV was the same as that within
the list, complemented only by information like company
names, years of employment, and more verbose descriptions.
To ease the process of empathizing with the CVs and job
listings, we gave clear instructions to the users so that they
did not base their selections on personal preferences, but
on relevance to the CV/job posting instead (the exact
instructions, as well as implementations and model parameters,
are available at https://github.com/Roan-Schellingerhout/evaluating_job_recommendations).</p>
        <p>The aforementioned tasks were all executed within the
same environment (Fig. 1), only differing in what data was
being shown to the participants. The participants were
allowed to inspect all the recommended items and
explanations at their own pace and were not instructed to focus on
any given explanation type. They were asked to talk aloud
during their task to allow us to get an understanding of how
the environment was initially perceived, and what aspects
the participants found most interesting. Most participants
were able to quickly browse through the five recommended
items, causing the selection process to take just over 5
minutes on average. The participants performed their allocated
task twice - once with randomized explanations, and once
with explanations generated by an explainable graph neural
network (eGNN) (Section 3.2) - in random order to minimize
learning efects. During both iterations, the participants
were monitored to evaluate objective metrics, such as their
efficiency and the correctness of their decisions. After each
repetition, the participants were interviewed to collect
ratings of the environment (by asking participants how they
would rate certain aspects on a scale from 1 to 10) and
determine where improvements could be made (by probing for
more information on why they gave said ratings). In total,
the entire process took approximately 30 minutes per
participant. Participating candidates were paid the equivalent
of Dutch minimum wage (i.e., just over 5 euros) in the form
of a bol.com (a popular Dutch online retailer, similar to, e.g.,
Amazon) gift card for their time. Considering recruiters
and company representatives participated during working
hours, they were not compensated for their time beyond
their regular payroll.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model and data</title>
        <p>
          The heterogeneous explainable graph neural network
(eGNN) used to generate the recommendations was
implemented using PyTorch geometric [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]. Its architecture
builds upon existing model designs [
          <xref ref-type="bibr" rid="ref28 ref35 ref36 ref37">35, 36, 37, 28</xref>
          ], but is
altered to allow for heterogeneous data to be considered,
while also generating separate predictions and explanations
for both the user (candidate) and provider (company). Our
implementation consisted of a node and edge embedding
layer for both textual (based on DPR [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ]) and categorical
data. Textual data was pre-tokenized to adhere to PyTorch
Geometric’s data conventions. Then, during inference, the
tokens were retrieved, embedded, and re-added to the graph.
Categorical data was embedded using one-hot encoding.
After having embedded the non-numerical data, the entire
graph was fed into a general sub-graph embedding layer
(using a GATv2Conv layer [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]). The output of this layer was
a generic embedding of the entire sub-graph as a whole,
which was then fed into two parallel stakeholder-specific
scoring layers (using Graph Transformers [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ]). Both
parallel stakeholder-specific layers provided a ‘matching score’
based on the sub-graph embedding as their outputs, which
were then combined and fed into a linear layer to make a
final prediction. The model was trained on a public job
recommendation dataset provided by Zhaopin, China’s largest
online recruitment platform (https://www.zhaopin.com/). This publicly available dataset
contains (i) 4.5 thousand job seekers, who are represented
by features such as their age, education, experience, and
desires (e.g., preferred city and industry); (ii) 4.78 million
job postings, which contain information on the specific job,
as well as general details of the company; and (iii) 700
thousand recorded interactions between the two, which consist
of four stages: no interaction, browsed (either party showed
interest by looking at the other’s CV/posting online),
delivered (the parties were presented to each other), and satisfied
(the parties were actually matched up). To use these labels
as ground truth values in the user experiment, we
considered the highest-rated item (candidate or job) in the list to
be the ‘correct’ answer, with the second highest-rated item
being the second-best answer, etc., making sure there were
no ties for the highest-rated position.
        </p>
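        <p>To make the described pipeline concrete, the following is a condensed, hypothetical sketch of such a model in PyTorch Geometric. It is not the authors’ implementation (which is heterogeneous and uses DPR-based text embeddings); it only illustrates the described flow of an attention-based sub-graph encoder feeding two parallel stakeholder-specific scoring layers, whose outputs are combined by a final linear layer. The homogeneous features and mean pooling are simplifications.</p>
        <preformat>
import torch
from torch_geometric.nn import GATv2Conv, TransformerConv


class EGNNSketch(torch.nn.Module):
    """Simplified, homogeneous stand-in for the described eGNN."""

    def __init__(self, in_dim, hidden):
        super().__init__()
        # Generic sub-graph embedding layer; its attention weights are
        # the raw material for the explanations.
        self.encoder = GATv2Conv(in_dim, hidden)
        # Two parallel stakeholder-specific scoring layers.
        self.candidate_head = TransformerConv(hidden, hidden)
        self.company_head = TransformerConv(hidden, hidden)
        # Final linear layer combining both matching scores.
        self.scorer = torch.nn.Linear(2 * hidden, 1)

    def forward(self, x, edge_index):
        h = self.encoder(x, edge_index).relu()
        # Mean-pool each head's node representations into one vector.
        cand = self.candidate_head(h, edge_index).mean(dim=0)
        comp = self.company_head(h, edge_index).mean(dim=0)
        return self.scorer(torch.cat([cand, comp]))
</preformat>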
        <p>We converted this tabular dataset to a knowledge graph
using a manually defined ontology (available in our GitHub
repository). The ontology was
created based on the relations between the available features
in the dataset. We converted feature types to edges, and
feature values to nodes. E.g., if a candidate had the value
‘transport’ for their ‘current industry’ feature, the
resulting triple would be (candidate, worksInIndustry, transport).
This allowed for previously non-existing connections
between candidates and vacancies to arise (e.g., a path from
a candidate to a vacancy based on the candidate sharing a
common skill with another person who had previously
fulfilled the position). This approach led to a final knowledge
graph consisting of over 280 thousand nodes and nearly
1.6 million edges, with every node having 5.6 neighbors on
average.</p>
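        <p>As a minimal sketch of this conversion (the worksInIndustry edge follows the example above; the remaining column names and edge labels are hypothetical):</p>
        <preformat>
import pandas as pd

# Hypothetical ontology: each tabular feature (column) becomes an edge
# type, and each feature value becomes a node.
ONTOLOGY = {
    "current_industry": "worksInIndustry",
    "desired_city": "desiresCity",
    "skill": "hasSkill",
}


def to_triples(candidates):
    """Turn a candidate table into (head, relation, tail) triples."""
    triples = []
    for _, row in candidates.iterrows():
        for column, edge_type in ONTOLOGY.items():
            value = row.get(column)
            if pd.notna(value):
                # e.g., (candidate_42, worksInIndustry, transport)
                triples.append((f"candidate_{row['id']}", edge_type, str(value)))
    return triples
</preformat>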
        <p>
          By then creating sub-graphs from this knowledge graph
through the use of a k-random walk algorithm (k = 7, 50
walks per match) [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ] between job seekers and vacancies,
we created a graph ranking dataset. This was done offline
and before the experiment was conducted. We trained the
and before the experiment was conducted. We trained the
eGNN on this newly created dataset by performing a grid
search of hyperparameters (learning rate, embedding sizes,
attention heads, etc.) on a randomly-sampled training set,
consisting of 80% of the data. The different configurations
were evaluated on a validation set, consisting of 10% of the
data. The last 10% of the data were used for a test set, which
served to estimate the real-world performance of the model.
The version of the model used to generate the explanations
for the experiment had a normalized discounted cumulative
gain (nDCG) of 0.606. Considering our focus lies on
explainability over model performance, we considered this score to
be acceptable for our case.
        </p>
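        <p>For reference, a minimal sketch of how nDCG can be computed for a single ranked list (conventions such as exponential gains may differ from the exact implementation used here):</p>
        <preformat>
import math


def dcg(relevances):
    # Gains are discounted by log2 of the (1-indexed) rank plus one.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(ranked_relevances):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = sorted(ranked_relevances, reverse=True)
    best = dcg(ideal)
    return dcg(ranked_relevances) / best if best else 0.0


# Example: a ranking whose ground-truth labels (0-3) are 2, 3, 0, 1.
print(round(ndcg([2, 3, 0, 1]), 3))
</preformat>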
        <p>Explanations were generated using the attention weights
of the model, which indicated the importance of the
different nodes and edges in the graphs. For the simplified view of
the explanations (which was enabled by default, and rarely
altered), only the top three paths that had received the
highest attention scores were considered. To compare during
the experiments, we also generated random explanations,
which simply randomized the attention scores provided by
the model (while still adding up to 1.0).</p>
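        <p>The two conditions can be sketched as follows (the path keys and scores are hypothetical; only the top-three selection and the renormalization to a sum of 1.0 follow the description above):</p>
        <preformat>
import random


def top_paths(attention, k=3):
    """Simplified view: keep the k paths with the highest attention."""
    ranked = sorted(attention.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


def randomize(attention, seed=42):
    """Random baseline: same paths, random scores still summing to 1.0."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in attention]
    total = sum(weights)
    return {path: w / total for path, w in zip(attention, weights)}


real = {
    "cand-hasSkill-python-requiredBy-vacancy": 0.91,
    "cand-worksInIndustry-IT-industryOf-vacancy": 0.06,
    "cand-desiresCity-Beijing-locatedIn-vacancy": 0.03,
}
print(top_paths(real))
print(randomize(real))
</preformat>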
        <p>
          The explanations received from the model were turned
into JSON-objects to be used in the web environment and
were displayed as weighted graphs using vis.js (https://visjs.org/) (Fig. 1
component 4). Furthermore, the graphs were turned into bar
charts by summing the incoming attention for each node
(Fig. 1 component 3), and into textual explanations (Fig. 1
component 2) by feeding the JSONs into ChatGPT [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ] (the prompt used to generate the explanations is
included in our GitHub repository). This
strategy to generate textual explanations has been shown to
be apt in previous works [
          <xref ref-type="bibr" rid="ref41 ref9">9, 41</xref>
          ], given that proper
instructions are provided.
        </p>
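        <p>A small sketch of the bar-chart derivation described above (the JSON edge format is an assumption): incoming attention is summed per target node.</p>
        <preformat>
from collections import defaultdict


def bar_chart_values(edges):
    """edges: iterable of {"source": ..., "target": ..., "attention": ...}."""
    totals = defaultdict(float)
    for edge in edges:
        totals[edge["target"]] += edge["attention"]
    return dict(totals)


edges = [
    {"source": "candidate", "target": "python", "attention": 0.5},
    {"source": "vacancy", "target": "python", "attention": 0.3},
    {"source": "candidate", "target": "Beijing", "attention": 0.2},
]
print(bar_chart_values(edges))  # {'python': 0.8, 'Beijing': 0.2}
</preformat>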
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Variables</title>
        <p>To determine the objective benefit of adding explanations to
the recommendations, we compared multiple metrics that
represent different aspects of usability and usefulness of the
environment. These metrics were compared for the two
independent variables present in our design:
Scenario: the type of explanation presented to the user
(within-subject, categorical: random or real);
Stakeholder type: the participant’s stakeholder type
(between-subject, categorical: candidate, recruiter,
or company representative).</p>
        <p>For these settings, we then compare the values of the five
dependent variables:
Correctness: whether the decision of the participant is
correct, based on the ground truth values in the
data (scale from 0-3, where higher values are more
correct);
Efficiency: the time in seconds it takes the participants to
decide;
Transparency: how much the explanation helps the
participant understand how the model made the
recommendation (scale from 1-10);
Usefulness: how useful the participant considers the
explanation when making their decision (scale from 1-10);
Trust: how much the participant trusts the recommendation
and the explanation given for it (scale from 1-10).</p>
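        <p>As an illustration of how the 0-3 correctness scale can relate to the dataset’s four interaction stages (Section 3.2), a hedged sketch (the exact encoding used in the study is an assumption):</p>
        <preformat>
# Hypothesized mapping from interaction stage to graded relevance.
STAGES = {"no interaction": 0, "browsed": 1, "delivered": 2, "satisfied": 3}


def correctness(chosen_item_stage):
    """Higher values indicate a more correct decision (0-3)."""
    return STAGES[chosen_item_stage]


print(correctness("delivered"))  # 2
</preformat>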
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>SQ1: To what extent do the explanations assist the
stakeholders in their decision-making process?</p>
      <p>Correctness</p>
      <p>When focusing on correctness, we find that
participants often did not change their decision in
between rounds (only 7 participants switched between rounds),
meaning the difference in correctness between the random
and real explanations was small and not significant
(companies: W = 42, n = 20, p = .579; candidates: W = 51,
n = 20, p = 1.000; recruiters: W = 40, n = 20, p = .481).
However, the trend was for the random explanations to lead to
more correct answers than the genuine ones (cf. Table 2).
Participants often used their own knowledge to come to
a conclusion, even if that knowledge did not align with
the explanation. For example, the candidate most often
selected by company representatives and recruiters had work
experience similar to the vacancy. While this experience
was also the most important feature for the real
explanation, it was of below-average importance in the random
explanation. Regardless, the participants still described it
as the main argument of the model, even when viewing the
random explanation. In other words, they considered the
weighted arguments provided by the model based more on
their own intuition, rather than the importance prescribed
by the model. This decision-making process occurred with
a broad range of participants, especially those who were
more reluctant to trust the system.</p>
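      <p>As a hedged illustration of the reported within-subject comparisons, assuming a Wilcoxon signed-rank test on paired per-participant scores (the data below are invented, and the exact test setup is an assumption rather than the authors’ analysis code):</p>
      <preformat>
from scipy.stats import wilcoxon

# Hypothetical correctness scores (0-3), one pair per participant.
real = [3, 2, 3, 1, 2, 3, 0, 2, 3, 2]
rand = [2, 2, 2, 1, 1, 3, 1, 2, 3, 1]

stat, p = wilcoxon(real, rand)
print(f"W = {stat}, p = {p:.3f}")
</preformat>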
      <p>Efficiency</p>
      <p>In terms of efficiency, both scenarios
performed similarly (companies: W = 53, n = 20, p = .853;
candidates: W = 40.5, n = 20, p = .481; recruiters: W =
44, n = 20, p = .684). Regardless of the order in which the
scenarios were presented, the one shown second usually led
to a faster decision. This shows that, since the participants
had already decided in the first round, they only had to
confirm this decision in the second round (as the same list
of options was presented in both rounds). We conducted a
post-hoc analysis to better determine the contribution of
order effects on efficiency (Section 5.1.1). Furthermore, the
participants largely indicated that they had become more
familiar with the environment the second time, making them
more adept and efficient when comprehending the
explanations (e.g., P16: “Yes, well, it does take some time; you need to
get into it for a bit. But it’s slowly starting to make sense now.
Indeed, it’s not something you immediately grasp and say, oh
yes, that’s how it works”).</p>
      <p>Transparency</p>
      <sec id="sec-4-2">
        <title>Regarding the perceived transparency, we</title>
        <p>notice that most participants had some dificulty in
finding the diference between the two models; largely because
the same list of recommendations was presented for both.
Recall that participants viewed the same underlying data
and only the importance of diferent features changed.
However, upon further inspection, participants did determine
the genuine explanations to be slightly better at
explaining the match, mostly due to them feeling more ‘sensible’
and ‘descriptive.’ This diference in transparency was,
however, not statistically significant (companies:  = 57,  =
20,  = .684 , candidates:  = 52,  = 20,  = .912
, recruiters:
 = 59.5,  = 20,  = .481</p>
        <p>).</p>
      <p>Participants who used the graph-based explanation more
found the difference in edge and node weights provided by
the genuine explanation to be useful when mentally parsing
the graph. Since the difference between path weights was
larger in the real scenario (i.e., in the random setting, most
paths had fairly equal weights - roughly 1/n, with n being the
number of paths. Conversely, in the real setting, paths
determined to be important by the model had noticeably more
weight than the rest - e.g., &gt; 0.9), they could more quickly
determine what arguments the model had determined to be
important, after which they could judge if they agreed with
those arguments.</p>
      <p>Usefulness</p>
      <p>The real explanations also had a positive, but
not significant, impact on participants’ perceived usefulness
(companies: W = 55, n = 20, p = .739; candidates: W =
56.5, n = 20, p = .631; recruiters: W = 63, n = 20, p = .353).</p>
      <p>In both the random and real scenarios, participants indicated
the explanations as being helpful as a push in the right
direction, but insufficiently detailed to base an actual decision on.
Participants who used the textual explanation most found
very little difference in the degree to which the explanations
helped them make a decision. This was caused by the fact
that both texts included the same arguments (because both
were based on the same data), mainly differing in different
aspects being described as more or less important. However,
this difference was quite nuanced, presenting itself in
subtle differences of phrasing (e.g., “somewhat important” vs.
“very important”), causing it to not be noticed much. In the
bar chart and graph, the difference in perceived usefulness
was more noticeable, and participants who focused mainly
on those explanation types rated the real explanation as
slightly more useful.</p>
      <p>Trust</p>
      <p>Regarding the perceived trust, the
recommendations were rated as slightly more trustworthy when the
participants were presented with real explanations because
those were perceived as better at explaining why a match
was made. However, this increase was not statistically
significant for any stakeholder group (companies: W = 55.5,
n = 20, p = .684; candidates: W = 59, n = 20, p = .529;
recruiters: W = 52.5, n = 20, p = .853). One thing that limited the trust
in both scenarios was the inclusion of other candidates in
the explanation (e.g., including that a candidate with similar
skills to the recommended candidate has also fulfilled a
specific vacancy in the past), which often led to confusion and
uncertainty. This was perceived as a weak or nonsensical
argument by participants, which decreased the trust they
had in the system. Some participants mentioned this could
be improved by rephrasing the argument to be more general,
rather than referring to individuals, e.g., by mentioning that
‘similar candidates’ fulfilled this vacancy in the past.</p>
      <p>SQ2: What is the impact of different explanation
components on the stakeholders’ understanding of the
explanation?</p>
      <p>While trying to find a match, recruiters
and candidates strongly preferred the textual explanations,
as they found those easiest to work with. Especially those
without a ‘technical’ background were reluctant when
using the visual explanations, as those required participants
to compare and evaluate different numbers. Some
participants even ignored the visualizations entirely, as they found
the textual explanations to be sufficient. Company
representatives indicated that the graph-based explanation
was helpful, but the opinion was split, and understanding
how to read the graph was often reliant on reading the
text as well. The complexity of the graphs was exacerbated
by overlapping edges, which added an additional layer of
difficulty (as they required extra effort to be understood
properly). However, once the company representatives had
familiarized themselves with the environment, most
indicated that they could see themselves using the graph a lot
more in the future, as it did give them a quick overview of
the connections between the vacancy and candidates. Some
participants mentioned the graph-based explanation could
be made more user-friendly by relating it more directly to
the vacancy at hand, e.g., if they could customize the
outgoing edges of the vacancy, limiting the graph to connections
directly related to what they considered the most important
aspects of the vacancy. The bar chart was not used much by
most stakeholders, only being used actively by a handful of
participants, as it provided too little context to substantiate
a decision. Since a lot of the explanations were based on
connections between different data points, simply having
feature attributions did not paint a sufficiently
comprehensive picture. Those who did use the bar chart used it mostly
as a supporting tool that could help them determine what
parts of the textual explanation were most important.</p>
      <p>SQ3: How can the explanations be improved to better
assist the different stakeholders?</p>
      <p>Most participants
indicated that the explanations could be helpful when
providing an overview of why things were recommended.
However, because the explanations only touched on a limited
number of features (i.e., at most three paths in the graph),
they found them hard to trust, as there could be additional
factors that they would consider important, but had not
been considered by the AI. As a result, participants
overwhelmingly indicated that the explanations could be used
as a ‘push in the right direction,’ but would need additional
verification/exploration to be actually used. Furthermore,
the explanations were indicated to be too ‘generic’ by
multiple participants. They indicated that explanations would
be more useful if they included explicit references to the
CV/vacancy (e.g., “The vacancy requires two years of
experience, which the candidate possesses”). This was especially
important for hard requirements (such as minimal education
or experience), which should be verified before considering
the rest of the recommendation.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>We now discuss our results relating to the three sub-research
questions and their accompanying hypotheses. We also
address the ethical concerns and limitations of this work
and make recommendations for future research directions.</p>
      <sec id="sec-5-1">
        <title>5.1. Assisting in decision-making</title>
        <p>Based on our findings, we reject both hypotheses H1a (real
explanations will help participants find matches more quickly,
and make the correct decision more often, compared to
random explanations) and H1b (participants will respond more
positively to a recommendation environment that includes the
real explanations). For H1a, we found no evidence that the
real explanations allowed participants to make the correct
decision more often compared to the random explanations,
and we merely found a weak trend that the real explanations
enabled participants to decide more quickly.</p>
        <p>There was a large difference in efficiency between the first
and second rounds, regardless of which of the two scenarios
was presented first. Since the lists of items shown in both
scenarios were identical (with only their content changing),
the participants simply looked for any large differences that
could change their minds, rather than going through the
full explanations a second time (sometimes even explicitly
stating that they would not need to give the explanations
another look). We return to the measurement of efficiency
in the post-hoc analysis in Section 5.1.1.</p>
        <p>Although we did find some trends in line with H1b, these
findings were not statistically significant. We determine
that this is in large part caused by the fact that most
participants did not actively engage with the explanations. While
most participants gravitated to the textual explanations,
they often used these explanations to look for additional
information on the candidate and vacancy, rather than to
understand why a match was made. They would then use
the information they had gathered from the explanations
to manually decide for themselves, regardless of what was
mentioned in the explanations. This behavior was
exacerbated by the fact that the difference between scenarios for
textual explanations was relatively small, e.g., alterations in
phrasing, such as changing from “very important” to “with
limited impact.” These small differences often went
unnoticed, causing participants to rely on their own expertise
rather than the model’s recommendation. Therefore, when
using textual explanations to substantiate a
recommendation, the phrasing should be precise, strongly stressing to
what degree, and in what way, different factors contributed
to the recommendation; simply listing arguments seems
insufficient.</p>
        <sec id="sec-5-1-1">
          <title>5.1.1. Post-hoc analyses</title>
          <p>We additionally investigated whether the lack of difference
in efficiency was influenced by the fact that participants
were exposed to both scenarios. To do this, we only
consider the values from the first run of each participant (which
was randomly selected to be real or random with a 50%
chance). When doing so, we did not find any statistically
significant differences in terms of efficiency for any of the
stakeholder types (U = 11.0, n = 10, p = .914 for
recruiters, U = 8.5, n = 10, p = .667 for candidates, and
U = 15.0, n = 10, p = .690 for company representatives).
As in the analysis with the full sample, the trend was
toward real explanations allowing the participants to decide
more quickly. This lends further support to the conclusion
that the difference between the two conditions is small and
unlikely to lead to large differences in efficiency.</p>
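          <p>A matching sketch for this between-subject check, assuming a Mann-Whitney U test on the first-run decision times of the two condition groups (the data are invented and the exact test is an assumption):</p>
          <preformat>
from scipy.stats import mannwhitneyu

# Hypothetical first-run decision times (seconds) per condition group.
first_run_real = [290, 310, 275, 330, 305]
first_run_random = [300, 320, 285, 340, 295]

stat, p = mannwhitneyu(first_run_real, first_run_random)
print(f"U = {stat}, p = {p:.3f}")
</preformat>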
        <p>Surprisingly, there was a trend for participants to make
incorrect decisions more often when presented with real
explanations. To better understand which kinds of mistakes
were made, we analyzed the justifications given by
participants who incorrectly switched when presented with the
real explanation, or correctly switched when presented with
the random explanation.</p>
        <p>We find that the random explanation was more likely to
enable participants to decide using their prior knowledge:</p>
        <sec id="sec-5-1-1">
          <title>P23: “Yes, because a little more emphasis</title>
          <p>is placed on his qualifications in accounting
and finance, and his experience”</p>
          <p>In contrast, the real explanations were more likely to steer
them into a specific decision:</p>
          <p>P14: “At least, because those lines are
thicker, they reflect the pattern much more
of, hey, what are the core elements? And
then I actually see this line compared to these
two lines, then I think you see very clearly
that connection is much stronger with
[incorrect candidate] than it is with [correct
candidate]”</p>
          <p>Once more, this indicates a lack of engagement with the
explanations from the participants. As long as there was
no large discrepancy between the participants’ initial
decision and the explanation provided by the model, they
would use the explanation to justify their choice, regardless
of whether or not that justification was grounded in the
content of the explanation. Considering the random
explanations did not have any “strong” arguments in the
subgraph (i.e., paths with a significantly higher attention value
attributed to them), participants were more often able to
disregard the model’s arguments and use their prior
knowledge instead. Therefore, we conclude that participants
experienced more (healthy) ‘friction’ when interacting
with the real explanations (as those sometimes disagreed
with them), while they could nearly always justify their
personal reasoning using the random explanations (as those
were not as ‘decisive’ with their weights).</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Explanation components’ impact</title>
        <p>Considering the responses to the interview questions, we
can accept H2 (Candidates and recruiters will mainly use
textual explanations ... company representatives will gain
more from graph-based explanations). Although company
representatives were more split on the graph-based
explanations than expected, they still viewed them more positively
than the other stakeholders. While some struggled with it
initially, they could see its benefits after comparing it to the
text. Especially those with more technical backgrounds
(engineering, finance, AI, etc.) were quick to master the graphs.
As expected, candidates and recruiters stuck mostly to
textual explanations, indicating that those were sufficiently
expressive while not being overwhelming or intimidating.
For some, the texts were sufficient, and those participants
did not view the visual explanations at all (three participants
in total). This again shows that most participants were more
inclined to use the textual explanations to create a clear
image of the situation rather than actually considering it as
an explanation. I.e., they often picked the facts from the
textual explanation (e.g., “this candidate has X work
experience, which is relevant to the job”) and then ‘manually’
weighed those individual facts to come to a decision, rather
than using the model’s assigned relevance.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Improving the explanations</title>
        <p>As for H3a (... All stakeholders will benefit from additionally
including negative attention weights), our findings are in line
with our hypothesis: all stakeholders will benefit from
additionally including negative attention weights. Although
not mentioned explicitly by any participant, it is clear that
most struggled to make sense of some of the weights in the
explanation (e.g., being confused why something they
considered irrelevant had a relatively high weight), showing
that it was unclear to them that this was a strong
argument against the match. This was likely exacerbated by the
fact that some participants (incorrectly) assumed the five
matches presented in the environment were the top five
recommendations. They, therefore, assumed that all arguments
presented by the explanation were meant to convince them
of why the item was a good match, rather than possibly
explaining why it was not.</p>
        <p>
          On the other hand, we found no evidence for H3b (An
explanation that explains why recommendation X was ranked
higher than Y and Z will be desirable for all stakeholders), as
an additional explanation that justifies why one
recommendation was ranked higher than another was not desirable
for all stakeholder types within the current interface. The
participants sometimes struggled to choose a single best
option (candidate or vacancy), but did so anyway by manually
evaluating the explanations for the individual items and
analyzing the possibilities. Rather than needing clarification of
the difference in ratings between two items, the participants
used their own reasoning and prior knowledge (e.g.,
regardless of what the model said, what they would consider to
be relevant work experience) to determine which item was
most suitable. Considering participants were sometimes
already overwhelmed by the amount of information shown in
the interface, additionally including list-wise explanations
is likely to work counterproductively. As a result, we
determine that the focus of XAI systems in high-risk domains
should lie more on decision support, rather than persuading
users of the model’s prediction’s correctness [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ].
Concretely, this could take the shape of a system wherein the
most promising matches are presented in a list, with their
drawbacks (the arguments with the most negative values)
and benefits (the arguments with the most positive values)
clearly listed.
        </p>
        <p>The need for interactivity within the system was also
implicitly stressed multiple times, e.g., by participants
wondering why a specific feature was (not) included in the
explanation. By allowing users to pick which features they want
to evaluate manually, interactivity could be enabled, while
simultaneously lowering the risk of information overload
that would occur by showing all arguments at once.
Therefore, one approach could be to present the CV/vacancy to the
user, with the aspects mentioned in the explanations (e.g.,
work experience deemed relevant, or a lack of proficiency in
a skill) highlighted and selectable – ideally even with a clear
distinction between hard and soft requirements. Users could
then choose which aspects they want to consider in their
decision based on both what the model recommended, and
what they deem important. This would give users access
to all of the data, while allowing them to focus their
attention on the features most likely to be important, thereby
balancing the need for interactivity, the risk of information
overload, and the focus on decision-support over
persuasion.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Ethical concerns</title>
        <p>This paper was granted ethical approval by the Ethical
Review Committee Inner City Faculties of Maastricht University
(https://www.maastrichtuniversity.nl/ethical-review-committee-inner-city-faculties-ercic) before the experiments were
conducted. However, some ethical concerns are still present.</p>
        <p>Considering the large effects job recommender systems
can have on stakeholders’ lives (or operations), they should
be used with great caution and sufficient security checks
in place. The main approach to this is to always have a
human in the loop who is responsible for making a final
decision (e.g., a recruiter who interprets the model’s output
and decides whether to accept it).</p>
        <p>
          This also alleviates the second ethical concern present
within the field of JRSs: the fear of being replaced that was
expressed by recruiters. While the current system did not
perform well enough for recruiters to be concerned for now,
they did indicate that research on such systems made them
feel somewhat uneasy, as they ultimately worried about
being replaced. While legislation such as the EU AI Act [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
prevents recruiters from being replaced entirely, their fear is
still justified, as such systems, when implemented without
proper ethical guidelines, could allow companies to hire
significantly fewer recruiters. To reassure recruiters, JRS
research should focus strongly on solely supporting recruiters,
rather than doing their job for them. This will
simultaneously ensure adherence to the current legislation and relieve
recruiters. Without sufficient human recruiters, the field
of job recommendation will ultimately stagnate, as ground
truth values used to train models are still overwhelmingly
human-generated. Without such human inputs, the field
will ultimately dig its own grave as models will be trained on
(possibly biased and incorrect) outputs of previous models.
        </p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Limitations and future work</title>
        <p>The examples shown in the environment were manually
selected to allow for a larger variety of choices and
explanations, e.g., similar/dissimilar items (e.g., vacancies in
different industries) and straightforward/complicated
explanatory paths (explanation graphs ranged from having 6 to 12
total edges). During this selection process, any explanations
of noticeably low quality were discarded. However, any
limitations in model accuracy (model performance was not
our main focus) could also influence the perceived quality
of explanations, as explanations for very poor
recommendations are unlikely to be perceived well. By performing
the manual curation process for the explanations, however,
we also ensured that the recommendations shown to the
participants were at least of moderate quality; we,
therefore, expect explanations generated by a better-performing
model to be received similarly. Regardless, our future work
will use a better-performing model, allowing us to explain
well a wider range of recommended items, enabling, for
example, online testing.</p>
        <p>
          Similarly, we specifically opted for the use of
model-intrinsic explanations, as the use of attention mechanisms
allowed us to generate separate candidate- and company-side
explanations. However, it would have been interesting to
compare these explanations to those generated by existing,
state-of-the-art, post-hoc methods, namely SHAP [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and
LIME [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Considering those methods do not allow the
generation of such separate explanations, we did not consider
them to be within the scope of this research. Regardless,
future work could compare stakeholder-specific to ‘generic’
explanations to evaluate to what extent such tailored
explanations benefit the stakeholders.
        </p>
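        <p>As a concrete illustration of how attention can double as a model-intrinsic explanation, the following minimal sketch (our illustration, not the study’s actual implementation; the toy graph, dimensions, and variable names are assumptions) reads per-edge attention coefficients out of a graph attention layer in PyTorch Geometric, which could then be rendered as candidate- or company-side explanation graphs:</p>
        <preformat>
# Minimal sketch: per-edge attention weights as explanation scores.
# The toy graph (five feature nodes pointing at one vacancy node) and
# all names here are illustrative assumptions.
import torch
from torch_geometric.nn import GATv2Conv

num_nodes, in_dim, out_dim = 6, 16, 8
x = torch.randn(num_nodes, in_dim)            # node features (skills, jobs, ...)
edge_index = torch.tensor([[0, 1, 2, 3, 4],   # source nodes
                           [5, 5, 5, 5, 5]])  # all pointing at the vacancy node

# add_self_loops=False keeps the output restricted to the original edges
conv = GATv2Conv(in_dim, out_dim, heads=1, add_self_loops=False)
out, (att_edge_index, alpha) = conv(x, edge_index,
                                    return_attention_weights=True)

# alpha holds one attention coefficient per edge; these weights are what a
# model-intrinsic explanation exposes, e.g., as edge thickness in a graph.
for (src, dst), w in zip(att_edge_index.t().tolist(),
                         alpha.squeeze(-1).tolist()):
    print(f"edge {src} -> {dst}: weight {w:.3f}")
        </preformat>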
        <p>One aspect that has likely dampened the difference
between the real and random scenarios is how the explanations
were generated. Although the explanation weights
differed between the two scenarios, the data on which the
explanation was generated was the same, and the randomization
only occurred after a list of recommendations was created.
This was done to allow for a fair comparison between the
two scenarios (as otherwise the list of recommended items
would have changed), but it made the difference too small
for most participants to notice. This was further exacerbated
by the fact that most participants gravitated towards the
textual explanations, in which exact values were not shown.
Considering most participants did not pay attention to the
exact weights of the different components, instead focusing
on what they considered important, having both scenarios
include the same list of features, with only their weights
altered, strongly diminished the noticeable differences
between the two settings. Therefore, future work should
compare explanations against a more noticeably different
baseline, e.g., a fully randomized explanation.</p>
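        <p>To make the baseline construction concrete, the sketch below (our reading of the procedure described above; the data structure, feature names, and function names are assumptions) keeps the explanation’s feature set fixed and only permutes the attributed weights, which is why the two conditions were hard to tell apart:</p>
        <preformat>
# Minimal sketch of the random condition: identical features, permuted
# weights. Feature names and values are illustrative assumptions.
import random

explanation = {  # feature -> attributed weight
    "experience in field Y": 0.42,
    "matching skill: data analysis": 0.31,
    "same industry": 0.18,
    "location proximity": 0.09,
}

def randomize_weights(expl, seed=0):
    """Return the same features with their weights shuffled among them."""
    rng = random.Random(seed)
    weights = list(expl.values())
    rng.shuffle(weights)
    return dict(zip(expl.keys(), weights))

# A participant reading only the feature names (as most did with the
# textual explanations) sees no difference between the two conditions.
print(randomize_weights(explanation))
        </preformat>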
        <p>
          Additionally, to assist users in making use of all the
important information within explanations, our next work
will attempt to mitigate the aforementioned cognitive bias
by presenting explanations that increase healthy
friction [
          <xref ref-type="bibr" rid="ref44 ref45 ref46 ref47">44, 45, 46, 47</xref>
          ], without being overly coercive. As
mentioned, we predict interactive interfaces could be a fitting
medium for this; however, such interfaces generally have a
higher barrier to entry. It would therefore be interesting to
see if and how users improve (in terms of correctness, trust,
etc.) over time when exposed to interactive environments.
        </p>
        <p>Furthermore, while the company representatives and
recruiters could easily identify with the vacancy, some
candidates struggled to make decisions on behalf of someone
else. This was exacerbated by the fact that the synthetic
candidate they had to roleplay was not customizable, making
it difficult for those with niche backgrounds to determine
what would be relevant for such a person. In our next steps,
we plan to address this by either allowing candidates to use
their own data to create recommendations or by providing
different user profiles, letting candidates select the CV
most closely related to their personal expertise. Using
such a setup, it will be possible to conduct an experiment in
a more natural setting, wherein participants make actual
decisions and disagreement with the system can be studied
in depth.</p>
        <p>Lastly, we expect to find similar results when performing
our experiment with stakeholders from different high-stakes
domains (e.g., medicine, finance). Based on our findings, we
predict that similar domain experts with little AI knowledge
are likely to show similar behavior, i.e., not actively engaging
with the explanations and instead prioritizing their own
expertise. Therefore, we expect the demand for supportive
explanations over persuasive explanations to be a general
trend, not solely relevant to recruitment. However, as
stakeholders’ needs and preferences can be highly
domain-specific, future research should verify these expectations by
creating a similar environment catered to the needs of the
aforementioned domains’ specific stakeholders.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we find that job recommender system
stakeholders struggle to identify differences in the feature
importance of recommendation explanations: most participants
could not reliably differentiate between the real and
random scenarios.</p>
      <p>We also believe that this difficulty in distinguishing real
and random explanations is influenced by participants’
pre-existing expectations and knowledge about the domain. When
indicating that they struggled to find a difference between
the two settings, participants often mentioned that they
attributed more value to the high-level arguments provided
by the explanations (e.g., “candidate X had experience in
field Y”) than to the details of how much those arguments
contributed to the final decision (e.g., “this experience had
a rather limited impact on the prediction”). Therefore, we
determine that job recommendation explanations are mainly
useful for providing users with a general overview of the
‘big picture.’ Current implementations, such as feature
attribution and textual explanations, tend to be insufficient
at providing users with detailed information on how much
different features weighed in the prediction. However, such
details could be especially useful when trying to alleviate
confirmation bias.</p>
      <p>We found that focusing on the big picture is especially
important for textual explanations, as differences in
importance can be hard to put into perspective in text.
Considering textual explanations were the preferred medium for
both recruiters and candidates, future systems should make
sure those are implemented in strong accordance with
stakeholders’ demands. For company representatives, on the other
hand, visualizations were more useful. While nuanced
differences are easier to distinguish in such explanations, the
higher-level arguments being included should still be carefully
considered, as simply including all arguments (i.e., showing
the entire graph at once) will often lead to information
overload. Therefore, regardless of the explanation medium
being used, interactive interfaces could assist users by
allowing them to view all available data while minimizing the
risk of information overload.</p>
      <p>Furthermore, designing the explanations in such a way
that they function as decision-support tools, rather than as
persuasive tools to convince users of the model’s correctness,
would be beneficial for all stakeholders. Presenting the
explanations as a clear description of the strengths and
weaknesses of each recommendation will make the explanations
clearer and allow users to make up their minds
independently. We therefore recommend practitioners pivot their
focus away from designing persuasive explanations for JRSs
and towards decision-support-oriented approaches.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Appendix</title>
<p>Interview guide: for each construct, its goal, the main questions, and the probing questions.</p>
      <p>1. Correct interpretation (to assess whether or not the stakeholder can correctly interpret the explanation). Questions: 1.1 What information/features do you think were most important for this prediction? 1.2 What was the least important? 1.3 How would you put the model’s explanation into your own words? Probing questions: 1.1.1 What did you look at to come to that conclusion?</p>
      <p>2. Transparency (to determine the explanation’s effect on understanding the model’s inner workings). Questions: 2.1 Does the explanation help you comprehend why the system gave the recommendation? Probing questions: 2.1.1 What components help you specifically? 2.1.2 What information is missing that could allow you to get a better understanding of the model’s recommendation?</p>
      <p>3. Usefulness (to evaluate how useful the explanations are considered to be). Questions: 3.1 Does the explanation make sense to you? 3.2 Does the explanation help you make a decision? 3.3 How could you see yourself using the explanation in your daily work/task? Probing questions: 3.1.1 What do you consider sensible (e.g., focus on specific features)? 3.1.2 What do you consider insensible? 3.2.1 Would you prefer a model with explanations over one without?</p>
      <p>4. Trust (to gauge the explanation’s impact on the model’s trustworthiness). Questions: 4.1 Do you think the prediction made by the model is reliable? 4.2 If this recommendation was made for you, would you trust the model to have made the right decision? Probing questions: 4.2.1 Anything specific that makes you say that (e.g., something makes no sense, or is very similar to how you look at things)?</p>
      <p>5. Preference (to figure out the personal preference of the stakeholder). Questions: 5.1 What would you like to see added to the current explanation? 5.2 What would you consider to be redundant within this explanation? Probing questions: 5.1.1 Any specific information that is missing? 5.1.2 Any functionality that could be useful? 5.2.1 Anything that should be removed? 5.2.2 Or be made less prevalent?</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>de Ruijt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhulai</surname>
          </string-name>
          ,
          <article-title>Job recommender systems: A review</article-title>
          ,
          <source>arXiv preprint arXiv:2111.13576</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Geyik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kenthapadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mithal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taly</surname>
          </string-name>
          ,
          <article-title>Explainable ai in industry</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery &amp; data mining</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3203</fpage>
          -
          <lpage>3204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>Explainable recommendation: A survey and new perspectives</article-title>
          ,
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Braga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          , L. Rosado,
          <article-title>Xai systems evaluation: A review of human and computer-centred methods</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>9423</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Preece</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Harborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Braines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tomsett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <article-title>Stakeholders in explainable ai</article-title>
          ,
          <source>arXiv preprint arXiv:1810.00184</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Charleer</surname>
          </string-name>
          , R. De Croon,
          <string-name>
            <given-names>N. N.</given-names>
            <surname>Htun</surname>
          </string-name>
          , G. Goetschalckx,
          <string-name>
            <given-names>K.</given-names>
            <surname>Verbert</surname>
          </string-name>
          ,
          <article-title>Explaining and exploring job recommendations: a user-driven approach for interacting with knowledge-based job recommender systems</article-title>
          ,
          <source>in: Proceedings of the 13th ACM Conference on Recommender Systems</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Millecamp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. N.</given-names>
            <surname>Htun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Conati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Verbert</surname>
          </string-name>
          ,
          <article-title>To explain or not to explain: the effects of personal characteristics when explaining music recommendations</article-title>
          ,
          <source>in: Proceedings of the 24th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>397</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Szymanski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Verbert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vanden Abeele</surname>
          </string-name>
          ,
          <article-title>Designing and evaluating explainable ai for non-ai experts: challenges and opportunities</article-title>
          ,
          <source>in: Proceedings of the 16th ACM Conference on Recommender Systems</source>
          , RecSys '22, Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , pp.
          <fpage>735</fpage>
          -
          <lpage>736</lpage>
          . URL: https://doi.org/10.1145/3523227.3547427. doi:10.1145/3523227.3547427.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Schellingerhout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Barile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <article-title>A co-design study for multi-stakeholder job recommender system explanations</article-title>
          ,
          <source>in: World Conference on Explainable Artificial Intelligence</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>597</fpage>
          -
          <lpage>620</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Commission</surname>
          </string-name>
          ,
          <article-title>Proposal for a regulation of the european parliament and of the council laying down harmonised rules on artificial intelligence (artificial intelligence act) and amending certain union legislative acts</article-title>
          ,
          <year>2021</year>
          . URL: https://eur-lex.europa.eu/legal-content/ EN/TXT/?uri=celex%3A52021PC0206.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mashayekhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lijfijt</surname>
          </string-name>
          ,
          <string-name>
            <surname>T. De Bie</surname>
          </string-name>
          ,
          <article-title>A challenge-based survey of e-recruitment recommendation systems</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Hu,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Towards effective and interpretable person-job fitting</article-title>
          ,
          <source>International Conference on Information and Knowledge Management, Proceedings</source>
          (
          <year>2019</year>
          )
          <fpage>1883</fpage>
          -
          <lpage>1892</lpage>
          . URL: https://doi.org/10.1145/3357384.3357949. doi:10.1145/3357384.3357949.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Abu-Rasheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fathi</surname>
          </string-name>
          ,
          <article-title>Explainable job-posting recommendations using knowledge graphs and named entity recognition</article-title>
          ,
          <source>Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics</source>
          (
          <year>2021</year>
          )
          <fpage>3291</fpage>
          -
          <lpage>3296</lpage>
          . doi:10.1109/SMC52423.2021.9658757.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Yıldırım</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Azad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ş. G.</given-names>
            <surname>Öğüdücü</surname>
          </string-name>
          ,
          <article-title>bideepfm: A multiobjective deep factorization machine for reciprocal recommendation</article-title>
          ,
          <source>Engineering Science and Technology, an International Journal</source>
          <volume>24</volume>
          (
          <year>2021</year>
          )
          <fpage>1467</fpage>
          -
          <lpage>1477</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nauta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Trienes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pathak</surname>
          </string-name>
          , E. Nguyen,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schlötterer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Keulen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Seifert</surname>
          </string-name>
          ,
          <article-title>From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable ai</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Szymanski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Millecamp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Verbert</surname>
          </string-name>
          ,
          <article-title>Visual, textual or hybrid: the effect of user expertise on different explanations</article-title>
          ,
          <source>in: 26th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>European Parliament, Council of the European Union</string-name>
          ,
          <article-title>Regulation (EU) 2016/679 of the European Parliament and of the Council</article-title>
          ,
          <year>2016</year>
          -05-04. URL: https://data.europa.eu/eli/reg/2016/679/oj.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Robin</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <article-title>Closing: What's next for recsys?</article-title>
          ,
          <year>2023</year>
          -06-16. URL: https://www.dropbox.com/sh/8u3ye01cdnffcbz/AABB82TvbsLs1QKDjjkqvb0Ja?dl=0&amp;preview=S18-closing-slides.pptx.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Vasconcelos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jörke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grunde-McLaughlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gerstenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <article-title>Explanations can reduce overreliance on ai systems during decision-making</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>7</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Schellingerhout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Medentsiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <article-title>Explainable career path predictions using neural models</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>”Why should I trust you?” Explaining the predictions of any classifier</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chromik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eiband</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Buchner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krüger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Butz</surname>
          </string-name>
          ,
          <article-title>I think i get your point, ai! the illusion of explanatory depth in explainable ai</article-title>
          ,
          <source>in: 26th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>307</fpage>
          -
          <lpage>317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jesus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Belém</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Balayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bento</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Saleiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bizarro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <article-title>How can i choose an explainer? an application-grounded evaluation of post-hoc explanations</article-title>
          ,
          <source>in: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>805</fpage>
          -
          <lpage>815</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Axiomatic attribution for deep networks</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3319</fpage>
          -
          <lpage>3328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P.</given-names>
            <surname>Veličković</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cucurull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Graph attention networks</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J.</given-names>
            <surname>Teufel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Torresi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Friederich</surname>
          </string-name>
          ,
          <article-title>Megan: Multi-explanation graph attention network</article-title>
          ,
          <source>in: World Conference on Explainable Artificial Intelligence</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>338</fpage>
          -
          <lpage>360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <article-title>Listwise explanations for ranking models using multiple explainers</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>653</fpage>
          -
          <lpage>668</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>H.</given-names>
            <surname>Degen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Budnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Conte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lintereur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>How to explain it to energy engineers? a qualitative user study about trustworthiness, understandability, and actionability</article-title>
          , in: International Conference on Human-Computer Interaction, Springer,
          <year>2022</year>
          , pp.
          <fpage>262</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Pérez-Chada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Creadore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Manjaly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Pournamdari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tkachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Ko</surname>
          </string-name>
          , et al.,
          <article-title>Patient perspectives on the use of artificial intelligence for skin cancer screening: a qualitative study</article-title>
          ,
          <source>JAMA dermatology 156</source>
          (
          <year>2020</year>
          )
          <fpage>501</fpage>
          -
          <lpage>512</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.-L. S.</given-names>
            <surname>Pysander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.-L.</given-names>
            <surname>Söderberg</surname>
          </string-name>
          ,
          <article-title>Not transparent and incomprehensible: A qualitative user study of an ai-empowered financial advisory system</article-title>
          ,
          <source>Data and Information Management</source>
          (
          <year>2023</year>
          )
          <fpage>100041</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Morse</surname>
          </string-name>
          ,
          <article-title>Determining sample size</article-title>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Lenssen</surname>
          </string-name>
          ,
          <article-title>Fast graph representation learning with PyTorch Geometric</article-title>
          ,
          <source>in: ICLR Workshop on Representation Learning on Graphs and Manifolds</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>Brody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Alon</surname>
          </string-name>
          , E. Yahav,
          <article-title>How attentive are graph attention networks?</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Masked label prediction: Unified message passing model for semi-supervised classification</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-S.</given-names>
            <surname>Chua</surname>
          </string-name>
          ,
          <article-title>Kgat: Knowledge graph attention network for recommendation</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, Anchorage</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Oğuz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Edunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          , W. T. Yih,
          <article-title>Dense passage retrieval for open-domain question answering</article-title>
          ,
          <source>in: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP</source>
          <year>2020</year>
          , Association for Computational Linguistics (ACL),
          <year>2020</year>
          , pp.
          <fpage>6769</fpage>
          -
          <lpage>6781</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lovász</surname>
          </string-name>
          ,
          <article-title>Random walks on graphs</article-title>
          ,
          <source>Combinatorics, Paul Erdős is eighty</source>
          <volume>2</volume>
          (
          <year>1993</year>
          )
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          ,
          <article-title>Chatgpt: Openai's conversational ai</article-title>
          , https://openai.com/blog/chatgpt,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>T.</given-names>
            <surname>Susnjak</surname>
          </string-name>
          ,
          <article-title>Beyond predictive learning analytics modelling and onto explainable artificial intelligence with prescriptive analytics and chatgpt</article-title>
          ,
          <source>International Journal of Artificial Intelligence in Education</source>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>V.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <article-title>Thematic analysis</article-title>
          , American Psychological Association,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Explainable ai is dead, long live explainable ai! hypothesis-driven decision support using evaluative ai</article-title>
          ,
          <source>in: Proceedings of the 2023 ACM conference on fairness, accountability, and transparency</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>333</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Buçinca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Malaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Z.</given-names>
            <surname>Gajos</surname>
          </string-name>
          ,
          <article-title>To trust or to think: Cognitive forcing functions can reduce overreliance on ai in ai-assisted decision-making</article-title>
          ,
          <source>Proc. ACM Hum.-Comput. Interact</source>
          .
          <volume>5</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.1145/3449287. doi:10.1145/3449287.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hertwig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Grüne-Yanoff</surname>
          </string-name>
          ,
          <article-title>Nudging and boosting: Steering or empowering good decisions</article-title>
          ,
          <source>Perspectives on Psychological Science</source>
          <volume>12</volume>
          (
          <year>2017</year>
          )
          <fpage>973</fpage>
          -
          <lpage>986</lpage>
          . doi:10.1177/1745691617702496. PMID: 28792862.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lorenz-Spreen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lewandowsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Sunstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hertwig</surname>
          </string-name>
          ,
          <article-title>How behavioural sciences can promote truth, autonomy and democratic discourse online</article-title>
          ,
          <source>Nature human behaviour 4</source>
          (
          <year>2020</year>
          )
          <fpage>1102</fpage>
          -
          <lpage>1109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Draws</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theune</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <article-title>Nudges to mitigate confirmation bias during web search on debated topics: Support vs. manipulation</article-title>
          ,
          <source>ACM Transactions on the Web</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>