=Paper=
{{Paper
|id=Vol-2903/IUI21WS-TExSS-12
|storemode=property
|title=A Study on Fairness and Trust Perceptions in Automated Decision Making
|pdfUrl=https://ceur-ws.org/Vol-2903/IUI21WS-TExSS-12.pdf
|volume=Vol-2903
|authors=Jakob Schoeffer,Yvette Machowski,Niklas Kuehl
|dblpUrl=https://dblp.org/rec/conf/iui/SchofferMK21
}}
==A Study on Fairness and Trust Perceptions in Automated Decision Making==
A Study on Fairness and Trust Perceptions in Automated Decision Making

Jakob Schoeffer, Yvette Machowski and Niklas Kuehl
Karlsruhe Institute of Technology (KIT), Germany

Joint Proceedings of the ACM IUI 2021 Workshops, April 13–17, 2021, College Station, USA
jakob.schoeffer@kit.edu (J. Schoeffer); yvette.machowski@alumni.kit.edu (Y. Machowski); niklas.kuehl@kit.edu (N. Kuehl)
ORCID: 0000-0003-3705-7126 (J. Schoeffer); 0000-0002-9271-6342 (Y. Machowski); 0000-0001-6750-0876 (N. Kuehl)

Abstract
Automated decision systems are increasingly used for consequential decision making—for a variety of reasons. These systems often rely on sophisticated yet opaque models, which do not (or hardly) allow for understanding how or why a given decision was arrived at. This is not only problematic from a legal perspective, but non-transparent systems are also prone to yield undesirable (e.g., unfair) outcomes because their sanity is difficult to assess and calibrate in the first place. In this work, we conduct a study to evaluate different attempts of explaining such systems with respect to their effect on people's perceptions of fairness and trustworthiness towards the underlying mechanisms. A pilot study revealed surprising qualitative insights as well as preliminary significant effects, which will have to be verified, extended and thoroughly discussed in the larger main study.

Keywords
Automated Decision Making, Fairness, Trust, Transparency, Explanation, Machine Learning

1. Introduction

Automated decision making has become ubiquitous in many domains such as hiring [1], bank lending [2], grading [3], and policing [4], among others. As automated decision systems (ADS) are used to inform increasingly high-stakes consequential decisions, understanding their inner workings is of utmost importance—and undesirable behavior becomes a problem of societal relevance. The underlying motives of adopting ADS are manifold: They range from cost-cutting to improving performance and enabling more robust and objective decisions [1, 5]. One widespread assumption is that ADS can also avoid human biases in the decision making process [1]. However, ADS are typically based on artificial intelligence (AI) techniques, which, in turn, generally rely on historical data. If, for instance, this underlying data is biased (e.g., because certain socio-demographic groups were favored in a disproportional way in the past), an ADS may pick up and perpetuate existing patterns of unfairness [6]. Two prominent examples of such behavior from the recent past are the discrimination of black people in the realm of facial recognition [7] and recidivism prediction [8]. These and other cases have put ADS under enhanced scrutiny, jeopardizing trust in these systems.

In recent years, a significant body of research has been devoted to detecting and mitigating unfairness in automated decision making [6]. Yet, most of this work has focused on formalizing the concept of fairness and enforcing certain statistical equity constraints, often without explicitly taking into account the perspective of individuals affected by such automated decisions.
In addition to how researchers may define and enforce fairness in technical terms, we argue that it is vital to understand people's perceptions of fairness—vital not only from an ethical standpoint but also with respect to facilitating trust in and adoption of (appropriately deployed) socio-technical systems like ADS. Srivastava et al. [9], too, emphasize the need for research to gain a deeper understanding of people's attitudes towards fairness in ADS.

A separate, yet closely related, issue revolves around how to explain automated decisions and the underlying processes to affected individuals so as to enable them to appropriately assess the quality and origins of such decisions. Srivastava et al. [9] also point out that subjects should be presented with more information about the workings of an algorithm and that research should evaluate how this additional information influences people's fairness perceptions. In fact, the EU General Data Protection Regulation (GDPR)¹, for instance, requires disclosing "the existence of automated decision-making, including [...] meaningful information about the logic involved [...]" to the "data subject". Beyond that, however, such regulations often remain vague and hardly actionable. To that end, we conduct a study to examine in more depth the effect of different explanations on people's perceptions of fairness and trustworthiness towards the underlying ADS in the context of lending, with a focus on
• the amount of information provided,
• the background and experience of people,
• the nature of the decision maker (human vs. automated).

¹ https://eur-lex.europa.eu/eli/reg/2016/679/oj (last accessed Jan 3, 2021)

2. Background and Related Work

It is widely understood that AI-based technology can have undesirable effects on humans. As a result, topics of fairness, accountability and transparency have become important areas of research in the fields of AI and human-computer interaction (HCI), among others. In this section, we provide an overview of relevant literature and highlight our contributions.

Explainable AI    Despite being a popular topic of current research, explainable AI (XAI) is a natural consequence of designing ADS and, as such, has been around at least since the 1980s [15]. Its importance, however, keeps rising as increasingly sophisticated (and opaque) AI techniques are used to inform ever more consequential decisions. XAI is not only required by law (e.g., GDPR, ECOA²); it also shapes how users perceive algorithmic systems: Eslami et al. [16], for instance, have shown that users' attitudes towards algorithms change when transparency is increased. When sufficient information is not presented, users sometimes rely too heavily on system suggestions [17]. Yet, both quantity and quality of explanations matter: Kulesza et al. [18] explore the effects of soundness and completeness of explanations on end users' mental models and suggest, among others, that oversimplification is problematic. We refer to [15, 19, 20] for more in-depth literature on the topic of XAI.

² Equal Credit Opportunity Act: https://www.consumer.ftc.gov/articles/0347-your-equal-credit-opportunity-rights (last accessed Jan 3, 2021)

Perceptions of fairness and trustworthiness    A relatively new line of research in AI and HCI has started focusing on perceptions of fairness and trustworthiness in automated decision making. For instance, Binns et al. [10] and Dodge et al. [11] compare fairness perceptions in ADS for four distinct explanation styles.
Lee [12] compares perceptions of fairness and trustworthiness depending on whether the decision maker is a person or an algorithm in the context of managerial decisions. Lee and Baykal [13] explore how algorithmic decisions are perceived in comparison to group-made decisions. Wang et al. [14] combine a number of manipulations, such as favorable and unfavorable outcomes, to gain an overview of fairness perceptions. An interesting finding by Lee et al. [21] suggests that fairness perceptions decline for some people when gaining an understanding of an algorithm if their personal fairness concepts differ from those of the algorithm. Regarding trustworthiness, Kizilcec [22], for instance, concludes that it is important to provide the right amount of transparency for optimal trust effects, as both too much and too little transparency can have undesirable effects.

Our contribution    We aim to complement existing work to better understand how much of which information of an ADS should be provided to whom so that people are optimally enabled to understand the inner workings and appropriately assess the quality (e.g., fairness) and origins of such decisions. Specifically, our goal is to add novel insights in the following ways: First, our approach combines multiple explanation styles in one condition, thereby disclosing varying amounts of information. This differentiates our method from the concept of distinct individual explanations adopted by, for instance, Binns et al. [10]. We also evaluate the understandability of explanations through multiple items; and we add a novel analysis of the effect of people's AI literacy [23] on their perceptions of fairness and trustworthiness. Finally, we investigate whether perceptions of fairness and trustworthiness differ between having a human or an automated decision maker, controlling for the provided explanations. For brevity, we have summarized relevant aspects where our work can complement existing literature in Table 1.

Table 1
Overview of related work.

Reference           | Explanation styles provided | Amount of provided information | Understandability evaluated   | Computer / AI experience considered | Human involvement in context tested
Binns et al. [10]   | distinct                    | no                             | single question               | no                                  | no
Dodge et al. [11]   | distinct                    | no                             | not mentioned                 | no                                  | no
Lee [12]            | distinct                    | no                             | no                            | knowledge of algorithms             | individual in management context
Lee and Baykal [13] | n/a due to study setup      | no                             | no                            | programming / algorithm knowledge   | group decision in fair division context
Wang et al. [14]    | distinct                    | partly                         | no                            | computer literacy                   | algorithmic decision, reviewed by group in crowdsourcing context
Our work            | distinct and combined       | yes                            | construct with multiple items | AI literacy                         | individual in provider-customer context

3. Study Design and Methodology

With our study, we aim to contribute novel insights towards answering the following main questions:

Q1 Do people perceive a decision process to be fairer and/or more trustworthy if more information about it is disclosed?

Q2 Does people's experience / knowledge in the field of AI have an impact on their perceptions of fairness and trustworthiness towards automated decision making?

Q3 How do people perceive human versus automated (consequential) decision making with respect to fairness and trustworthiness?

We choose to explore the aforementioned relationships in the context of lending—an example of a provider-customer encounter. Specifically, we confront study participants with situations where a person was denied a loan. We choose a between-subjects design with the following conditions: First, we reveal that the loan decision was made by a human or an ADS (i.e., automated). Then we provide one of four explanation styles to each study participant. Figure 1 contains an illustration of our study setup, the elements of which will be explained in more detail shortly. Eventually, we measure four different constructs: understandability (of the given explanations), procedural fairness [24], informational fairness [24], and trustworthiness (of the decision maker); and we compare the results across conditions. Additionally, we measure AI literacy of the study participants. Please refer to Appendix A for a list of all constructs and associated measurement items for the case of automated decisions. Note that for each construct we measure multiple items.

[Figure 1: Graphical representation of our study setup. The decision maker (human vs. ADS) is crossed with four explanation styles: (F) Features; (FFI) Features + Feature Importance; (FFICF) Features + Feature Importance + Counterfactuals; (CF) Counterfactuals. Thick lines indicate the subset of conditions from our pilot study.]

Our analyses are based on a publicly available dataset on home loan application decisions³, which has been used in multiple Kaggle competitions. Note that comparable data—reflecting a given finance company's individual circumstances and approval criteria—might in practice be used to train ADS.

³ https://www.kaggle.com/altruistdelhite04/loan-prediction-problem-dataset (last accessed Jan 3, 2021)
The dataset at hand consists of 614 labeled (loan Y/N) observations and includes the following features: applicant income, co-applicant income, credit history, dependents, education, gender, loan amount, loan amount term, marital status, property area, self-employment. After removing data points with missing values, we are left with 480 observations, 332 of which (69.2%) involve the positive label (Y) and 148 (30.8%) the negative label (N). We use 70% of the dataset for training purposes and the remaining 30% as a holdout set.

As groundwork, after encoding and scaling the features, we trained a random forest classifier with bootstrapping to predict the held-out labels, which yields an out-of-bag accuracy estimate of 80.1%. Our first explanation style, (F), consists of disclosing the features including corresponding values for an observation (i.e., an applicant) from the holdout set whom our model denied the loan. We refer to such an observation as a setting. In our study, we employ different settings in order to ensure generalizability. Please refer to Appendix B for an excerpt of questionnaires for one exemplary setting (male applicant). Note that all explanations are derived from the data—they are not concocted. Next, we computed permutation feature importances [25] from our model and obtained the following hierarchy, using "≻" as a shorthand for "is more important than": credit history ≻ loan amount ≻ applicant income ≻ co-applicant income ≻ property area ≻ marital status ≻ dependents ≻ education ≻ loan amount term ≻ self-employment ≻ gender. Revealing this ordered list of feature importances in conjunction with (F) makes up our second explanation style (FFI).
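A minimal sketch of how such a modeling pipeline could look, assuming the public Kaggle CSV (saved here as train.csv; column names such as Loan_Status and Loan_ID come from that dataset) and scikit-learn. This is not the authors' code: their exact preprocessing choices and hyperparameters are not reported, so the ones below are assumptions.

```python
# Minimal sketch (not the authors' code): train a bootstrapped random forest on the
# Kaggle loan-prediction data, report the out-of-bag accuracy estimate, and compute
# permutation feature importances. File and column names assume the public CSV.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

df = pd.read_csv("train.csv").drop(columns=["Loan_ID"]).dropna()  # the paper reports 480 remaining rows
X, y = df.drop(columns=["Loan_Status"]), (df["Loan_Status"] == "Y").astype(int)

# Encode categorical features and scale everything (the exact scheme is an assumption).
cat_cols = X.select_dtypes(include="object").columns
X[cat_cols] = OrdinalEncoder().fit_transform(X[cat_cols])
X[X.columns] = StandardScaler().fit_transform(X)

# 70/30 split into training data and a holdout set, as described in the paper.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)

# Random forest with bootstrapping; oob_score_ is the out-of-bag accuracy estimate.
model = RandomForestClassifier(n_estimators=500, bootstrap=True, oob_score=True,
                               random_state=0).fit(X_train, y_train)
print(f"OOB accuracy estimate: {model.oob_score_:.3f}")

# Permutation feature importances, here evaluated on the holdout set (the paper does
# not state which split was used).
result = permutation_importance(model, X_hold, y_hold, n_repeats=30, random_state=0)
ranking = pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
print(ranking)  # ordered list: "is more important than" from top to bottom
```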
To construct our third and fourth explanation styles, we conducted an online survey with 20 quantitative and qualitative researchers to ascertain which of the aforementioned features are actionable—in the sense that people can (hypothetically) act on them in order to increase their chances of being granted a loan. According to this survey, the top-5 actionable features are: loan amount, loan amount term, property area, applicant income, co-applicant income. Our third explanation style (FFICF) is then—in conjunction with (F) and (FFI)—the provision of three counterfactual scenarios where one actionable feature each is (minimally) altered such that our model predicts a loan approval instead of a rejection. The last explanation style is (CF), which provides counterfactuals without additionally providing features or feature importances. This condition aims at testing the effectiveness of counterfactual explanations in isolation, as opposed to providing them in conjunction with other explanation styles. We employ only model-agnostic explanations [20] in a way that they could plausibly be provided by both humans and ADS.
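The paper does not specify how the counterfactual values were found. The sketch below shows one simple possibility, a one-feature-at-a-time grid search, reusing the hypothetical model and X_hold objects from the previous sketch; function and variable names are ours, and a real implementation would search in the original (unscaled) feature units to produce statements like "at least $800 per month".

```python
# Minimal sketch (an assumption, not the authors' procedure): for a denied applicant,
# vary one actionable feature at a time over a grid and keep the smallest change that
# flips the model's prediction to "approve" (label 1).
import numpy as np

ACTIONABLE = ["LoanAmount", "Loan_Amount_Term", "Property_Area",
              "ApplicantIncome", "CoapplicantIncome"]  # top-5 actionable features from the survey

def one_feature_counterfactuals(model, x_row, feature_grid):
    """x_row: a single-row DataFrame; feature_grid: {feature: candidate values}."""
    counterfactuals = {}
    for feature, candidates in feature_grid.items():
        original = float(x_row.iloc[0][feature])
        # Try candidates in order of increasing distance from the original value.
        for value in sorted(candidates, key=lambda v: abs(v - original)):
            modified = x_row.copy()
            modified[feature] = value
            if model.predict(modified)[0] == 1:  # loan would be approved
                counterfactuals[feature] = value
                break
    return counterfactuals

# Example: pick one denied applicant from the holdout set; the grid spans the observed
# (already encoded and scaled) value range of each actionable feature.
denied = X_hold[model.predict(X_hold) == 0].iloc[[0]]
grid = {f: np.linspace(X_hold[f].min(), X_hold[f].max(), 50) for f in ACTIONABLE}
print(one_feature_counterfactuals(model, denied, grid))
```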
4. Preliminary Analyses and Findings

Based on Section 3, we conducted an online pilot study with 58 participants to infer preliminary insights regarding Q1 and Q2 and to validate our study design. Among the participants were 69% males, 29% females, and one person who did not disclose their gender; 53% were students, 28% employed full-time, 10% employed part-time, 3% self-employed, and 5% unemployed. The average age was 25.1 years, and 31% of participants have applied for a loan before. For this pilot study, we only included the ADS settings (right branch in Figure 1) and limited the conditions to (F), (FFI), and (FFICF). The study participants were randomly assigned to one of the three conditions, and each participant was provided with two consecutive questionnaires associated with two different settings—one male and one female applicant. Participants for this online study were recruited from all over the world via Prolific⁴ [26] and asked to rate their agreement with multiple statements on 5-point Likert scales, where a score of 1 corresponds to "strongly disagree", and a score of 5 denotes "strongly agree". Additionally, we included multiple open-ended questions in the questionnaires to be able to carry out a qualitative analysis as well.

⁴ https://www.prolific.co/

4.1. Quantitative Analysis

Constructs    As mentioned earlier, we measured four different constructs: understandability (of the given explanations), procedural fairness [24], informational fairness [24], and trustworthiness (of the decision maker); see Appendix A for the associated measurement items. Note that study participants responded to the same (multiple) measurement items per construct, and these measurements were ultimately averaged to obtain one score per construct. We evaluated the reliability of the constructs through Cronbach's alpha—all values were larger than 0.8, thus showing good reliability for all constructs [27]. We proceeded to measure correlations between the four constructs with Pearson's r to obtain an overview of the relationships between our constructs. Table 2 provides an overview of these relationships: Procedural fairness and informational fairness are each strongly correlated with trustworthiness, and informational fairness is strongly correlated with understandability. Overall, we found significant correlations (p < 0.05) between all constructs besides procedural fairness and understandability.

Table 2
Pearson correlations between constructs for pilot study.

Construct 1            | Construct 2            | Pearson's r
Procedural Fairness    | Informational Fairness | 0.47
Procedural Fairness    | Trustworthiness        | 0.78
Procedural Fairness    | Understandability      | 0.23
Informational Fairness | Trustworthiness        | 0.72
Informational Fairness | Understandability      | 0.69
Trustworthiness        | Understandability      | 0.41
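A minimal sketch of how the reliability and correlation figures could be computed, assuming a hypothetical response file (pilot_responses.csv) with one row per participant and one column per Likert item; the item column names below are ours, not the authors', and the item counts follow Appendix A.

```python
# Minimal sketch (assumed data layout, not the authors' analysis script).
import pandas as pd

def cronbach_alpha(item_scores: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = item_scores.shape[1]
    item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

items = {  # hypothetical column names for the measurement items (counts per Appendix A)
    "Understandability": ["und_1", "und_2", "und_3"],
    "Procedural Fairness": ["proc_1", "proc_2", "proc_3", "proc_4", "proc_5"],
    "Informational Fairness": ["info_1", "info_2", "info_3", "info_4", "info_5"],
    "Trustworthiness": ["trust_1", "trust_2", "trust_3", "trust_4", "trust_5", "trust_6"],
}

responses = pd.read_csv("pilot_responses.csv")  # hypothetical file, 1-5 Likert ratings

# Reliability per construct (the paper reports alpha > 0.8 for all constructs) ...
for construct, cols in items.items():
    print(construct, round(cronbach_alpha(responses[cols]), 2))

# ... then one averaged score per construct and participant, and Pearson's r between them.
scores = pd.DataFrame({c: responses[cols].mean(axis=1) for c, cols in items.items()})
print(scores.corr(method="pearson"))  # compare with Table 2
```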
ity of a person: “I think that an automated system treats every individual fairly because Insights regarding Q2 We calculated everybody is judged according to the same Pearson’s 𝑟 between each of our fair- rules.” Some participants directly compared ness measures including trustworthiness the ADS to human decision makers: “I think and the study participants’ AI literacy. that [the decision making procedures] are fair All three measures, procedural fairness because they are objective, since they are au- (𝑟 = 0.35, 𝑝 = 0.006), informational fairness tomated. Humans usually [can’t] make de- (𝑟 = 0.52, 𝑝 < 0.001) and trustworthiness cisions without bias.” Other participants re- (𝑟 = 0.48, 𝑝 < 0.001) demonstrate a signif- sponded with a (somewhat expected) disap- icant positive correlation with AI literacy. proval towards the ADS. Participants criti- Therefore, within the scope of our pilot cized, for instance, that the decisions “are study, we found that participants with more missing humanity in them” and how an auto- knowledge and experience in the field of AI mated decision based “only on statistics with- tend to perceive the decision making process out human morality and ethics” simply can- and the provided explanations of the ADS not be fair. One participant went so far as at hand to be fairer and more trustworthy to formulate positive arguments for human than participants with less knowledge and bias in decision making procedures: “I do not believe that it is fair to assess anything that 5. Outlook greatly affects an individual’s life or [liveli- hood] through an automated decision system. The potential of automated decision making I believe some bias and personal opinion is of- and its benefits over purely human-made de- ten necessary to uphold ethical and moral stan- cisions are obvious. However, several in- dards.” Finally, some participants had mixed stances are known where such automated feelings because they saw the trade-off be- decision systems (ADS) are having undesir- tween a “cold approach” that lacks empathy able effects—especially with respect to fair- and a solution that promotes “equality with ness and transparency. With this work, we others” because it “eliminates personal bias”. aim to contribute novel insights to better un- derstand people’s perceptions of fairness and Regarding explanations Study partici- trustworthiness towards ADS, based on the pants had strong opinions on the features provision of varying degrees of information considered in the loan decision. Most partic- about such systems and their underlying pro- ipants found gender to be the most inappro- cesses. Moreover, we examine how these priate feature. The comments on this feature perceptions are influenced by people’s back- ranged from “I think the gender of the per- ground and experience in the field of arti- son shouldn’t matter” to considering gender ficial intelligence. As a first step, we have as a factor being “ethically wrong” or even conducted an online pilot study and obtained “borderline illegal”. Education and property preliminary results for a subset of conditions. area were named by many participants as be- Next, we will initiate our main study with ing inappropriate factors as well: “I think ed- a larger sample size and additional analyses. ucation, gender, property area [. . . 
Insights regarding Q2    We calculated Pearson's r between each of our fairness measures, including trustworthiness, and the study participants' AI literacy. All three measures, procedural fairness (r = 0.35, p = 0.006), informational fairness (r = 0.52, p < 0.001) and trustworthiness (r = 0.48, p < 0.001), demonstrate a significant positive correlation with AI literacy. Therefore, within the scope of our pilot study, we found that participants with more knowledge and experience in the field of AI tend to perceive the decision making process and the provided explanations of the ADS at hand to be fairer and more trustworthy than participants with less knowledge and experience in this field.
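For the AI-literacy analysis, a small sketch in the same assumed setup (the ai_* item columns are hypothetical); scipy's pearsonr returns both the correlation coefficient and its p-value.

```python
# Minimal sketch: Pearson's r (with p-value) between each construct score and an
# AI-literacy score averaged over the corresponding questionnaire items.
from scipy.stats import pearsonr

ai_literacy = responses[["ai_1", "ai_2", "ai_3", "ai_4"]].mean(axis=1)  # hypothetical item columns

for construct in ["Procedural Fairness", "Informational Fairness", "Trustworthiness"]:
    r, p = pearsonr(ai_literacy, scores[construct])
    print(f"AI literacy vs. {construct}: r = {r:.2f}, p = {p:.3f}")
```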
77– rithm outcomes, development proce- 91. dures, and individual differences, in: [8] J. Angwin, J. Larson, S. Mattu, L. Kirch- Proceedings of the 2020 CHI Confer- ner, Machine bias, ProPublica (2016). ence on Human Factors in Computing URL: https://www.propublica.org/artic Systems, 2020, pp. 1–14. le/machine-bias-risk-assessments-in- [15] R. Goebel, A. Chander, K. Holzinger, criminal-sentencing. F. Lecue, Z. Akata, S. Stumpf, P. Kiese- [9] M. Srivastava, H. Heidari, A. Krause, berg, A. Holzinger, Explainable AI: Mathematical notions vs. human per- The new 42?, in: International Cross- ception of fairness: A descriptive ap- Domain Conference for Machine Learn- proach to fairness for machine learn- ing and Knowledge Extraction, 2018, ing, in: Proceedings of the 25th ACM pp. 295–303. SIGKDD International Conference on [16] M. Eslami, K. Vaccaro, M. K. Lee, Knowledge Discovery & Data Mining, A. Elazari Bar On, E. Gilbert, K. Kara- 2019, pp. 2459–2468. halios, User attitudes towards algorith- [10] R. Binns, M. Van Kleek, M. Veale, mic opacity and transparency in online U. Lyngs, J. Zhao, N. Shadbolt, ‘It’s re- reviewing platforms, in: Proceedings ducing a human being to a percentage’; of the 2019 CHI Conference on Human perceptions of justice in algorithmic de- Factors in Computing Systems, 2019, cisions, in: Proceedings of the 2018 CHI pp. 1–14. [17] A. Bussone, S. Stumpf, D. O’Sullivan, Learning 45 (2001) 5–32. The role of explanations on trust and re- [26] S. Palan, C. Schitter, Prolific.ac—a sub- liance in clinical decision support sys- ject pool for online experiments, Jour- tems, in: IEEE International Confer- nal of Behavioral and Experimental Fi- ence on Healthcare Informatics, 2015, nance 17 (2018) 22–27. pp. 160–169. [27] J. M. Cortina, What is coefficient alpha? [18] T. Kulesza, S. Stumpf, M. Burnett, An examination of theory and applica- S. Yang, I. Kwan, W.-K. Wong, Too tions, Journal of Applied Psychology 78 much, too little, or just right? Ways (1993) 98–104. explanations impact end users’ mental [28] V. McKinney, K. Yoon, F. M. Zahedi, The models, in: 2013 IEEE Symposium on measurement of web-customer satisfac- Visual Languages and Human-Centric tion: An expectation and disconfirma- Computing, 2013, pp. 3–10. tion approach, Information Systems Re- [19] C. Molnar, Interpretable machine learn- search 13 (2002) 296–315. ing, 2020. URL: https://christophm.git [29] J. A. Colquitt, J. B. Rodell, Measuring hub.io/interpretable-ml-book/. justice and fairness, in: R. S. Cropan- [20] A. Adadi, M. Berrada, Peeking inside zano, M. L. Ambrose (Eds.), The Oxford the black-box: A survey on explainable Handbook of Justice in the Workplace, artificial intelligence (XAI), IEEE Ac- Oxford University Press, 2015, pp. 187– cess 6 (2018) 52138–52160. 202. [21] M. K. Lee, A. Jain, H. J. Cha, S. Ojha, [30] C.-M. Chiu, H.-Y. Lin, S.-Y. Sun, M.-H. D. Kusbit, Procedural justice in al- Hsu, Understanding customers’ loyalty gorithmic fairness: Leveraging trans- intentions towards online shopping: An parency and outcome control for fair al- integration of technology acceptance gorithmic mediation, Proceedings of model and fairness theory, Behaviour & the ACM on Human-Computer Interac- Information Technology 28 (2009) 347– tion 3 (2019) 1–26. 360. [22] R. F. Kizilcec, How much information? [31] L. Carter, F. 
[14] R. Wang, F. M. Harper, H. Zhu, Factors influencing perceived fairness in algorithmic decision-making: Algorithm outcomes, development procedures, and individual differences, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–14.
[15] R. Goebel, A. Chander, K. Holzinger, F. Lecue, Z. Akata, S. Stumpf, P. Kieseberg, A. Holzinger, Explainable AI: The new 42?, in: International Cross-Domain Conference for Machine Learning and Knowledge Extraction, 2018, pp. 295–303.
[16] M. Eslami, K. Vaccaro, M. K. Lee, A. Elazari Bar On, E. Gilbert, K. Karahalios, User attitudes towards algorithmic opacity and transparency in online reviewing platforms, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–14.
[17] A. Bussone, S. Stumpf, D. O'Sullivan, The role of explanations on trust and reliance in clinical decision support systems, in: IEEE International Conference on Healthcare Informatics, 2015, pp. 160–169.
[18] T. Kulesza, S. Stumpf, M. Burnett, S. Yang, I. Kwan, W.-K. Wong, Too much, too little, or just right? Ways explanations impact end users' mental models, in: 2013 IEEE Symposium on Visual Languages and Human-Centric Computing, 2013, pp. 3–10.
[19] C. Molnar, Interpretable machine learning, 2020. URL: https://christophm.github.io/interpretable-ml-book/.
[20] A. Adadi, M. Berrada, Peeking inside the black-box: A survey on explainable artificial intelligence (XAI), IEEE Access 6 (2018) 52138–52160.
[21] M. K. Lee, A. Jain, H. J. Cha, S. Ojha, D. Kusbit, Procedural justice in algorithmic fairness: Leveraging transparency and outcome control for fair algorithmic mediation, Proceedings of the ACM on Human-Computer Interaction 3 (2019) 1–26.
[22] R. F. Kizilcec, How much information? Effects of transparency on trust in an algorithmic interface, in: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016, pp. 2390–2395.
[23] D. Long, B. Magerko, What is AI literacy? Competencies and design considerations, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–16.
[24] J. A. Colquitt, D. E. Conlon, M. J. Wesson, C. O. Porter, K. Y. Ng, Justice at the millennium: A meta-analytic review of 25 years of organizational justice research, Journal of Applied Psychology 86 (2001) 425–445.
[25] L. Breiman, Random forests, Machine Learning 45 (2001) 5–32.
[26] S. Palan, C. Schitter, Prolific.ac—a subject pool for online experiments, Journal of Behavioral and Experimental Finance 17 (2018) 22–27.
[27] J. M. Cortina, What is coefficient alpha? An examination of theory and applications, Journal of Applied Psychology 78 (1993) 98–104.
[28] V. McKinney, K. Yoon, F. M. Zahedi, The measurement of web-customer satisfaction: An expectation and disconfirmation approach, Information Systems Research 13 (2002) 296–315.
[29] J. A. Colquitt, J. B. Rodell, Measuring justice and fairness, in: R. S. Cropanzano, M. L. Ambrose (Eds.), The Oxford Handbook of Justice in the Workplace, Oxford University Press, 2015, pp. 187–202.
[30] C.-M. Chiu, H.-Y. Lin, S.-Y. Sun, M.-H. Hsu, Understanding customers' loyalty intentions towards online shopping: An integration of technology acceptance model and fairness theory, Behaviour & Information Technology 28 (2009) 347–360.
[31] L. Carter, F. Bélanger, The utilization of e-government services: Citizen trust, innovation and acceptance factors, Information Systems Journal 15 (2005) 5–25.
[32] A. Wilkinson, J. Roberts, A. E. While, Construction of an instrument to measure student information and communication technology skills, experience and attitudes to e-learning, Computers in Human Behavior 26 (2010) 1369–1376.
A. Constructs and Items for Automated Decisions

All items within the following constructs were measured on a 5-point Likert scale and mostly drawn (and adapted) from previous studies.

1. Understandability
Please rate your agreement with the following statements:
• The explanations provided by the automated decision system are clear in meaning. [28]
• The explanations provided by the automated decision system are easy to comprehend. [28]
• In general, the explanations provided by the automated decision system are understandable for me. [28]

2. Procedural Fairness
The statements below refer to the procedures the automated decision system uses to make decisions about loan applications. Please rate your agreement with the following statements:
• Those procedures are free of bias. [29]
• Those procedures uphold ethical and moral standards. [29]
• Those procedures are fair.
• Those procedures ensure that decisions are based on facts, not personal biases and opinions. [29]
• Overall, the applying individual is treated fairly by the automated decision system. [29]

3. Informational Fairness
The statements below refer to the explanations the automated decision system offers with respect to the decision-making procedures. Please rate your agreement with the following statements:
• The automated decision system explains decision-making procedures thoroughly. [29]
• The automated decision system's explanations regarding procedures are reasonable. [29]
• The automated decision system tailors communications to meet the applying individual's needs. [29]
• I understand the process by which the decision was made. [10]
• I received sufficient information to judge whether the decision-making procedures are fair or unfair.

4. Trustworthiness
The statements below refer to the automated decision system. Please rate your agreement with the following statements:
• Given the provided explanations, I trust that the automated decision system makes good-quality decisions. [12]
• Based on my understanding of the decision-making procedures, I know the automated decision system is not opportunistic. [30]
• Based on my understanding of the decision-making procedures, I know the automated decision system is trustworthy. [30]
• I think I can trust the automated decision system. [31]
• The automated decision system can be trusted to carry out the loan application decision faithfully. [31]
• In my opinion, the automated decision system is trustworthy. [31]

5. AI Literacy
• How would you describe your knowledge in the field of artificial intelligence?
• Does your current employment include working with artificial intelligence?
Please rate your agreement with the following statements:
• I am confident interacting with artificial intelligence. [32]
• I understand what the term artificial intelligence means.
B. Explanation Styles for Automated Decisions and One Exemplary Setting (Male Applicant)

Explanation Style (F)

A finance company offers loans on real estate in urban, semi-urban and rural areas. A potential customer first applies online for a specific loan, and afterwards the company assesses the customer's eligibility for that loan.

An individual applied online for a loan at this company. The company denied the loan application. The decision to deny the loan was made by an automated decision system and communicated to the applying individual electronically and in a timely fashion.

The automated decision system explains that the following factors (in alphabetical order) on the individual were taken into account when making the loan application decision:
• Applicant Income: $3,069 per month
• Co-Applicant Income: $0 per month
• Credit History: Good
• Dependents: 0
• Education: Graduate
• Gender: Male
• Loan Amount: $71,000
• Loan Amount Term: 480 months
• Married: No
• Property Area: Urban
• Self-Employed: No

Explanation Style (FFI)

A finance company offers loans on real estate in urban, semi-urban and rural areas. A potential customer first applies online for a specific loan, and afterwards the company assesses the customer's eligibility for that loan.

An individual applied online for a loan at this company. The company denied the loan application. The decision to deny the loan was made by an automated decision system and communicated to the applying individual electronically and in a timely fashion.

The automated decision system explains ...
• ... that the following factors (in alphabetical order) on the individual were taken into account when making the loan application decision:
  – Applicant Income: $3,069 per month
  – Co-Applicant Income: $0 per month
  – Credit History: Good
  – Dependents: 0
  – Education: Graduate
  – Gender: Male
  – Loan Amount: $71,000
  – Loan Amount Term: 480 months
  – Married: No
  – Property Area: Urban
  – Self-Employed: No
• ... that different factors are of different importance in the decision. The following list shows the order of factor importance, from most important to least important: Credit History ≻ Loan Amount ≻ Applicant Income ≻ Co-Applicant Income ≻ Property Area ≻ Married ≻ Dependents ≻ Education ≻ Loan Amount Term ≻ Self-Employed ≻ Gender

Explanation Style (FFICF)

A finance company offers loans on real estate in urban, semi-urban and rural areas. A potential customer first applies online for a specific loan, and afterwards the company assesses the customer's eligibility for that loan.

An individual applied online for a loan at this company. The company denied the loan application. The decision to deny the loan was made by an automated decision system and communicated to the applying individual electronically and in a timely fashion.

The automated decision system explains ...
• ... that the following factors (in alphabetical order) on the individual were taken into account when making the loan application decision:
  – Applicant Income: $3,069 per month
  – Co-Applicant Income: $0 per month
  – Credit History: Good
  – Dependents: 0
  – Education: Graduate
  – Gender: Male
  – Loan Amount: $71,000
  – Loan Amount Term: 480 months
  – Married: No
  – Property Area: Urban
  – Self-Employed: No
• ... that different factors are of different importance in the decision. The following list shows the order of factor importance, from most important to least important: Credit History ≻ Loan Amount ≻ Applicant Income ≻ Co-Applicant Income ≻ Property Area ≻ Married ≻ Dependents ≻ Education ≻ Loan Amount Term ≻ Self-Employed ≻ Gender
• ... that the individual would have been granted the loan if—everything else unchanged—one of the following hypothetical scenarios had been true:
  – The Co-Applicant Income had been at least $800 per month
  – The Loan Amount Term had been 408 months or less
  – The Property Area had been Rural