A Study on Fairness and Trust Perceptions in
Automated Decision Making
Jakob Schoeffer, Yvette Machowski and Niklas Kuehl
Karlsruhe Institute of Technology (KIT), Germany



Abstract
Automated decision systems are increasingly used for consequential decision making—for a variety of reasons. These systems often rely on sophisticated yet opaque models, which offer little or no insight into how or why a given decision was reached. This is not only problematic from a legal perspective; non-transparent systems are also prone to yield undesirable (e.g., unfair) outcomes because their soundness is difficult to assess and calibrate in the first place. In this work, we conduct a study to evaluate different approaches to explaining such systems with respect to their effect on people's perceptions of fairness and trustworthiness towards the underlying mechanisms. A pilot study revealed surprising qualitative insights as well as preliminary significant effects, which will have to be verified, extended, and thoroughly discussed in the larger main study.

                                       Keywords
                                       Automated Decision Making, Fairness, Trust, Transparency, Explanation, Machine Learning


Joint Proceedings of the ACM IUI 2021 Workshops, April 13–17, 2021, College Station, USA
jakob.schoeffer@kit.edu (J. Schoeffer); yvette.machowski@alumni.kit.edu (Y. Machowski); niklas.kuehl@kit.edu (N. Kuehl)
ORCID: 0000-0003-3705-7126 (J. Schoeffer); 0000-0002-9271-6342 (Y. Machowski); 0000-0001-6750-0876 (N. Kuehl)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


1. Introduction

Automated decision making has become ubiquitous in many domains such as hiring [1], bank lending [2], grading [3], and policing [4], among others. As automated decision systems (ADS) are used to inform increasingly high-stakes consequential decisions, understanding their inner workings is of utmost importance—and undesirable behavior becomes a problem of societal relevance. The underlying motives for adopting ADS are manifold: They range from cost-cutting to improving performance and enabling more robust and objective decisions [1, 5]. One widespread assumption is that ADS can also avoid human biases in the decision making process [1]. However, ADS are typically based on artificial intelligence (AI) techniques, which, in turn, generally rely on historical data. If, for instance, this underlying data is biased (e.g., because certain socio-demographic groups were favored in a disproportional way in the past), an ADS may pick up and perpetuate existing patterns of unfairness [6]. Two prominent examples of such behavior from the recent past are the discrimination of black people in the realm of facial recognition [7] and recidivism prediction [8]. These and other cases have put ADS under enhanced scrutiny, jeopardizing trust in these systems.

In recent years, a significant body of research has been devoted to detecting and mitigating unfairness in automated decision making [6]. Yet, most of this work has focused on formalizing the concept of fairness and enforcing certain statistical equity constraints, often without explicitly taking into account the perspective of individuals affected by such automated decisions. In addition to how researchers may define and enforce fairness in technical terms, we argue that it is vital to understand people's perceptions of fairness—vital not only from an ethical standpoint but also with respect to facilitating trust in and adoption of (appropriately deployed) socio-technical systems like ADS. Srivastava et al. [9], too, emphasize the need for research to gain a deeper understanding of people's attitudes towards fairness in ADS.

A separate, yet closely related, issue revolves around how to explain automated decisions and the underlying processes to affected individuals so as to enable them to appropriately assess the quality and origins of such decisions. Srivastava et al. [9] also point out that subjects should be presented with more information about the workings of an algorithm and that research should evaluate how this additional information influences people's fairness perceptions. In fact, the EU General Data Protection Regulation (GDPR)¹, for instance, requires the disclosure of "the existence of automated decision-making, including [...] meaningful information about the logic involved [...]" to the "data subject". Beyond that, however, such regulations often remain vague and provide little actionable guidance. To that end, we conduct a study to examine in more depth the effect of different explanations on people's perceptions of fairness and trustworthiness towards the underlying ADS in the context of lending, with a focus on

    • the amount of information provided,
    • the background and experience of people,
    • the nature of the decision maker (human vs. automated).

    ¹ https://eur-lex.europa.eu/eli/reg/2016/679/oj (last accessed Jan 3, 2021)


2. Background and Related Work

It is widely understood that AI-based technology can have undesirable effects on humans. As a result, topics of fairness, accountability and transparency have become important areas of research in the fields of AI and human-computer interaction (HCI), among others. In this section, we provide an overview of relevant literature and highlight our contributions.

Explainable AI   Despite being a popular topic of current research, explainable AI (XAI) is a natural consequence of designing ADS and, as such, has been around at least since the 1980s [15]. Its importance, however, keeps rising as increasingly sophisticated (and opaque) AI techniques are used to inform evermore consequential decisions. XAI is not only required by law (e.g., GDPR, ECOA²); Eslami et al. [16], for instance, have shown that users' attitudes towards algorithms change when transparency is increased. When sufficient information is not presented, users sometimes rely too heavily on system suggestions [17]. Yet, both quantity and quality of explanations matter: Kulesza et al. [18] explore the effects of soundness and completeness of explanations on end users' mental models and suggest, among others, that oversimplification is problematic. We refer to [15, 19, 20] for more in-depth literature on the topic of XAI.

    ² Equal Credit Opportunity Act: https://www.consumer.ftc.gov/articles/0347-your-equal-credit-opportunity-rights (last accessed Jan 3, 2021)

Perceptions of fairness and trustworthiness   A relatively new line of research in AI and HCI has started focusing on perceptions of fairness and trustworthiness in automated decision making.
Table 1
Overview of related work.

  Reference           | Explanation styles provided | Amount of provided information evaluated | Understandability tested      | Computer / AI experience evaluated | Human involvement in context considered
  Binns et al. [10]   | distinct                    | no                                       | single question               | no                                 | no
  Dodge et al. [11]   | distinct                    | no                                       | not mentioned                 | no                                 | no
  Lee [12]            | distinct                    | no                                       | no                            | knowledge of algorithms            | individual in management context
  Lee and Baykal [13] | n/a due to study setup      | no                                       | no                            | programming / algorithm knowledge  | group decision in fair division context
  Wang et al. [14]    | distinct                    | partly                                   | no                            | computer literacy                  | algorithmic decision, reviewed by group in crowdsourcing context
  Our work            | distinct and combined       | yes                                      | construct with multiple items | AI literacy                        | individual in provider-customer context


For instance, Binns et al. [10] and Dodge et al. [11] compare fairness perceptions in ADS for four distinct explanation styles. Lee [12] compares perceptions of fairness and trustworthiness depending on whether the decision maker is a person or an algorithm in the context of managerial decisions. Lee and Baykal [13] explore how algorithmic decisions are perceived in comparison to group-made decisions. Wang et al. [14] combine a number of manipulations, such as favorable and unfavorable outcomes, to gain an overview of fairness perceptions. An interesting finding by Lee et al. [21] suggests that fairness perceptions decline for some people when gaining an understanding of an algorithm if their personal fairness concepts differ from those of the algorithm. Regarding trustworthiness, Kizilcec [22], for instance, concludes that it is important to provide the right amount of transparency for optimal trust effects, as both too much and too little transparency can have undesirable effects.

Our contribution   We aim to complement existing work to better understand how much of which information about an ADS should be provided to whom, so that people are optimally enabled to understand its inner workings and appropriately assess the quality (e.g., fairness) and origins of its decisions. Specifically, our goal is to add novel insights in the following ways: First, our approach combines multiple explanation styles in one condition, thereby disclosing varying amounts of information. This differentiates our method from the concept of distinct individual explanations adopted by, for instance, Binns et al. [10]. We also evaluate the understandability of explanations through multiple items; and we add a novel analysis of the effect of people's AI literacy [23] on their perceptions of fairness and trustworthiness. Finally, we investigate whether perceptions of fairness and trustworthiness differ between having a human or an automated decision maker, controlling for the provided explanations. For brevity, we summarize the aspects in which our work complements existing literature in Table 1.


3. Study Design and Methodology

With our study, we aim to contribute novel insights towards answering the following main questions:

  Q1 Do people perceive a decision process to be fairer and/or more trustworthy if more information about it is disclosed?

  Q2 Does people's experience / knowledge in the field of AI have an impact on their perceptions of fairness and trustworthiness towards automated decision making?

  Q3 How do people perceive human versus automated (consequential) decision making with respect to fairness and trustworthiness?

We choose to explore the aforementioned relationships in the context of lending—an example of a provider-customer encounter. Specifically, we confront study participants with situations where a person was denied a loan. We choose a between-subjects design with the following conditions: First, we reveal that the loan decision was made by a human or an ADS (i.e., automated). Then we provide one of four explanation styles to each study participant. Figure 1 contains an illustration of our study setup, the elements of which will be explained in more detail shortly. Finally, we measure four different constructs: understandability (of the given explanations), procedural fairness [24], informational fairness [24], and trustworthiness (of the decision maker); and we compare the results across conditions. Additionally, we measure the AI literacy of the study participants. Please refer to Appendix A for a list of all constructs and associated measurement items for the case of automated decisions. Note that for each construct we measure multiple items.

Our analyses are based on a publicly available dataset on home loan application decisions³, which has been used in multiple Kaggle competitions. Note that comparable data—reflecting a given finance company's individual circumstances and approval criteria—might in practice be used to train ADS. The dataset at hand consists of 614 labeled (loan Y/N) observations and includes the following features: applicant income, co-applicant income, credit history, dependents, education, gender, loan amount, loan amount term, marital status, property area, self-employment. After removing data points with missing values, we are left with 480 observations, 332 of which (69.2%) involve the positive label (Y) and 148 (30.8%) the negative label (N). We use 70% of the dataset for training purposes and the remaining 30% as a holdout set.

    ³ https://www.kaggle.com/altruistdelhite04/loan-prediction-problem-dataset (last accessed Jan 3, 2021)

As groundwork, after encoding and scaling the features, we trained a random forest classifier with bootstrapping to predict the held-out labels, which yielded an out-of-bag accuracy estimate of 80.1%.
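To make this groundwork step concrete, the following minimal sketch shows how such a model could be trained on the public loan dataset. This is not the authors' code: the file name, column names, split, and hyperparameters are assumptions based on the dataset's public documentation.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the public loan application dataset and drop rows with missing values (614 -> 480 observations)
df = pd.read_csv("loan_prediction.csv").dropna()

# Binary target: was the loan granted (Y) or not (N)?
y = (df["Loan_Status"] == "Y").astype(int)

# One-hot encode the categorical features; scaling is omitted here since random forests do not require it
X = pd.get_dummies(df.drop(columns=["Loan_ID", "Loan_Status"]), drop_first=True)

# 70/30 split; the rejected applicants ("settings") shown to participants are drawn from the holdout set
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Random forest with bootstrapping; oob_score_ provides the out-of-bag accuracy estimate
clf = RandomForestClassifier(n_estimators=500, bootstrap=True, oob_score=True, random_state=42)
clf.fit(X_train, y_train)
print(f"Out-of-bag accuracy: {clf.oob_score_:.3f}")
```

The exact preprocessing and number of trees are incidental here; the key elements reported in the paper are the bootstrapped random forest and the out-of-bag accuracy as a performance estimate.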
Our first explanation style, (F), consists of disclosing the features, including their corresponding values, for an observation (i.e., an applicant) from the holdout set whom our model denied the loan. We refer to such an observation as a setting. In our study, we employ different settings in order to ensure generalizability. Please refer to Appendix B for an excerpt of the questionnaires for one exemplary setting (male applicant). Note that all explanations are derived from the data—they are not concocted.
                                                       
Figure 1: Graphical representation of our study setup. Each study participant is assigned to either a human decision maker (left branch) or an ADS (right branch) and receives one of four explanation styles: (F) features; (FFI) features + feature importance; (FFICF) features + feature importance + counterfactuals; (CF) counterfactuals only. Thick lines indicate the subset of conditions from our pilot study.



Next, we computed permutation feature importances [25] from our model and obtained the following hierarchy, using "≻" as a shorthand for "is more important than": credit history ≻ loan amount ≻ applicant income ≻ co-applicant income ≻ property area ≻ marital status ≻ dependents ≻ education ≻ loan amount term ≻ self-employment ≻ gender. Revealing this ordered list of feature importances in conjunction with (F) makes up our second explanation style (FFI). To construct our third and fourth explanation styles, we conducted an online survey with 20 quantitative and qualitative researchers to ascertain which of the aforementioned features are actionable—in the sense that people can (hypothetically) act on them in order to increase their chances of being granted a loan. According to this survey, the top-5 actionable features are: loan amount, loan amount term, property area, applicant income, co-applicant income. Our third explanation style (FFICF) is then—in conjunction with (F) and (FFI)—the provision of three counterfactual scenarios in which one actionable feature each is (minimally) altered such that our model predicts a loan approval instead of a rejection. The last explanation style, (CF), provides the counterfactual scenarios without additionally providing features or feature importances. This condition aims at testing the effectiveness of counterfactual explanations in isolation, as opposed to providing them in conjunction with other explanation styles. We employ only model-agnostic explanations [20], in a way that they could plausibly be provided by both humans and ADS.
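The feature-importance ranking and the counterfactual search described above could be derived roughly as follows. This is a sketch that continues the hypothetical names from the previous listing; the perturbation grid, the chosen feature, and the column name are illustrative assumptions, not the authors' procedure.

```python
import numpy as np
from sklearn.inspection import permutation_importance

# Permutation feature importances [25], computed on the holdout set and sorted
# from most important to least important
result = permutation_importance(clf, X_holdout, y_holdout, n_repeats=30, random_state=42)
ranking = sorted(zip(X_holdout.columns, result.importances_mean), key=lambda t: t[1], reverse=True)
for feature, importance in ranking:
    print(f"{feature}: {importance:.4f}")

def counterfactual(applicant, feature, candidate_values, model):
    """Return the first (i.e., minimal) change to a single actionable feature
    that flips the model's prediction from 'deny' to 'approve', if any."""
    for value in candidate_values:  # assumed to be ordered from smallest to largest change
        modified = applicant.copy()
        modified[feature] = value
        if model.predict(modified)[0] == 1:
            return feature, value
    return None

# Example: shorten the loan amount term (in months) for one rejected applicant from the holdout set
rejected = X_holdout[clf.predict(X_holdout) == 0].iloc[[0]]   # one-row DataFrame
print(counterfactual(rejected, "Loan_Amount_Term", np.arange(468, 0, -12), model=clf))
```

The search mirrors the idea stated above: change exactly one actionable feature, as little as possible, until the model's prediction flips to an approval.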
4. Preliminary Analyses and Findings

Based on Section 3, we conducted an online pilot study with 58 participants to infer preliminary insights regarding Q1 and Q2 and to validate our study design. Among the participants were 69% males, 29% females, and one person who did not disclose their gender; 53% were students, 28% employed full-time, 10% employed part-time, 3% self-employed, and 5% unemployed. The average age was 25.1 years, and 31% of participants had applied for a loan before. For this pilot study, we only included the ADS settings (right branch in Figure 1) and limited the conditions to (F), (FFI), and (FFICF). The study participants were randomly assigned to one of the three conditions, and each participant was provided with two consecutive questionnaires associated with two different settings—one male and one female applicant.
Table 2
Pearson correlations between constructs for pilot study.

                  Construct 1               Construct 2              Pearson’s 𝒓
                  Procedural Fairness       Informational Fairness        0.47
                  Procedural Fairness       Trustworthiness               0.78
                  Procedural Fairness       Understandability             0.23
                  Informational Fairness    Trustworthiness               0.72
                  Informational Fairness    Understandability             0.69
                  Trustworthiness           Understandability             0.41


Participants for this online study were recruited from all over the world via Prolific (https://www.prolific.co/) [26] and asked to rate their agreement with multiple statements on 5-point Likert scales, where a score of 1 corresponds to "strongly disagree" and a score of 5 denotes "strongly agree". Additionally, we included multiple open-ended questions in the questionnaires in order to carry out a qualitative analysis as well.

4.1. Quantitative Analysis

Constructs   As mentioned earlier, we measured four different constructs: understandability (of the given explanations), procedural fairness [24], informational fairness [24], and trustworthiness (of the decision maker); see Appendix A for the associated measurement items. Note that study participants responded to the same (multiple) measurement items per construct, and these measurements were ultimately averaged to obtain one score per construct. We evaluated the reliability of the constructs through Cronbach's alpha—all values were larger than 0.8, thus showing good reliability for all constructs [27]. We then measured correlations between the four constructs with Pearson's 𝑟 to obtain an overview of the relationships between them. Table 2 provides an overview of these relationships: Procedural fairness and informational fairness are each strongly correlated with trustworthiness, and informational fairness is strongly correlated with understandability. Overall, we found significant correlations (𝑝 < 0.05) between all constructs besides procedural fairness and understandability.
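For illustration, reliability and correlation figures of this kind can be computed as sketched below. The response matrix and the item column names are hypothetical stand-ins for the actual questionnaire data, not the pilot study's responses.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# Hypothetical response matrix: one row per participant, one column per 5-point Likert item
# (pf_* = procedural fairness items, tw_* = trustworthiness items, cf. Appendix A)
responses = pd.DataFrame(
    rng.integers(1, 6, size=(58, 11)),
    columns=[f"pf_{i}" for i in range(1, 6)] + [f"tw_{i}" for i in range(1, 7)],
)

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of Likert items (rows = participants, columns = items)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

pf_items = responses[[f"pf_{i}" for i in range(1, 6)]]
print("Cronbach's alpha (procedural fairness):", round(cronbach_alpha(pf_items), 2))

# Construct scores are the per-participant means across a construct's items
procedural_fairness = pf_items.mean(axis=1)
trustworthiness = responses[[f"tw_{i}" for i in range(1, 7)]].mean(axis=1)

r, p = pearsonr(procedural_fairness, trustworthiness)
print(f"Pearson's r = {r:.2f}, p = {p:.3f}")
```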
Insights regarding Q1   We conducted multiple ANOVAs followed by Tukey's tests for post-hoc analysis to examine the effects of our three conditions. The individual scores for each construct and condition are provided in Table 3. We found a significant effect of the conditions on fairness perceptions, both for procedural fairness (𝐹 (2, 55) = 3.56, 𝑝 = 0.035) and for informational fairness (𝐹 (2, 55) = 10.90, 𝑝 < 0.001). Tukey's test for post-hoc analysis showed that the effect for procedural fairness was only significant between the conditions (F) and (FFICF) (𝑝 = 0.040). When controlling for different variables, such as study participants' gender, the effect for procedural fairness is reduced to marginal significance (𝑝 > 0.05). For informational fairness, the effect in the post-hoc analysis without control variables is significant between (F) and (FFICF) (𝑝 < 0.001) as well as between (FFI) and (FFICF) (𝑝 = 0.042), and it is marginally significant between (F) and (FFI) (𝑝 = 0.072).
Table 3
Construct scores by condition for pilot study. The scores, ranging from 1 (low) to 5 (high), were ob-
tained by averaging across all measurement items for each construct.

                         Construct                  (F)    (FFI)   (FFICF)
                         Understandability          3.17   3.87      4.12
                         Procedural Fairness        3.28   3.40      3.91
                         Informational Fairness     2.79   3.33      3.92
                         Trustworthiness            2.92   3.39      3.83


Controlling for study participants' gender reduces the significance between (FFI) and (FFICF) to marginal significance (𝑝 = 0.059); controlling for study participants' age removes the significance between these two conditions altogether.

Interestingly, the significant effects on understandability between conditions (𝐹 (2, 55) = 7.52, 𝑝 = 0.001) came from (F) and (FFICF) (𝑝 = 0.001) as well as (F) and (FFI) (𝑝 = 0.020). Significant effects of the conditions on trustworthiness (𝐹 (2, 55) = 4.94, 𝑝 = 0.011) could only be observed between (F) and (FFICF) (𝑝 = 0.007). In general, we urge utmost caution when interpreting the quantitative results of our pilot study, as the sample size is extremely small. We hope to generate more reliable and extensive insights with our main study and a much larger number of participants.
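For reference, the ANOVA and Tukey post-hoc procedure used above can be reproduced along the following lines; the group sizes and score values are synthetic placeholders rather than the pilot study's actual responses.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
# Placeholder construct scores (roughly on the 1-5 scale) for 58 participants across the three conditions
condition = np.array(["F"] * 20 + ["FFI"] * 19 + ["FFICF"] * 19)
scores = np.concatenate([
    rng.normal(3.3, 0.6, 20),   # (F)
    rng.normal(3.4, 0.6, 19),   # (FFI)
    rng.normal(3.9, 0.6, 19),   # (FFICF)
])

# One-way ANOVA across the three conditions (reported in the paper as F(2, 55) with its p-value)
groups = [scores[condition == c] for c in ("F", "FFI", "FFICF")]
f_stat, p_value = f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")

# Tukey's HSD post-hoc test for all pairwise comparisons between conditions
print(pairwise_tukeyhsd(endog=scores, groups=condition, alpha=0.05))
```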
                                                    system treats every individual fairly because
Insights regarding Q2 We calculated                 everybody is judged according to the same
Pearson’s 𝑟 between each of our fair-               rules.” Some participants directly compared
ness measures including trustworthiness             the ADS to human decision makers: “I think
and the study participants’ AI literacy.            that [the decision making procedures] are fair
All three measures, procedural fairness             because they are objective, since they are au-
(𝑟 = 0.35, 𝑝 = 0.006), informational fairness       tomated. Humans usually [can’t] make de-
(𝑟 = 0.52, 𝑝 < 0.001) and trustworthiness           cisions without bias.” Other participants re-
(𝑟 = 0.48, 𝑝 < 0.001) demonstrate a signif-         sponded with a (somewhat expected) disap-
icant positive correlation with AI literacy.        proval towards the ADS. Participants criti-
Therefore, within the scope of our pilot            cized, for instance, that the decisions “are
study, we found that participants with more         missing humanity in them” and how an auto-
knowledge and experience in the field of AI         mated decision based “only on statistics with-
tend to perceive the decision making process        out human morality and ethics” simply can-
and the provided explanations of the ADS            not be fair. One participant went so far as
at hand to be fairer and more trustworthy           to formulate positive arguments for human
than participants with less knowledge and           bias in decision making procedures: “I do not
One participant went so far as to formulate positive arguments for human bias in decision making procedures: "I do not believe that it is fair to assess anything that greatly affects an individual's life or [livelihood] through an automated decision system. I believe some bias and personal opinion is often necessary to uphold ethical and moral standards." Finally, some participants had mixed feelings because they saw the trade-off between a "cold approach" that lacks empathy and a solution that promotes "equality with others" because it "eliminates personal bias".

Regarding explanations   Study participants had strong opinions on the features considered in the loan decision. Most participants found gender to be the most inappropriate feature. The comments on this feature ranged from "I think the gender of the person shouldn't matter" to considering gender as a factor being "ethically wrong" or even "borderline illegal". Education and property area were named by many participants as being inappropriate factors as well: "I think education, gender, property area [...] are inappropriate factors and should not be considered in the decision making process." On average, the order of feature importance was rated as equally appropriate as the features themselves. Some participants assessed the order of feature importance in general and came to the conclusion that it is appropriate: "The most important is credit history in this decision and least gender so the order is appropriate." At the same time, a few participants rated the order of feature importance as inappropriate, for instance because "some things are irrelevant yet score higher than loan term." In the first of two settings, the counterfactual for property area was received negatively by some: "It shouldn't matter where the property is located." Yet, most participants found the counterfactual explanations in the second setting to be appropriate: "The three scenarios represent plausible changes the individual could perform [...]"


5. Outlook

The potential of automated decision making and its benefits over purely human-made decisions are obvious. However, several instances are known where such automated decision systems (ADS) have had undesirable effects—especially with respect to fairness and transparency. With this work, we aim to contribute novel insights to better understand people's perceptions of fairness and trustworthiness towards ADS, based on the provision of varying degrees of information about such systems and their underlying processes. Moreover, we examine how these perceptions are influenced by people's background and experience in the field of artificial intelligence. As a first step, we have conducted an online pilot study and obtained preliminary results for a subset of conditions. Next, we will initiate our main study with a larger sample size and additional analyses. For instance, we will also explore whether people's perceptions of fairness and trustworthiness change when the decision maker is claimed to be human (as opposed to purely automated). We hope that our contribution will ultimately help in designing more equitable decision systems as well as stimulate future research on this important topic.


References

 [1] N. R. Kuncel, D. S. Ones, D. M. Klieger, In hiring, algorithms beat instinct, Harvard Business Review (2014). URL: https://hbr.org/2014/05/in-hiring-algorithms-beat-instinct.
 [2] S. Townson, AI can make bank loans more fair, Harvard Business Review (2020). URL: https://hbr.org/2020/11/ai-can-make-bank-loans-more-fair.
 [3] A. Satariano, British grading debacle shows pitfalls of automating government, The New York Times (2020). URL: https://www.nytimes.com/2020/08/20/world/europe/uk-england-grading-algorithm.html.
 [4] W. D. Heaven, Predictive policing algorithms are racist. They need to be dismantled, MIT Technology Review (2020). URL: https://www.technologyreview.com/2020/07/17/1005396/predictive-policing-algorithms-racist-dismantled-machine-learning-bias-criminal-justice/.
 [5] J. G. Harris, T. H. Davenport, Automated decision making comes of age, MIT Sloan Management Review (2005). URL: https://sloanreview.mit.edu/article/automated-decision-making-comes-of-age/.
 [6] S. Barocas, M. Hardt, A. Narayanan, Fairness and machine learning, 2019. URL: http://www.fairmlbook.org.
 [7] J. Buolamwini, T. Gebru, Gender shades: Intersectional accuracy disparities in commercial gender classification, in: Conference on Fairness, Accountability and Transparency, 2018, pp. 77–91.
 [8] J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias, ProPublica (2016). URL: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
 [9] M. Srivastava, H. Heidari, A. Krause, Mathematical notions vs. human perception of fairness: A descriptive approach to fairness for machine learning, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 2459–2468.
[10] R. Binns, M. Van Kleek, M. Veale, U. Lyngs, J. Zhao, N. Shadbolt, 'It's reducing a human being to a percentage'; perceptions of justice in algorithmic decisions, in: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–14.
[11] J. Dodge, Q. V. Liao, Y. Zhang, R. K. Bellamy, C. Dugan, Explaining models: An empirical study of how explanations impact fairness judgment, in: Proceedings of the 24th International Conference on Intelligent User Interfaces, 2019, pp. 275–285.
[12] M. K. Lee, Understanding perception of algorithmic decisions: Fairness, trust, and emotion in response to algorithmic management, Big Data & Society 5 (2018) 1–16.
[13] M. K. Lee, S. Baykal, Algorithmic mediation in group decisions: Fairness perceptions of algorithmically mediated vs. discussion-based social division, in: Proceedings of the 2017 ACM Conference on Computer-Supported Cooperative Work and Social Computing, 2017, pp. 1035–1048.
[14] R. Wang, F. M. Harper, H. Zhu, Factors influencing perceived fairness in algorithmic decision-making: Algorithm outcomes, development procedures, and individual differences, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–14.
[15] R. Goebel, A. Chander, K. Holzinger, F. Lecue, Z. Akata, S. Stumpf, P. Kieseberg, A. Holzinger, Explainable AI: The new 42?, in: International Cross-Domain Conference for Machine Learning and Knowledge Extraction, 2018, pp. 295–303.
[16] M. Eslami, K. Vaccaro, M. K. Lee, A. Elazari Bar On, E. Gilbert, K. Karahalios, User attitudes towards algorithmic opacity and transparency in online reviewing platforms, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–14.
[17] A. Bussone, S. Stumpf, D. O'Sullivan, The role of explanations on trust and reliance in clinical decision support systems, in: IEEE International Conference on Healthcare Informatics, 2015, pp. 160–169.
[18] T. Kulesza, S. Stumpf, M. Burnett, S. Yang, I. Kwan, W.-K. Wong, Too much, too little, or just right? Ways explanations impact end users' mental models, in: 2013 IEEE Symposium on Visual Languages and Human-Centric Computing, 2013, pp. 3–10.
[19] C. Molnar, Interpretable machine learning, 2020. URL: https://christophm.github.io/interpretable-ml-book/.
[20] A. Adadi, M. Berrada, Peeking inside the black-box: A survey on explainable artificial intelligence (XAI), IEEE Access 6 (2018) 52138–52160.
[21] M. K. Lee, A. Jain, H. J. Cha, S. Ojha, D. Kusbit, Procedural justice in algorithmic fairness: Leveraging transparency and outcome control for fair algorithmic mediation, Proceedings of the ACM on Human-Computer Interaction 3 (2019) 1–26.
[22] R. F. Kizilcec, How much information? Effects of transparency on trust in an algorithmic interface, in: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016, pp. 2390–2395.
[23] D. Long, B. Magerko, What is AI literacy? Competencies and design considerations, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–16.
[24] J. A. Colquitt, D. E. Conlon, M. J. Wesson, C. O. Porter, K. Y. Ng, Justice at the millennium: A meta-analytic review of 25 years of organizational justice research, Journal of Applied Psychology 86 (2001) 425–445.
[25] L. Breiman, Random forests, Machine Learning 45 (2001) 5–32.
[26] S. Palan, C. Schitter, Prolific.ac—a subject pool for online experiments, Journal of Behavioral and Experimental Finance 17 (2018) 22–27.
[27] J. M. Cortina, What is coefficient alpha? An examination of theory and applications, Journal of Applied Psychology 78 (1993) 98–104.
[28] V. McKinney, K. Yoon, F. M. Zahedi, The measurement of web-customer satisfaction: An expectation and disconfirmation approach, Information Systems Research 13 (2002) 296–315.
[29] J. A. Colquitt, J. B. Rodell, Measuring justice and fairness, in: R. S. Cropanzano, M. L. Ambrose (Eds.), The Oxford Handbook of Justice in the Workplace, Oxford University Press, 2015, pp. 187–202.
[30] C.-M. Chiu, H.-Y. Lin, S.-Y. Sun, M.-H. Hsu, Understanding customers' loyalty intentions towards online shopping: An integration of technology acceptance model and fairness theory, Behaviour & Information Technology 28 (2009) 347–360.
[31] L. Carter, F. Bélanger, The utilization of e-government services: Citizen trust, innovation and acceptance factors, Information Systems Journal 15 (2005) 5–25.
[32] A. Wilkinson, J. Roberts, A. E. While, Construction of an instrument to measure student information and communication technology skills, experience and attitudes to e-learning, Computers in Human Behavior 26 (2010) 1369–1376.
A. Constructs and Items for Automated Decisions
All items within the following constructs were measured on a 5-point Likert scale and mostly
drawn (and adapted) from previous studies.
   1. Understandability
      Please rate your agreement with the following statements:
         • The explanations provided by the automated decision system are clear in mean-
           ing. [28]
         • The explanations provided by the automated decision system are easy to com-
           prehend. [28]
         • In general, the explanations provided by the automated decision system are un-
           derstandable for me. [28]
   2. Procedural Fairness
      The statements below refer to the procedures the automated decision system uses to
      make decisions about loan applications. Please rate your agreement with the following
      statements:
         • Those procedures are free of bias. [29]
         • Those procedures uphold ethical and moral standards. [29]
         • Those procedures are fair.
         • Those procedures ensure that decisions are based on facts, not personal biases
           and opinions. [29]
         • Overall, the applying individual is treated fairly by the automated decision sys-
           tem. [29]
   3. Informational Fairness
      The statements below refer to the explanations the automated decision system offers
      with respect to the decision-making procedures. Please rate your agreement with the
      following statements:
         • The automated decision system explains decision-making procedures thor-
           oughly. [29]
         • The automated decision system’s explanations regarding procedures are reason-
           able. [29]
         • The automated decision system tailors communications to meet the applying in-
           dividual’s needs. [29]
         • I understand the process by which the decision was made. [10]
         • I received sufficient information to judge whether the decision-making proce-
           dures are fair or unfair.
   4. Trustworthiness
      The statements below refer to the automated decision system. Please rate your agree-
      ment with the following statements:
       • Given the provided explanations, I trust that the automated decision system
         makes good-quality decisions. [12]
       • Based on my understanding of the decision-making procedures, I know the au-
         tomated decision system is not opportunistic. [30]
       • Based on my understanding of the decision-making procedures, I know the au-
         tomated decision system is trustworthy. [30]
       • I think I can trust the automated decision system. [31]
       • The automated decision system can be trusted to carry out the loan application
         decision faithfully. [31]
       • In my opinion, the automated decision system is trustworthy. [31]
 5. AI Literacy
       • How would you describe your knowledge in the field of artificial intelligence?
       • Does your current employment include working with artificial intelligence?
    Please rate your agreement with the following statements:
       • I am confident interacting with artificial intelligence. [32]
       • I understand what the term artificial intelligence means.


B. Explanation Styles for Automated Decisions and One
   Exemplary Setting (Male Applicant)
  Explanation Style (F)


     A finance company offers loans on real estate in urban, semi-urban and ru-
     ral areas. A potential customer first applies online for a specific loan, and
     afterwards the company assesses the customer’s eligibility for that loan.
     An individual applied online for a loan at this company. The company denied
     the loan application. The decision to deny the loan was made by an automated
     decision system and communicated to the applying individual electronically
     and in a timely fashion.


  The automated decision system explains that the following factors (in alphabetical
  order) on the individual were taken into account when making the loan application
  decision:
     • Applicant Income: $3,069 per month
     • Co-Applicant Income: $0 per month
     • Credit History: Good
    • Dependents: 0

    • Education: Graduate
    • Gender: Male
    • Loan Amount: $71,000

    • Loan Amount Term: 480 months
    • Married: No
    • Property Area: Urban
    • Self-Employed: No

Explanation Style (FFI)


   A finance company offers loans on real estate in urban, semi-urban and ru-
   ral areas. A potential customer first applies online for a specific loan, and
   afterwards the company assesses the customer’s eligibility for that loan.
   An individual applied online for a loan at this company. The company denied
   the loan application. The decision to deny the loan was made by an automated
   decision system and communicated to the applying individual electronically
   and in a timely fashion.


The automated decision system explains . . .
    • . . . that the following factors (in alphabetical order) on the individual were taken
      into account when making the loan application decision:
         – Applicant Income: $3,069 per month
         – Co-Applicant Income: $0 per month
         – Credit History: Good
         – Dependents: 0
         – Education: Graduate
         – Gender: Male
         – Loan Amount: $71,000
         – Loan Amount Term: 480 months
         – Married: No
         – Property Area: Urban
         – Self-Employed: No

    • . . . that different factors are of different importance in the decision. The fol-
      lowing list shows the order of factor importance, from most important to least
      important: Credit History ≻ Loan Amount ≻ Applicant Income ≻ Co-Applicant
      Income ≻ Property Area ≻ Married ≻ Dependents ≻ Education ≻ Loan Amount
      Term ≻ Self-Employed ≻ Gender

Explanation Style (FFICF)


   A finance company offers loans on real estate in urban, semi-urban and ru-
   ral areas. A potential customer first applies online for a specific loan, and
   afterwards the company assesses the customer’s eligibility for that loan.
   An individual applied online for a loan at this company. The company denied
   the loan application. The decision to deny the loan was made by an automated
   decision system and communicated to the applying individual electronically
   and in a timely fashion.


The automated decision system explains . . .
    • . . . that the following factors (in alphabetical order) on the individual were taken
      into account when making the loan application decision:
         – Applicant Income: $3,069 per month
         – Co-Applicant Income: $0 per month
         – Credit History: Good
         – Dependents: 0
         – Education: Graduate
         – Gender: Male
         – Loan Amount: $71,000
         – Loan Amount Term: 480 months
         – Married: No
         – Property Area: Urban
         – Self-Employed: No
    • . . . that different factors are of different importance in the decision. The fol-
      lowing list shows the order of factor importance, from most important to least
      important: Credit History ≻ Loan Amount ≻ Applicant Income ≻ Co-Applicant
      Income ≻ Property Area ≻ Married ≻ Dependents ≻ Education ≻ Loan Amount
      Term ≻ Self-Employed ≻ Gender
• . . . that the individual would have been granted the loan if—everything else
  unchanged—one of the following hypothetical scenarios had been true:
    – The Co-Applicant Income had been at least $800 per month
    – The Loan Amount Term had been 408 months or less
    – The Property Area had been Rural