=Paper=
{{Paper
|id=Vol-2903/IUI21WS-TExSS-12
|storemode=property
|title=A Study on Fairness and Trust Perceptions in Automated Decision Making
|pdfUrl=https://ceur-ws.org/Vol-2903/IUI21WS-TExSS-12.pdf
|volume=Vol-2903
|authors=Jakob Schoeffer,Yvette Machowski,Niklas Kuehl
|dblpUrl=https://dblp.org/rec/conf/iui/SchofferMK21
}}
==A Study on Fairness and Trust Perceptions in Automated Decision Making==
A Study on Fairness and Trust Perceptions in Automated Decision Making Jakob Schoeffera , Yvette Machowskia and Niklas Kuehla a Karlsruhe Institute of Technology (KIT), Germany Abstract Automated decision systems are increasingly used for consequential decision making—for a variety of reasons. These systems often rely on sophisticated yet opaque models, which do not (or hardly) allow for understanding how or why a given decision was arrived at. This is not only problematic from a legal perspective, but non-transparent systems are also prone to yield undesirable (e.g., unfair) outcomes because their sanity is difficult to assess and calibrate in the first place. In this work, we conduct a study to evaluate different attempts of explaining such systems with respect to their effect on people’s perceptions of fairness and trustworthiness towards the underlying mechanisms. A pilot study revealed surprising qualitative insights as well as preliminary significant effects, which will have to be verified, extended and thoroughly discussed in the larger main study. Keywords Automated Decision Making, Fairness, Trust, Transparency, Explanation, Machine Learning 1. Introduction [1, 5]. One widespread assumption is that ADS can also avoid human biases in the deci- Automated decision making has become sion making process [1]. However, ADS are ubiquitous in many domains such as hir- typically based on artificial intelligence (AI) ing [1], bank lending [2], grading [3], and techniques, which, in turn, generally rely on policing [4], among others. As automated historical data. If, for instance, this underly- decision systems (ADS) are used to inform ing data is biased (e.g., because certain socio- increasingly high-stakes consequential deci- demographic groups were favored in a dis- sions, understanding their inner workings is proportional way in the past), an ADS may of utmost importance—and undesirable be- pick up and perpetuate existing patterns of havior becomes a problem of societal rele- unfairness [6]. Two prominent examples of vance. The underlying motives of adopting such behavior from the recent past are the ADS are manifold: They range from cost- discrimination of black people in the realm cutting to improving performance and en- of facial recognition [7] and recidivism pre- abling more robust and objective decisions diction [8]. These and other cases have put ADS under enhanced scrutiny, jeopardizing Joint Proceedings of the ACM IUI 2021 Workshops, April trust in these systems. 13–17, 2021, College Station, USA " jakob.schoeffer@kit.edu (J. Schoeffer); In recent years, a significant body of re- yvette.machowski@alumni.kit.edu (Y. Machowski); search has been devoted to detecting and niklas.kuehl@kit.edu (N. Kuehl) mitigating unfairness in automated decision 0000-0003-3705-7126 (J. Schoeffer); making [6]. Yet, most of this work has fo- 0000-0002-9271-6342 (Y. Machowski); 0000-0001-6750-0876 (N. Kuehl) cused on formalizing the concept of fairness © 2021 Copyright for this paper by its authors. Use permit- ted under Creative Commons License Attribution 4.0 Inter- and enforcing certain statistical equity con- national (CC BY 4.0). straints, often without explicitly taking into CEUR http://ceur-ws.org CEUR Workshop Proceedings (CEUR-WS.org) Workshop ISSN 1613-0073 Proceedings account the perspective of individuals af- 2. Background and Related fected by such automated decisions. In addi- tion to how researchers may define and en- Work force fairness in technical terms, we argue It is widely understood that AI-based tech- that it is vital to understand people’s percep- nology can have undesirable effects on hu- tions of fairness—vital not only from an ethi- mans. As a result, topics of fairness, ac- cal standpoint but also with respect to facili- countability and transparency have become tating trust in and adoption of (appropriately important areas of research in the fields of deployed) socio-technical systems like ADS. AI and human-computer interaction (HCI), Srivastava et al. [9], too, emphasize the need among others. In this section, we provide an for research to gain a deeper understanding overview of relevant literature and highlight of people’s attitudes towards fairness in ADS. our contributions. A separate, yet very related, issue re- volves around how to explain automated de- Explainable AI Despite being a popular cisions and the underlying processes to af- topic of current research, explainable AI fected individuals so as to enable them to (XAI) is a natural consequence of design- appropriately assess the quality and origins ing ADS and, as such, has been around at of such decisions. Srivastava et al. [9] also least since the 1980s [15]. Its importance, point out that subjects should be presented however, keeps rising as increasingly so- with more information about the workings phisticated (and opaque) AI techniques are of an algorithm and that research should used to inform evermore consequential deci- evaluate how this additional information in- sions. XAI is not only required by law (e.g., fluences people’s fairness perceptions. In GDPR, ECOA2 ); Eslami et al. [16], for in- fact, the EU General Data Protection Regu- stance, have shown that users’ attitudes to- lation (GDPR)1 , for instance, requires to dis- wards algorithms change when transparency close “the existence of automated decision- is increased. When sufficient information making, including [. . . ] meaningful informa- is not presented, users sometimes rely too tion about the logic involved [. . . ]” to the heavily on system suggestions [17]. Yet, both “data subject”. Beyond that, however, such quantity and quality of explanations mat- regulations remain often vague and little ac- ter: Kulesza et al. [18] explore the effects tionable. To that end, we conduct a study to of soundness and completeness of explana- examine in more depth the effect of different tions on end users’ mental models and sug- explanations on people’s perceptions of fair- gest, among others, that oversimplification is ness and trustworthiness towards the under- problematic. We refer to [15, 19, 20] for more lying ADS in the context of lending, with a in-depth literature on the topic of XAI. focus on • the amount of information provided, Perceptions of fairness and trustworthi- ness A relatively new line of research in • the background and experience of peo- AI and HCI has started focusing on percep- ple, tions of fairness and trustworthiness in auto- • the nature of the decision maker (hu- mated decision making. For instance, Binns man vs. automated). 2 Equal Credit Opportunity Act: https://www.cons 1 https://eur-lex.europa.eu/eli/reg/2016/679/oj (last umer.ftc.gov/articles/0347-your-equal-credit-opportu accessed Jan 3, 2021) nity-rights (last accessed Jan 3, 2021) Table 1 Overview of related work. Amount of Computer / AI Human involvement Explanation provided Understandability Reference experience in context styles provided information tested evaluated considered evaluated Binns et al. distinct no single question no no [10] Dodge distinct no not mentioned no no et al. [11] knowledge of individual in Lee [12] distinct no no algorithms management context programming / Lee and n/a due to study group decision in fair no no algorithm Baykal [13] setup division context knowledge algorithmic decision, Wang et al. distinct partly no computer literacy reviewed by group in [14] crowdsourcing context individual in distinct and construct with Our work yes AI literacy provider-customer combined multiple items context et al. [10] and Dodge et al. [11] compare fair- Our contribution We aim to complement ness perceptions in ADS for four distinct ex- existing work to better understand how much planation styles. Lee [12] compares percep- of which information of an ADS should be tions of fairness and trustworthiness depend- provided to whom so that people are opti- ing on whether the decision maker is a per- mally enabled to understand the inner work- son or an algorithm in the context of manage- ings and appropriately assess the quality rial decisions. Lee and Baykal [13] explore (e.g., fairness) and origins of such decisions. how algorithmic decisions are perceived in Specifically, our goal is to add novel in- comparison to group-made decisions. Wang sights in the following ways: First, our ap- et al. [14] combine a number of manipula- proach combines multiple explanation styles tions, such as favorable and unfavorable out- in one condition, thereby disclosing varying comes, to gain an overview of fairness per- amounts of information. This differentiates ceptions. An interesting finding by Lee et al. our method from the concept of distinct indi- [21] suggests that fairness perceptions de- vidual explanations adopted by, for instance, cline for some people when gaining an un- Binns et al. [10]. We also evaluate the under- derstanding of an algorithm if their personal standability of explanations through multiple fairness concepts differ from those of the al- items; and we add a novel analysis of the ef- gorithm. Regarding trustworthiness, Kizil- fect of people’s AI literacy [23] on their per- cec [22], for instance, concludes that it is im- ceptions of fairness and trustworthiness. Fi- portant to provide the right amount of trans- nally, we investigate whether perceptions of parency for optimal trust effects, as both too fairness and trustworthiness differ between much and too little transparency can have having a human or an automated decision undesirable effects. maker, controlling for the provided explana- tions. For brevity, we have summarized rel- evant aspects where our work can comple- list of all constructs and associated measure- ment existing literature in Table 1. ment items for the case of automated deci- sions. Note that for each construct we mea- sure multiple items. 3. Study Design and Our analyses are based on a publicly avail- Methodology able dataset on home loan application de- cisions3 , which has been used in multiple With our study, we aim to contribute novel Kaggle competitions. Note that compa- insights towards answering the following rable data—reflecting a given finance com- main questions: pany’s individual circumstances and ap- proval criteria—might in practice be used Q1 Do people perceive a decision process to train ADS. The dataset at hand consists to be fairer and/or more trustworthy if of 614 labeled (loan Y/N) observations and more information about it is disclosed? includes the following features: applicant Q2 Does people’s experience / knowledge income, co-applicant income, credit history, in the field of AI have an impact on dependents, education, gender, loan amount, their perceptions of fairness and trust- loan amount term, marital status, property worthiness towards automated deci- area, self-employment. After removing data sion making? points with missing values, we are left with 480 observations, 332 of which (69.2%) in- Q3 How do people perceive human ver- volve the positive label (Y) and 148 (30.8%) sus automated (consequential) deci- the negative label (N). We use 70% of the sion making with respect to fairness dataset for training purposes and the remain- and trustworthiness? ing 30% as a holdout set. We choose to explore the aforementioned As groundwork, after encoding and scal- relationships in the context of lending—an ing the features, we trained a random for- example of a provider-customer encounter. est classifier with bootstrapping to predict Specifically, we confront study participants the held-out labels, which yields an out-of- with situations where a person was denied bag accuracy estimate of 80.1%. Our first ex- a loan. We choose a between-subjects design planation style, (F), consists of disclosing the with the following conditions: First, we re- features including corresponding values for veal that the loan decision was made by a an observation (i.e., an applicant) from the human or an ADS (i.e., automated). Then holdout set whom our model denied the loan. we provide one of four explanation styles We refer to such an observation as a setting. to each study participant. Figure 1 contains In our study, we employ different settings in an illustration of our study setup, the ele- order to ensure generalizability. Please re- ments of which will be explained in more de- fer to Appendix B for an excerpt of question- tail shortly. Eventually, we measure four dif- naires for one exemplary setting (male ap- ferent constructs: understandability (of the plicant). Note that all explanations are de- given explanations), procedural fairness [24], rived from the data—they are not concocted. informational fairness [24], and trustworthi- Next, we computed permutation feature im- ness (of the decision maker); and we com- portances [25] from our model and obtained pare the results across conditions. Addition- ally, we measure AI literacy of the study par- 3 https://www.kaggle.com/altruistdelhite04/loan-pr ticipants. Please refer to Appendix A for a ediction-problem-dataset (last accessed Jan 3, 2021) Decision Maker: Human Decision Maker: ADS Features (F) Features Features + Feature Importance (FFI) Feature Importance + Features Features + Feature Importance + Counterfactuals (FFICF) Counterfactuals + Feature Importance + Features Counterfactuals (CF) Counterfactuals Figure 1: Graphical representation of our study setup. Thick lines indicate the subset of conditions from our pilot study. the following hierarchy, using “≻” as a short- styles. We employ only model-agnostic ex- hand for “is more important than”: credit planations [20] in a way that they could plau- history ≻ loan amount ≻ applicant income ≻ sibly be provided by both humans and ADS. co-applicant income ≻ property area ≻ mar- ital status ≻ dependents ≻ education ≻ loan amount term ≻ self-employment ≻ gender. Re- 4. Preliminary Analyses vealing this ordered list of feature impor- and Findings tances in conjunction with (F) makes up our second explanation style (FFI). To construct Based on Section 3, we conducted an online our third and fourth explanation styles, we pilot study with 58 participants to infer pre- conducted an online survey with 20 quan- liminary insights regarding Q1 and Q2 and to titative and qualitative researchers to ascer- validate our study design. Among the partic- tain which of the aforementioned features ipants were 69% males, 29% females, and one are actionable—in a sense that people can person who did not disclose their gender; 53% (hypothetically) act on them in order to in- were students, 28% employed full-time, 10% crease their chances of being granted a loan. employed part-time, 3% self-employed, and According to this survey, the top-5 actionable 5% unemployed. The average age was 25.1 features are: loan amount, loan amount term, years, and 31% of participants have applied property area, applicant income, co-applicant for a loan before. For this pilot study, we income. Our third explanation style (FFICF) is only included the ADS settings (right branch then—in conjunction with (F) and (FFI)—the in Figure 1) and limited the conditions to provision of three counterfactual scenarios (F), (FFI), and (FFICF). The study participants where one actionable feature each is (mini- were randomly assigned to one of the three mally) altered such that our model predicts conditions, and each participant was pro- a loan approval instead of a rejection. The vided with two consecutive questionnaires last explanation style is (CF), without ad- associated with two different settings—one ditionally providing features or feature im- male and one female applicant. Participants portances. This condition aims at testing for this online study were recruited from all the effectiveness of counterfactual explana- over the world via Prolific4 [26] and asked tions in isolation, as opposed to providing to rate their agreement with multiple state- them in conjunction with other explanation 4 https://www.prolific.co/ Table 2 Pearson correlations between constructs for pilot study. Construct 1 Construct 2 Pearson’s 𝒓 Procedural Fairness Informational Fairness 0.47 Procedural Fairness Trustworthiness 0.78 Procedural Fairness Understandability 0.23 Informational Fairness Trustworthiness 0.72 Informational Fairness Understandability 0.69 Trustworthiness Understandability 0.41 ments on 5-point Likert scales, where a score derstandability. Overall, we found significant of 1 corresponds to “strongly disagree”, and a correlations (𝑝 < 0.05) between all constructs score of 5 denotes “strongly agree”. Addition- besides procedural fairness and understand- ally, we included multiple open-ended ques- ability. tions in the questionnaires to be able to carry out a qualitative analysis as well. Insights regarding Q1 We conducted multiple ANOVAs followed by Tukey’s tests 4.1. Quantitative Analysis for post-hoc analysis to examine the effects of our three conditions. The individual scores Constructs As mentioned earlier, we mea- for each construct and condition are provided sured four different constructs: understand- in Table 3. We found a significant effect be- ability (of the given explanations), procedu- tween different conditions on fairness per- ral fairness [24], informational fairness [24], ceptions for procedural fairness (𝐹 (2, 55) = and trustworthiness (of the decision maker); 3.56, 𝑝 = 0.035) as well as for informational see Appendix A for the associated measure- fairness (𝐹 (2, 55) = 10.90, 𝑝 < 0.001). Tukey’s ment items. Note that study participants re- test for post-hoc analysis showed that the ef- sponded to the same (multiple) measurement fect for procedural fairness was only signif- items per construct, and these measurements icant between the conditions (F) and (FFICF) were ultimately averaged to obtain one score (𝑝 = 0.040). When controlling for different per construct. We evaluated the reliability variables, such as study participants’ gender, of the constructs through Cronbach’s alpha— the effect for procedural fairness is reduced all values were larger than 0.8 thus showing to marginal significance (𝑝 > 0.05). For in- good reliability for all constructs [27]. We formational fairness the effect in the post-hoc proceeded to measure correlations between analysis without control variables is signifi- the four constructs with Pearson’s 𝑟 to obtain cant between (F) and (FFICF) (𝑝 < 0.001) as an overview of the relationships between our well as between (FFI) and (FFICF) (𝑝 = 0.042), constructs. Table 2 provides an overview of and it is marginally significant between (F) these relationships: Procedural fairness and and (FFI) (𝑝 = 0.072). Controlling for study informational fairness are each strongly cor- participants’ gender reduces the significance related with trustworthiness, and informa- between (FFI) and (FFICF) to marginal signif- tional fairness is strongly correlated with un- icance (𝑝 = 0.059); controlling for study par- Table 3 Construct scores by condition for pilot study. The scores, ranging from 1 (low) to 5 (high), were ob- tained by averaging across all measurement items for each construct. Construct (F) (FFI) (FFICF) Understandability 3.17 3.87 4.12 Procedural Fairness 3.28 3.40 3.91 Informational Fairness 2.79 3.33 3.92 Trustworthiness 2.92 3.39 3.83 ticipants’ age removes the significance be- experience in this field. tween these two conditions altogether. Interestingly, significant effects on under- 4.2. Qualitative Analysis standability between conditions (𝐹 (2, 55) = 7.52, 𝑝 = 0.001) came from (F) and (FFICF) In the following, we provide a summary of (𝑝 = 0.001) as well as (F) and (FFI) (𝑝 = 0.020). insightful responses to open-ended questions Significant effects of the conditions on trust- from our questionnaires. worthiness (𝐹 (2, 55) = 4.94, 𝑝 = 0.011) could only be observed between (F) and (FFICF) Regarding automated decision making (𝑝 = 0.007). In general, we urge to exercise Perhaps surprisingly, many participants ap- utmost caution when interpreting the quanti- proved of the ADS as the decision maker. tative results of our pilot study as the sample They perceived the decision to be less biased size is extremely small. We hope to gener- and argued that all applicants are treated ate more reliable and extensive insights with equally, because the ADS makes its choices our main study and a much larger number of based on facts, not based on the likeabil- participants. ity of a person: “I think that an automated system treats every individual fairly because Insights regarding Q2 We calculated everybody is judged according to the same Pearson’s 𝑟 between each of our fair- rules.” Some participants directly compared ness measures including trustworthiness the ADS to human decision makers: “I think and the study participants’ AI literacy. that [the decision making procedures] are fair All three measures, procedural fairness because they are objective, since they are au- (𝑟 = 0.35, 𝑝 = 0.006), informational fairness tomated. Humans usually [can’t] make de- (𝑟 = 0.52, 𝑝 < 0.001) and trustworthiness cisions without bias.” Other participants re- (𝑟 = 0.48, 𝑝 < 0.001) demonstrate a signif- sponded with a (somewhat expected) disap- icant positive correlation with AI literacy. proval towards the ADS. Participants criti- Therefore, within the scope of our pilot cized, for instance, that the decisions “are study, we found that participants with more missing humanity in them” and how an auto- knowledge and experience in the field of AI mated decision based “only on statistics with- tend to perceive the decision making process out human morality and ethics” simply can- and the provided explanations of the ADS not be fair. One participant went so far as at hand to be fairer and more trustworthy to formulate positive arguments for human than participants with less knowledge and bias in decision making procedures: “I do not believe that it is fair to assess anything that 5. Outlook greatly affects an individual’s life or [liveli- hood] through an automated decision system. The potential of automated decision making I believe some bias and personal opinion is of- and its benefits over purely human-made de- ten necessary to uphold ethical and moral stan- cisions are obvious. However, several in- dards.” Finally, some participants had mixed stances are known where such automated feelings because they saw the trade-off be- decision systems (ADS) are having undesir- tween a “cold approach” that lacks empathy able effects—especially with respect to fair- and a solution that promotes “equality with ness and transparency. With this work, we others” because it “eliminates personal bias”. aim to contribute novel insights to better un- derstand people’s perceptions of fairness and Regarding explanations Study partici- trustworthiness towards ADS, based on the pants had strong opinions on the features provision of varying degrees of information considered in the loan decision. Most partic- about such systems and their underlying pro- ipants found gender to be the most inappro- cesses. Moreover, we examine how these priate feature. The comments on this feature perceptions are influenced by people’s back- ranged from “I think the gender of the per- ground and experience in the field of arti- son shouldn’t matter” to considering gender ficial intelligence. As a first step, we have as a factor being “ethically wrong” or even conducted an online pilot study and obtained “borderline illegal”. Education and property preliminary results for a subset of conditions. area were named by many participants as be- Next, we will initiate our main study with ing inappropriate factors as well: “I think ed- a larger sample size and additional analyses. ucation, gender, property area [. . . ] are in- For instance, we will also explore whether appropriate factors and should not be consid- people’s perceptions of fairness and trust- ered in the decision making process.” On av- worthiness change when the decision maker erage, the order of feature importance was is claimed to be human (as opposed to purely rated as equally appropriate as the features automated). We hope that our contribution themselves. Some participants assessed the will ultimately help in designing more equi- order of feature importance in general and table decision systems as well as stimulate fu- came to the conclusion that it is appropri- ture research on this important topic. ate: “The most important is credit history in this decision and least gender so the order is References appropriate.” At the same time, a few partic- ipants rated the order of feature importance [1] N. R. Kuncel, D. S. Ones, D. M. Klieger, as inappropriate, for instance because “some In hiring, algorithms beat instinct, Har- things are irrelevant yet score higher than loan vard Business Review (2014). URL: http term.” In the first of two settings, the coun- s://hbr.org/2014/05/in-hiring-algorith terfactual for property area was received neg- ms-beat-instinct. atively by some: “It shouldn’t matter where [2] S. Townson, AI can make bank loans the property is located.” Yet, most participants more fair, Harvard Business Review found the counterfactual explanations in the (2020). URL: https://hbr.org/2020/11/ second setting to be appropriate: “The three ai-can-make-bank-loans-more-fair. scenarios represent plausible changes the indi- [3] A. Satariano, British grading debacle vidual could perform [. . . ]” shows pitfalls of automating govern- ment, The New York Times (2020). URL: Conference on Human Factors in Com- https://www.nytimes.com/2020/08/20 puting Systems, 2018, pp. 1–14. /world/europe/uk-england-grading-alg [11] J. Dodge, Q. V. Liao, Y. Zhang, R. K. orithm.html. Bellamy, C. Dugan, Explaining mod- [4] W. D. Heaven, Predictive policing al- els: An empirical study of how explana- gorithms are racist. They need to be tions impact fairness judgment, in: Pro- dismantled, MIT Technology Review ceedings of the 24th International Con- (2020). URL: https://www.technology ference on Intelligent User Interfaces, review.com/2020/07/17/1005396/predic 2019, pp. 275–285. tive-policing-algorithms-racist-disman [12] M. K. Lee, Understanding perception tled-machine-learning-bias-criminal- of algorithmic decisions: Fairness, trust, justice/. and emotion in response to algorith- [5] J. G. Harris, T. H. Davenport, Auto- mic management, Big Data & Society mated decision making comes of age, 5 (2018) 1–16. MIT Sloan Management Review (2005). [13] M. K. Lee, S. Baykal, Algorithmic medi- URL: https://sloanreview.mit.edu/articl ation in group decisions: Fairness per- e/automated-decision-making-comes- ceptions of algorithmically mediated vs. of-age/. discussion-based social division, in: [6] S. Barocas, M. Hardt, A. Narayanan, Proceedings of the 2017 ACM Confer- Fairness and machine learning, 2019. ence on Computer-Supported Coopera- URL: http://www.fairmlbook.org. tive Work and Social Computing, 2017, [7] J. Buolamwini, T. Gebru, Gender pp. 1035–1048. shades: Intersectional accuracy dispari- [14] R. Wang, F. M. Harper, H. Zhu, Fac- ties in commercial gender classification, tors influencing perceived fairness in in: Conference on Fairness, Account- algorithmic decision-making: Algo- ability and Transparency, 2018, pp. 77– rithm outcomes, development proce- 91. dures, and individual differences, in: [8] J. Angwin, J. Larson, S. Mattu, L. Kirch- Proceedings of the 2020 CHI Confer- ner, Machine bias, ProPublica (2016). ence on Human Factors in Computing URL: https://www.propublica.org/artic Systems, 2020, pp. 1–14. le/machine-bias-risk-assessments-in- [15] R. Goebel, A. Chander, K. Holzinger, criminal-sentencing. F. Lecue, Z. Akata, S. Stumpf, P. Kiese- [9] M. Srivastava, H. Heidari, A. Krause, berg, A. Holzinger, Explainable AI: Mathematical notions vs. human per- The new 42?, in: International Cross- ception of fairness: A descriptive ap- Domain Conference for Machine Learn- proach to fairness for machine learn- ing and Knowledge Extraction, 2018, ing, in: Proceedings of the 25th ACM pp. 295–303. SIGKDD International Conference on [16] M. Eslami, K. Vaccaro, M. K. Lee, Knowledge Discovery & Data Mining, A. Elazari Bar On, E. Gilbert, K. Kara- 2019, pp. 2459–2468. halios, User attitudes towards algorith- [10] R. Binns, M. Van Kleek, M. Veale, mic opacity and transparency in online U. Lyngs, J. Zhao, N. Shadbolt, ‘It’s re- reviewing platforms, in: Proceedings ducing a human being to a percentage’; of the 2019 CHI Conference on Human perceptions of justice in algorithmic de- Factors in Computing Systems, 2019, cisions, in: Proceedings of the 2018 CHI pp. 1–14. [17] A. Bussone, S. Stumpf, D. O’Sullivan, Learning 45 (2001) 5–32. The role of explanations on trust and re- [26] S. Palan, C. Schitter, Prolific.ac—a sub- liance in clinical decision support sys- ject pool for online experiments, Jour- tems, in: IEEE International Confer- nal of Behavioral and Experimental Fi- ence on Healthcare Informatics, 2015, nance 17 (2018) 22–27. pp. 160–169. [27] J. M. Cortina, What is coefficient alpha? [18] T. Kulesza, S. Stumpf, M. Burnett, An examination of theory and applica- S. Yang, I. Kwan, W.-K. Wong, Too tions, Journal of Applied Psychology 78 much, too little, or just right? Ways (1993) 98–104. explanations impact end users’ mental [28] V. McKinney, K. Yoon, F. M. Zahedi, The models, in: 2013 IEEE Symposium on measurement of web-customer satisfac- Visual Languages and Human-Centric tion: An expectation and disconfirma- Computing, 2013, pp. 3–10. tion approach, Information Systems Re- [19] C. Molnar, Interpretable machine learn- search 13 (2002) 296–315. ing, 2020. URL: https://christophm.git [29] J. A. Colquitt, J. B. Rodell, Measuring hub.io/interpretable-ml-book/. justice and fairness, in: R. S. Cropan- [20] A. Adadi, M. Berrada, Peeking inside zano, M. L. Ambrose (Eds.), The Oxford the black-box: A survey on explainable Handbook of Justice in the Workplace, artificial intelligence (XAI), IEEE Ac- Oxford University Press, 2015, pp. 187– cess 6 (2018) 52138–52160. 202. [21] M. K. Lee, A. Jain, H. J. Cha, S. Ojha, [30] C.-M. Chiu, H.-Y. Lin, S.-Y. Sun, M.-H. D. Kusbit, Procedural justice in al- Hsu, Understanding customers’ loyalty gorithmic fairness: Leveraging trans- intentions towards online shopping: An parency and outcome control for fair al- integration of technology acceptance gorithmic mediation, Proceedings of model and fairness theory, Behaviour & the ACM on Human-Computer Interac- Information Technology 28 (2009) 347– tion 3 (2019) 1–26. 360. [22] R. F. Kizilcec, How much information? [31] L. Carter, F. Bélanger, The utilization Effects of transparency on trust in an of e-government services: Citizen trust, algorithmic interface, in: Proceedings innovation and acceptance factors, In- of the 2016 CHI Conference on Human formation Systems Journal 15 (2005) 5– Factors in Computing Systems, 2016, 25. pp. 2390–2395. [32] A. Wilkinson, J. Roberts, A. E. While, [23] D. Long, B. Magerko, What is AI lit- Construction of an instrument to mea- eracy? Competencies and design con- sure student information and communi- siderations, in: Proceedings of the 2020 cation technology skills, experience and CHI Conference on Human Factors in attitudes to e-learning, Computers in Computing Systems, 2020, pp. 1–16. Human Behavior 26 (2010) 1369–1376. [24] J. A. Colquitt, D. E. Conlon, M. J. Wes- son, C. O. Porter, K. Y. Ng, Justice at the millennium: A meta-analytic re- view of 25 years of organizational jus- tice research, Journal of Applied Psy- chology 86 (2001) 425–445. [25] L. Breiman, Random forests, Machine A. Constructs and Items for Automated Decisions All items within the following constructs were measured on a 5-point Likert scale and mostly drawn (and adapted) from previous studies. 1. Understandability Please rate your agreement with the following statements: • The explanations provided by the automated decision system are clear in mean- ing. [28] • The explanations provided by the automated decision system are easy to com- prehend. [28] • In general, the explanations provided by the automated decision system are un- derstandable for me. [28] 2. Procedural Fairness The statements below refer to the procedures the automated decision system uses to make decisions about loan applications. Please rate your agreement with the following statements: • Those procedures are free of bias. [29] • Those procedures uphold ethical and moral standards. [29] • Those procedures are fair. • Those procedures ensure that decisions are based on facts, not personal biases and opinions. [29] • Overall, the applying individual is treated fairly by the automated decision sys- tem. [29] 3. Informational Fairness The statements below refer to the explanations the automated decision system offers with respect to the decision-making procedures. Please rate your agreement with the following statements: • The automated decision system explains decision-making procedures thor- oughly. [29] • The automated decision system’s explanations regarding procedures are reason- able. [29] • The automated decision system tailors communications to meet the applying in- dividual’s needs. [29] • I understand the process by which the decision was made. [10] • I received sufficient information to judge whether the decision-making proce- dures are fair or unfair. 4. Trustworthiness The statements below refer to the automated decision system. Please rate your agree- ment with the following statements: • Given the provided explanations, I trust that the automated decision system makes good-quality decisions. [12] • Based on my understanding of the decision-making procedures, I know the au- tomated decision system is not opportunistic. [30] • Based on my understanding of the decision-making procedures, I know the au- tomated decision system is trustworthy. [30] • I think I can trust the automated decision system. [31] • The automated decision system can be trusted to carry out the loan application decision faithfully. [31] • In my opinion, the automated decision system is trustworthy. [31] 5. AI Literacy • How would you describe your knowledge in the field of artificial intelligence? • Does your current employment include working with artificial intelligence? Please rate your agreement with the following statements: • I am confident interacting with artificial intelligence. [32] • I understand what the term artificial intelligence means. B. Explanation Styles for Automated Decisions and One Exemplary Setting (Male Applicant) Explanation Style (F) A finance company offers loans on real estate in urban, semi-urban and ru- ral areas. A potential customer first applies online for a specific loan, and afterwards the company assesses the customer’s eligibility for that loan. An individual applied online for a loan at this company. The company denied the loan application. The decision to deny the loan was made by an automated decision system and communicated to the applying individual electronically and in a timely fashion. The automated decision system explains that the following factors (in alphabetical order) on the individual were taken into account when making the loan application decision: • Applicant Income: $3,069 per month • Co-Applicant Income: $0 per month • Credit History: Good • Dependents: 0 • Education: Graduate • Gender: Male • Loan Amount: $71,000 • Loan Amount Term: 480 months • Married: No • Property Area: Urban • Self-Employed: No Explanation Style (FFI) A finance company offers loans on real estate in urban, semi-urban and ru- ral areas. A potential customer first applies online for a specific loan, and afterwards the company assesses the customer’s eligibility for that loan. An individual applied online for a loan at this company. The company denied the loan application. The decision to deny the loan was made by an automated decision system and communicated to the applying individual electronically and in a timely fashion. The automated decision system explains . . . • . . . that the following factors (in alphabetical order) on the individual were taken into account when making the loan application decision: – Applicant Income: $3,069 per month – Co-Applicant Income: $0 per month – Credit History: Good – Dependents: 0 – Education: Graduate – Gender: Male – Loan Amount: $71,000 – Loan Amount Term: 480 months – Married: No – Property Area: Urban – Self-Employed: No • . . . that different factors are of different importance in the decision. The fol- lowing list shows the order of factor importance, from most important to least important: Credit History ≻ Loan Amount ≻ Applicant Income ≻ Co-Applicant Income ≻ Property Area ≻ Married ≻ Dependents ≻ Education ≻ Loan Amount Term ≻ Self-Employed ≻ Gender Explanation Style (FFICF) A finance company offers loans on real estate in urban, semi-urban and ru- ral areas. A potential customer first applies online for a specific loan, and afterwards the company assesses the customer’s eligibility for that loan. An individual applied online for a loan at this company. The company denied the loan application. The decision to deny the loan was made by an automated decision system and communicated to the applying individual electronically and in a timely fashion. The automated decision system explains . . . • . . . that the following factors (in alphabetical order) on the individual were taken into account when making the loan application decision: – Applicant Income: $3,069 per month – Co-Applicant Income: $0 per month – Credit History: Good – Dependents: 0 – Education: Graduate – Gender: Male – Loan Amount: $71,000 – Loan Amount Term: 480 months – Married: No – Property Area: Urban – Self-Employed: No • . . . that different factors are of different importance in the decision. The fol- lowing list shows the order of factor importance, from most important to least important: Credit History ≻ Loan Amount ≻ Applicant Income ≻ Co-Applicant Income ≻ Property Area ≻ Married ≻ Dependents ≻ Education ≻ Loan Amount Term ≻ Self-Employed ≻ Gender • . . . that the individual would have been granted the loan if—everything else unchanged—one of the following hypothetical scenarios had been true: – The Co-Applicant Income had been at least $800 per month – The Loan Amount Term had been 408 months or less – The Property Area had been Rural