Features of Explainability: How Users Understand
Counterfactual and Causal Explanations for Categorical and
Continuous Features in XAI
Greta Warren 1,2, Mark T. Keane 1,2,3 and Ruth M.J. Byrne 4
1 School of Computer Science, University College Dublin, Dublin, Ireland
2 Insight SFI Centre for Data Analytics, University College Dublin, Dublin, Ireland
3 VistaMilk SFI Research Centre, University College Dublin, Dublin, Ireland
4 School of Psychology and Institute of Neuroscience, Trinity College Dublin, University of Dublin, Dublin, Ireland


                Abstract
                Research on eXplainable AI (XAI) has recently focused on the use of counterfactual
                explanations to address interpretability, recourse, and bias in AI decisions. Many proponents
                of these counterfactual algorithms claim they are cognitively valid in their generation of
                “plausible” explanations using “important”, “actionable” or “causal” features, where these
                features are computed from the model being explained. However, very few of these claims
                have been tested by psychological studies; specifically, claims about the role of different
                feature-types have not been validated, perhaps suggesting that a more considered analysis of
                these knowledge representations is required. In this paper, we consider the cognitive validity
                of a key representational distinction, between continuous and categorical features, in
                counterfactual explanations. In a controlled user study (N=127), we tested the effects of
                counterfactual and causal explanations on the objective accuracy of users’ predictions of the
                decisions made by a simple AI system, and their subjective judgments of satisfaction and trust
                in the explanations. We found that users understand explanations referring to categorical
                features more readily than those referring to continuous features. We also discovered a
                dissociation between objective and subjective measures: counterfactual explanations elicit
                higher accuracy of predictions than no-explanation control descriptions but no higher accuracy
                than causal explanations, and yet counterfactual explanations elicit greater satisfaction and
                trust judgments than causal explanations. We discuss the implications of these findings for
                cognitive aspects of knowledge representation in XAI.

                Keywords
                XAI, counterfactual explanation, algorithmic recourse, interpretable machine learning

IJCAI-ECAI’22 Workshop: Cognitive Aspects of Knowledge Representation, July 23–29, 2022, Vienna, Austria
EMAIL: greta.warren@ucdconnect.ie (G. Warren); mark.keane@ucd.ie (M. T. Keane); rmbyrne@tcd.ie (R. M. J. Byrne)
ORCID: 0000-0002-3804-2287 (G. Warren); 0000-0001-7630-9598 (M. T. Keane); 0000-0003-2240-1211 (R. M. J. Byrne)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction
The use of automated decision making in computer programs that impact people’s everyday lives has
led to rising concerns about the fairness, transparency, and trustworthiness of Artificial Intelligence
(AI) [1,2]. These concerns have created renewed interest in, and an urgency about, tackling the problem
of eXplainable AI (XAI), that is, the need to provide explanations of AI systems’ decisions. Recently,
counterfactual explanations have been advanced as a promising solution to the XAI problem because
of their compliance with data protection regulations, such as the EU’s General Data Protection
Regulation (GDPR) [3], their potential to support algorithmic recourse [4], and their psychological
importance in explanation [5,6]. The prototypical XAI scenario for counterfactuals is the explanation
of an automated decision when a bank customer’s loan application is refused; on querying the decision,
the customer is told “if you had asked for a lower loan of $10,000, your application would have been
approved”. These counterfactual explanations appear to be readily understood by humans, while also
offering users possible recourse to change the decision’s outcome (e.g., by lowering their loan request).
Although there is now a substantial XAI literature on counterfactuals, because of a lack of user studies
we know very little about how people understand these counterfactual explanations of AI decisions, and
which aspects of counterfactual methods are critical to their use in XAI. Many counterfactual algorithms
aim to explain decisions by referring to “plausible”, “actionable”, or “causally important” features;
however, it is unclear how to reliably identify such features, much less which (if any) of these
characteristics actually matter to users. In this paper, we focus on the representational distinction
between continuous and categorical features in a statistically well-powered and psychologically well-
controlled study (N=127), examining how different explanations impact people’s understanding of
automated decisions. We test explanations of automated decisions about blood alcohol content and legal
limits for driving using counterfactual explanations (e.g., “if John had drunk 3 units instead of 5 units,
he would have been under the limit”), compared to causal explanations (e.g., “John was over the limit
because he drank 5 units”), and descriptions (“John was over the limit”). The study examines not only
the effects of explanations but also the effects of different types of features – categorical features
(gender, stomach-fullness) and continuous features (units, duration of drinking, body weight). It
includes objective measures of the accuracy of participants’ understanding of the automated decision,
and subjective measures of their satisfaction and trust in the system and its decisions. In the remainder
of this introduction, we consider the relevant related work in this area on counterfactual explanations
in XAI (see 1.1), as well as how feature-types (see 1.2), and causal explanations (see 1.3) have been
handled in these systems, before outlining the current experiment (see 1.4).

1.1.    Counterfactual Explanations
    In recent years, XAI research on the use of counterfactuals has exploded, with over 100 distinct
computational methods proposed in the literature (for reviews see [7,8]). These various techniques
argue for different approaches to counterfactual generation; some advance optimisation techniques
[3,9], others emphasise the use of causal models [10], distributional analyses [11] or the importance of
instances [12]. These alternative proposals are typically motivated by claims that the method in question
generates “good” counterfactuals for end-users; for instance, that the counterfactuals are
psychologically “good” because they are proximal [3], plausible/actionable [10], sparse [12] or diverse
[9]. However, most of these claims are based on intuition rather than on empirical evidence. A recent
review found that just 21% of 117 papers on counterfactual explanation included any user-testing, and
fewer (only ~7%) tested specific properties of the method proposed [7]. This state of affairs raises the
possibility that many of these techniques contain functions with little or no psychological validity, that
may have no practical benefit to people in real-life applications [13].
       Consider what we have learned from the few user studies on counterfactuals in XAI. Most user
studies test whether counterfactual explanations impact people’s responses relative to no-explanation
controls or some other explanation strategy (e.g., example-based explanations or rule-based
explanations [14]). These studies assess explanation quality using “objective” measures (e.g., user
predictive accuracy) and/or “subjective” measures (e.g., user judgments of trust, satisfaction,
preference). In philosophical and psychological research, explanations are understood to be designed
to change people’s understanding of the world, events or phenomena [5,15]. In XAI, this definition has
been conceptualised to mean that explanation should improve people’s understanding of the AI system,
the domain involved in the task and/or their performance on the target task [16]. An explanation is
effective, therefore, if people objectively perform better on a task involving the AI system by, for
example, being faster, more accurate, or by being able to predict what the system might do next [14,17–
20]. Concretely, if a person with diabetes is using an application to estimate their blood sugar levels for
insulin treatments, ideally the system’s predictions would help them better understand their condition,
so that, for example, their own predictions of their blood sugar levels improve when the application’s
help is not available.
    So far, a handful of studies have shown mixed support for the use of counterfactual explanations in
improving user understanding in this regard. “What-if” counterfactual explanations have been found to
improve performance in prediction and diagnosis tasks relative to no-explanation controls; however,
they did not improve performance appreciably more than other explanation options (“why-not”, “how-
to” and “why” explanations) [17]. Visual counterfactual explanations were shown to increase
classification accuracy relative to no-explanation controls in a small sample of users [18]. In some cases,
prompting users to reason counterfactually about a decision may impair objective performance. One
study compared counterfactual tasks, in which users were asked if a system’s recommendation would
change given a perturbation of some input feature, to simulation tasks, in which users were asked to
predict the recommendation based on the input features [19]. Counterfactuals elicited longer response
times, greater judgments of difficulty, and lower accuracy than forward simulation. Another study
found that people were less accurate when asked to produce a counterfactual change for an instance
than when asked to predict an outcome from the features [20]. These findings are consistent with the
proposal that counterfactuals require people to consider multiple possibilities, to compare reality to the
suggested alternative and to infer causal relations, as is often reported in the cognitive psychological
literature [21]. They also provide further evidence that counterfactuals aid people to reason about past
decisions, and prepare for future ones, but require cognitive effort and resources [22-25]. However,
some caution should be exercised in generalising from the small collection of XAI user studies on
counterfactual explanations, given the diversity of tasks, domains and experimental designs; some do
not involve controls, and many others use too few test items or very small numbers of participants to
be confident about the findings reported.
    XAI research has also focused on whether explanations work subjectively; that is, whether the
explanation improves people’s trust or satisfaction in the AI system, or whether the explanation makes
people “feel better” about their interaction with the system, with generally positive results. Users judge
counterfactuals as more appropriate and fair than example-based [2], demographic-based, and
influence-based explanations [1]. Providing contrastive explanations in a sales-forecasting domain has
also been found to increase self-reported understanding of the system’s decisions [20]. However, two
studies have shown dissociations between objective and subjective measures in XAI. Users shown
contrastive rule-based explanations self-reported better understanding of the system’s decision than no-
explanation controls; however, neither of these groups, nor users shown contrastive example-based
explanations, improved in accuracy when predicting what the system might do, and all tended
to follow the system’s advice, even when it was incorrect [14]. A similar disconnect between objective and
subjective evaluation measures was found for tasks that systematically increased the complexity of a
system’s causal rules; although users’ response times and judgments of difficulty also increased, little
effect of complexity was observed on task accuracy [19]. Thus, in XAI, studies asking users how well
they understand a system’s decisions or how satisfying they find an explanation may not accurately
reflect the true explanatory power of different sorts of explanations, particularly given people’s
propensity to overestimate their understanding of complex causal mechanisms [26]. Notably, if an
explanation strategy has no objective impact on understanding but is subjectively preferred by users,
then concerns about its ethical use could arise. In the current study, we assess the extent to which
counterfactual and causal explanations increase people’s understanding, using measures that are
objective (accuracy in predicting what the system will do) and subjective (i.e., judgments of trust and
satisfaction).

1.2.    Feature-Types in Explanations
Advocates of counterfactual explanation methods often emphasise the role of different feature-types in
making explanations “good” or “psychologically plausible”. Many counterfactual methods distinguish
between the types of features to be used in explanations, arguing that it makes sense to use features that
are mutable rather than immutable [27] (e.g., being told to “reduce your age to get a loan” is not useful).
Furthermore, proponents argue that the features used in the counterfactual should be causally important
[8] and/or actionable [10]; a counterfactual explanation proposing to reduce the size of the requested
loan is more actionable and therefore better than one telling the customer to modify a long-standing,
bad credit-rating. However, to our knowledge, there is only one existing study that examines users’
assessments of feature-types in XAI [28], which found that while the mutability/actionability of a
feature is predictive of user satisfaction and (self-reported) understanding, the importance of this factor
varies depending on the domain. Although such feature-distinctions are made readily in AI models,
from a cognitive perspective they appear context-dependent and ill-defined [29,30]. For example, the
different sorts of mutability are more ambiguous than assumed by computational approaches. Indeed,
psychologically, perhaps there are more fundamental representational distinctions to be made between
feature properties, such as whether people can understand continuous features (such as income or credit-
score) or categorical features (such as race or gender) equally well. Psychological studies have long
shown that people do not tend to spontaneously make changes to continuous variables such as time or
speed, e.g., when they imagine how an accident could have been avoided [31]. This representational
distinction between continuous and categorical features is important, because if people are less likely
to manipulate continuous features or have difficulties understanding counterfactuals about them, then
the potential causal importance or actionability of such features is moot. For example, in algorithmic
recourse, people may better understand counterfactual advice that says, “you need to change your
credit-score from bad to good" rather than advice that says "you need to increase your credit score from
3.4 to 4.6". To date, the differential impacts of categorical and continuous feature-types have not been
considered in counterfactual methods for XAI. Most counterfactual generation methods make the
assumption that users treat and understand continuous and categorical features in the same way (e.g.,
DiCE [9] applies one-hot encoding to categorical feature values). In the present study, we examine
people’s understanding of counterfactual explanations for different feature-types (continuous versus
categorical), predicting that explanations focusing on categorical features will be more readily
understood, leading to greater predictive accuracy based on these features.
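
To make this concrete, the short sketch below illustrates the kind of preprocessing just described: counterfactual generators such as DiCE typically one-hot encode categorical features so that categorical and continuous features end up in a single numeric space and are treated uniformly thereafter. It is a minimal illustration using pandas; the feature names and values are ours, not taken from any particular method's code.

```python
import pandas as pd

# One illustrative instance with the study's five feature-types (values made up).
instance = pd.DataFrame([{
    "units": 4, "weight_kg": 75, "duration_mins": 90,
    "gender": "male", "stomach_fullness": "full",
}])

# One-hot encode the categorical features, as counterfactual generators such as
# DiCE typically do, so that every feature becomes a numeric column.
encoded = pd.get_dummies(instance, columns=["gender", "stomach_fullness"])
print(encoded.columns.tolist())
# -> ['units', 'weight_kg', 'duration_mins', 'gender_male', 'stomach_fullness_full']
```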

1.3.    Causal Explanations
A third goal of the present work is to compare counterfactual explanations to those using causal rules
(with respect to feature-types), as the latter are a long-standing explanation strategy in AI. In
philosophical and psychological research, there is consensus that everyday explanations often invoke
some notion of cause and effect [15]. Causal explanations and counterfactuals have long been viewed
as being intertwined in complex ways [27,32,33], although psychologically they differ in significant
ways. For example, when people create causal explanations, they focus on causes that are sufficient and
may be necessary for an outcome to occur, whereas when they create counterfactuals they focus on
causes that are necessary but may not be sufficient [23]. In AI, causal explanations are often cast as IF-
THEN rules (e.g., in expert systems such as MYCIN [34] or decision trees [19, 35]). In XAI, it is
commonly claimed that such rule-based explanations are inherently interpretable, although some have
pointed out that this claim may not be accurate [13]. One study reports that when users were given
causal decision sets for a system, they achieved high accuracy in a prediction task; however, in a
counterfactual task using the same decision sets, participants’ accuracy decreased, while their response
times and judgments of subjective difficulty increased [19]. Another study found that contrastive rule-
based explanations were effective in helping users identify the crucial feature in a system’s decision
and increased people’s sense of understanding the system; indeed contrastive rule-based explanations
were more effective than contrastive example-based explanations [14]. These findings suggest that
causal rules may be as good as, if not sometimes better than, counterfactual explanations in some task
contexts. In the cognitive psychological literature, it has been found that when people are asked to
reflect on an imagined negative event, they spontaneously generate twice as many causal explanations
as counterfactual thoughts about it [36], consistent with the proposal that causal explanations may not
require people to compare multiple possibilities in the way that counterfactual explanations do [37,38].
Since it has also been shown that people make difficult inferences from counterfactuals more readily
than they do from their factual counterparts [39], it is plausible that counterfactuals’ evocation of
multiple possibilities may help users consider an AI system’s decision more deeply. Given the clear
importance of these two explanation options – causal and counterfactual strategies – both are compared
to one another in the present study. Considering that one of the main arguments for using counterfactuals
in XAI is their appeal to the purportedly contrastive nature of explanation [5], and given the
psychological evidence that counterfactuals are understood by thinking about more possibilities than
causal explanations, we predict that counterfactual explanations will aid users in understanding the
system’s decisions more than causal explanations, and that both will outperform mere descriptions of
an outcome.

1.4.    Outline of Current Study
The study tested the impact of counterfactual versus causal explanations, and continuous versus
categorical features, on users’ accuracy of understanding and subjective evaluation of a simulated AI
system designed to predict blood alcohol content and legal limits. Participants were shown predictions
by the system for different instances, with explanations (e.g. “If Mary had weighed 80kg instead of
75kg, she would have been under the limit.”). The study consisted of two phases (i) a training phase in
which participants were asked to predict the system’s decision (i.e. an individual being over or under
the legal blood alcohol content threshold to drive a car), and were provided with feedback on the
system’s predictions and with explanations for each decision, and (ii) a testing phase, in which they
were asked to predict outcomes for a different set of test instances, this time with no feedback nor any
explanations. In the training phase, participants considered the system’s predictions and learned about
the blood alcohol content domain with the help of the explanations, to determine whether this
experience objectively improved their understanding of the domain. The testing phase objectively
measured their developed understanding of the system by measuring the accuracy of their predictions.
Users’ subjective evaluations were also recorded by measuring their judgments of satisfaction and trust.

2. The Task: Predicting Legal Limits for Driving
Participants were presented with the output of a simulated AI system presented as an application,
designed to predict whether someone is over the legal blood alcohol content limit to drive. The system
relies on a commonly-used approximate method, the Widmark equation [40], which estimates blood
alcohol content from five key features, with the legal threshold set at a blood alcohol concentration of
0.08% (i.e., 0.08g of alcohol per 100ml of blood). This formula was used to generate a dataset of 2,000
instances over normally-distributed values of the feature-set, from which the study’s materials were drawn.
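
As a rough illustration of the domain model, the sketch below implements a standard form of the Widmark approximation over the study's five features. The paper does not report the exact parameterisation used to generate the dataset, so the constants (grams of alcohol per unit, Widmark factors, elimination rate) and the stomach-fullness adjustment are illustrative assumptions only.

```python
GRAMS_PER_UNIT = 8.0       # assumed UK unit of alcohol (8g of ethanol)
ELIMINATION_RATE = 0.015   # assumed elimination of %BAC per hour (typical value)
WIDMARK_R = {"male": 0.68, "female": 0.55}   # typical Widmark body-water factors
LEGAL_LIMIT = 0.08         # decision threshold used in the study (%BAC)


def estimate_bac(units, weight_kg, duration_mins, gender, stomach_full):
    """Approximate %BAC from the study's five features (Widmark-style sketch)."""
    alcohol_g = units * GRAMS_PER_UNIT
    hours = duration_mins / 60.0
    # Core Widmark equation: alcohol mass over (body mass x Widmark factor),
    # scaled to a percentage, minus alcohol eliminated during the drinking period.
    bac = (alcohol_g / (weight_kg * 1000.0 * WIDMARK_R[gender])) * 100.0
    bac -= ELIMINATION_RATE * hours
    if stomach_full:
        bac *= 0.9   # assumed adjustment: a full stomach lowers peak BAC
    return max(bac, 0.0)


def over_limit(units, weight_kg, duration_mins, gender, stomach_full):
    """Binary outcome as presented to participants: over/under the legal limit."""
    return estimate_bac(units, weight_kg, duration_mins,
                        gender, stomach_full) > LEGAL_LIMIT


print(estimate_bac(units=5, weight_kg=70, duration_mins=120,
                   gender="female", stomach_full=False))
```

A decision function of this kind stands in for the SafeLimit model in the materials-generation sketch given in Section 2.1.2.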
    In the experimental task, participants were instructed that they would be testing a new application,
SafeLimit, designed to inform people whether or not they are over the legal limit to drive, from five
features: units of alcohol consumed by the person, weight (in kg), duration of drinking period (in
minutes), gender (male/female) and stomach-fullness (full/empty). The experiment consisted of two
phases. In the training phase, participants were shown examples of tabular data for different individuals,
and asked to make a judgment about whether each individual was under or over the limit on each
screen. Participants selected one of three options: “Over the limit”, “Under the limit”, or “Don’t know”
by clicking the corresponding on-screen button. The order of these options was randomised, to ensure
that participants did not merely click on the same button-order each time. After giving their response,
feedback was given on the next page, with the correct answer highlighted using a green tick-mark, and
the incorrect answer (if selected) highlighted using a red X-mark (see Figures 1 and 2). Above the
answer options, participants were also shown an explanation, and which explanation they were shown
depended on the experimental condition. Figures 1 and 2 show sample materials used in the
counterfactual and causal conditions, respectively. Note that in both conditions the explanations draw
attention to a key feature (e.g., the units drunk) as being critical to the prediction made. In all the study’s
conditions, a balanced set of instances was used, with eight items for each of the five features
presented. Upon completing the training phase, participants began the testing phase (see Figure 3).
Again, they were shown instances referring to individuals (different to those in the training phase), and
asked to judge if the individual was over or under the legal limit to drive. After submitting their
response, no feedback or explanation was given, and they moved on to the next trial. For each instance,
participants were asked to consider a specific feature in making their prediction; for instance, “Given
this person’s WEIGHT, please make a judgment about their blood alcohol level.” Again, in this phase,
a balanced set of instances was used, with eight items for each of the five features presented.
Figure 1: Feedback for (a) Correct Answer and (b) Incorrect Answer in the Counterfactual condition.




Figure 2: Feedback for (a) Correct Answer and (b) Incorrect Answer in the Causal condition.

   The objective measure of performance in both phases of the study was accuracy (i.e., correct
predictions made by participants compared to those of the system). The subjective measures were
explanation satisfaction and trust in the system, assessed using the DARPA project’s Explanation
Satisfaction and Trust scales [16], respectively. To assess engagement with the task, participants
completed four attention checks at random intervals throughout the experiment, and were asked to recall
the 5 features used by the application by selecting them from a list of 10 options at the end of the session.
Figure 3: Example of a prediction task in the testing phase.

2.1.    Method
We compared the impact of counterfactual and causal explanations, to descriptions of the system’s
decisions as a control condition, on the predictions people made about the SafeLimit application’s
decisions. Participants were assigned in fixed order to one of three groups (counterfactual, causal,
control) and completed the experiment, consisting of (i) a training phase in which they made predictions
and were given feedback with explanations or descriptions and (ii) a testing phase where they made
predictions with no feedback and no explanations (for all groups). Hence, any observed differences in
accuracy in the testing phase should reflect people’s understanding of the AI system based on their
experiences in the training phase, which differed only in the nature of the explanation (or control
description) provided. Participants were presented with 40 items in each phase, which were
systematically varied in terms of the five features used with balanced occurrence (i.e., eight instances
for each feature). Explanation satisfaction and trust in the system were measured following the training
and testing phases. Our primary predictions were: (i) explanations will improve accuracy, that is,
performance in the training phase will be more accurate than performance in the testing phase, (ii)
counterfactual explanations will improve accuracy more than causal explanations, as they are
potentially more informative, (iii) predictions about categorical features will be more accurate than
predictions about continuous features, if people find the former less complex than the latter, and (iv)
counterfactual explanations will be judged as more satisfying and trustworthy than causal explanations,
given previous studies showing that they are often subjectively preferred over other explanations.

2.1.1. Participants and Design
The participants (N=127), crowdsourced using the Prolific platform (https://www.prolific.co/), were
randomly assigned to the three between-participant conditions: counterfactual explanation (n=41),
causal explanation (n=43) and control (n=43). The sample comprised 80 women, 46 men, and one
non-binary person, aged 18–74 years (M=33.54, SD=13.15); participants were pre-screened to select native
English speakers from Ireland, the United Kingdom, the United States, Australia, Canada and New
Zealand, who had not participated in previous related studies. The experiment used a 3
(Explanation: counterfactual, causal, control) x 2 (Task: training vs testing phase) x 5 (Feature: units,
duration, gender, weight, stomach-fullness) design, with repeated measures on the latter two variables.
A further 11 participants were excluded prior to any data analysis, one for giving identical responses
for each trial, and 10 who failed more than one attention or memory check. Before testing, a power
analysis with G*Power [41] indicated that 126 participants were required to achieve 90% power for a
medium-sized effect with alpha <.05 for two-tailed tests. Ethics approval for the study was granted by
the University College Dublin ethics committee with the reference code LS-E-20-11-Warren-Keane.

2.1.2. Materials and Procedure
Eighty instances were randomly selected, subject to key filters, from the 2000-item dataset generated for
the blood alcohol content domain (based on stepped increments of each feature’s normally-distributed
values with realistic upper/lower limits). Specifically, the procedure randomly selected an instance
(query case) and incrementally increased or decreased the value of one of the five features until its blood
alcohol content value crossed the decision boundary to create a counterfactual case. For the categorical
features, gender and stomach-fullness, the inverse value was assigned, while continuous variables were
incremented in steps of 15kg for weight, 15 minutes for duration and 1 unit for alcohol. If the query
case could not be perturbed to cross the decision boundary, a different case was randomly selected, and
the procedure was re-started. If the perturbation was successful, the instance was selected as a material
and its counterfactual was used as the basis for the explanation shown to the counterfactual group. For
example, if an instance with units = 4 crossed the decision boundary when it was reduced by one unit
(to be under rather than over the limit) the counterfactual explanation read “If John had drunk 3 units
instead of 4 units, he would have been under the limit”. The matched causal explanation read “John is
over the limit because he drank 4 units”, with the control group given a description of the outcome
(e.g., “John is over the limit”). This selection procedure was performed 16 times for each feature, a total
of 80 times, with the further constraint that an equal number of instances were found on either side of
the decision boundary (i.e., equal numbers under and over the limit). Each instance was then randomly
assigned to one of two sets of materials, each comprising 40 items, again ensuring an equal number of
instances were classified as under/over the limit. To avoid any material-specific confounds, the
materials presented in the training and testing phases were counterbalanced, so that half of the
participants in each group saw Set A in the training phase, and Set B in the testing phase, and this order
was reversed for the other half of the participants. After data collection, t-tests verified that there was
no effect of material-set order. Participants read detailed instructions about the tasks (available at
https://osf.io/j7rm3/) and completed one practice trial for each phase of the study before commencing.
They then progressed through the presented instances, randomly re-ordered for each participant, within
the training and testing phases. After completing both phases, they completed the Explanation
Satisfaction and Trust scales. Participants were debriefed and paid £2.61 for their time. The experiment
took approximately 28 minutes to complete.
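
The selection procedure described above can be summarised as a simple search over perturbations. The sketch below is our reconstruction, not the authors' code: `predict_over_limit` stands in for the SafeLimit model (e.g., the Widmark sketch in Section 2), and the bidirectional stepping, the cap on the number of steps, and the omission of the over/under-limit balancing are assumptions made for brevity.

```python
import random

STEPS = {"weight_kg": 15, "duration_mins": 15, "units": 1}    # continuous step sizes
FLIPS = {"gender": {"male": "female", "female": "male"},      # categorical inversions
         "stomach_full": {True: False, False: True}}


def make_counterfactual(instance, feature, predict_over_limit, max_steps=20):
    """Perturb `feature` of `instance` until the model's decision flips;
    return the perturbed instance, or None if no flip is found."""
    original = predict_over_limit(**instance)
    if feature in FLIPS:                        # categorical: invert the value
        cf = dict(instance)
        cf[feature] = FLIPS[feature][cf[feature]]
        return cf if predict_over_limit(**cf) != original else None
    for direction in (+1, -1):                  # continuous: step up, then down
        cf = dict(instance)
        for _ in range(max_steps):
            cf[feature] += direction * STEPS[feature]
            if cf[feature] <= 0:                # keep perturbed values realistic
                break
            if predict_over_limit(**cf) != original:
                return cf
    return None


def sample_materials(dataset, feature, predict_over_limit, n_items=16):
    """Keep sampling query cases until n_items have a valid counterfactual,
    restarting whenever a perturbation fails to cross the decision boundary."""
    materials = []
    while len(materials) < n_items:
        query = random.choice(dataset)
        cf = make_counterfactual(query, feature, predict_over_limit)
        if cf is not None:
            materials.append((query, cf))
    return materials
```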

2.2.    Results and Discussion
The results show that providing explanations improved the accuracy of people’s predictions, and that
categorical features led to higher prediction accuracy than continuous features. Participants’ accuracy
on categorical features was markedly higher in the testing phase than the training phase, whereas their
accuracy on continuous features remained at similar levels in both phases (an effect that occurred
independently of the explanation type). Participants judged counterfactual explanations to be more
satisfying and trustworthy than causal explanations; however, counterfactual explanations had only a
slightly greater impact than causal explanations on participants’ accuracy in predicting the AI system’s
decisions. The data for this experiment are publicly available at https://osf.io/wqdtn/.

2.2.1. Analysis of the Accuracy Measure
A 3 (Explanation: counterfactual, causal, control) x 2 (Task: training vs testing) x 5 (Feature: units,
duration, gender, weight, stomach fullness) mixed ANOVA with repeated measures on the second two
factors was conducted on the proportion of correct answers given by each participant (see Figure 4). A
Huynh-Feldt correction was applied to the main effect of Feature and its interactions. Significant main
effects were found for Explanation, F(2,124)=5.63, p=.005, ηp²=.083, for Task, F(1,124)=32.349,
p<.001, ηp²=.207, and for Feature, F(3.945, 489.156)=47.599, p<.001, ηp²=.277. Task interacted with
Feature, F(4, 496)=7.23, p<.001, ηp²=.055. No other effects were significant¹. These effects were further
examined in post hoc analyses.
    First, with respect to the main effect of Explanation, post hoc Tukey HSD tests showed that the
Counterfactual group (M=.636, SD=.08) was more accurate than the Control group (M=.590, SD=.08),
p=.003, d=.22. However, the Causal group (M=.614, SD=.09) did not differ significantly from the
Counterfactual, p=.245, or Control groups, p=.186. Further exploratory analysis indicated there was a
reliable trend in increasing accuracy with the following ordering of the groups for their accuracy scores
(Page’s L(40)=1005.0, p<.001): Counterfactual > Causal > Control. These results suggest that providing
explanations improves accuracy relative to providing none. They also suggest, as predicted,
that counterfactual explanations have a greater impact than either causal explanations or a
control condition given no explanations. Note that these effects held across both phases of the
study (Explanation did not interact with Task).
    Second, with respect to the significant Task and Feature main effects, and their significant
interaction, the decomposition of the interaction revealed that accuracy improves from the training to
the testing phase for the categorical features (gender, stomach-fullness), but not for the continuous
features (units, weight and duration). Post hoc pairwise comparisons with a Bonferroni-corrected alpha
of .002 for 25 comparisons showed that participants made more correct responses in the testing phase
than the training phase when considering gender, t(126)=5.626, p<.001, d=.50, and stomach-fullness,
t(126)=4.430, p<.001, d=.39, but not units, t(126)=1.350, p=.179, weight, t(126)=-1.209, p=.229, or
duration, t(126)=.32, p=.75. The analysis also showed that within each phase of the study, the
categorical features produced higher accuracy than the continuous features, confirming the prediction
that people find the former easier to understand than the latter. In the training phase, accuracy for gender
was significantly higher than accuracy for units, t(126)=4.935, p<.001, d=.44, weight, t(126)=6.824,
p<.001, d=.61, duration, t(126)=6.332, p<.001, d=.58, and stomach-fullness, t(126)=5.202, p<.001,
d=.46, all other features did not differ significantly from each other (p>.05 for all comparisons). In the
testing phase, similar tests found accuracy to be higher for gender than for units, t(126)=8.844, p<.001,
d=.78, weight, t(126)=10.824, p<.001, d=.96, duration, t(126)=10.81, p<.001, d=.96 and stomach-
fullness, t(126)=4.986, p<.001, d=.44. Furthermore, accuracy for stomach-fullness was significantly
higher than that for weight, t(126)=4.943, p<.001, d=.44, duration, t(126)=4.959, p<.001, d=.44, and
units, t(126)=2.853, p=.005, although the latter difference was not significant at the corrected alpha².
    Further exploratory analysis indicates it is the diversity in the range of feature values that may lead
to these effects, rather than some abstract ontological status of the feature. When we rank-ordered each
of the features in terms of the number of unique values present in the materials, we found that this rank-
ordering predicted the observed trend in accuracy in the testing phase. That is, the rank ordering from
highest-to-lowest diversity – duration (60 unique values) > weight (36 unique values) > units (4 unique
values) > stomach-fullness (2 unique values) = gender (2 unique values) – inversely predicts the trend
in accuracy: duration (M=.549) < weight (M=.557) < units (M=.615) < stomach-fullness (M=.675) <
gender (M=.796); Page’s L(127)=6256.5, p<.001.
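
This trend analysis can be reproduced with SciPy's implementation of Page's L test. The sketch below uses simulated accuracy scores in place of the real per-participant data (available at https://osf.io/wqdtn/), so the resulting statistic will not match the reported L(127)=6256.5; it only illustrates the form of the test.

```python
import numpy as np
from scipy.stats import page_trend_test

rng = np.random.default_rng(0)

# Simulated stand-in for the per-participant testing-phase accuracy: rows are the
# 127 participants, columns are the five features arranged in the predicted order
# of increasing accuracy (duration < weight < units < stomach-fullness < gender).
predicted_means = [0.549, 0.557, 0.615, 0.675, 0.796]
accuracy = np.clip(rng.normal(loc=predicted_means, scale=0.15, size=(127, 5)), 0, 1)

# Page's L test for a monotonic trend across the ordered conditions.
result = page_trend_test(accuracy)
print(f"L = {result.statistic:.1f}, p = {result.pvalue:.4f}")
```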




¹ No other two-way interactions were reliable, neither Explanation with Task, F(2, 124)=.759, p=.47,
nor Explanation with Feature, F(7.89, 489.156)=1.14, p=.335, nor was the three-way interaction
significant, F(8, 496)=1.215, p=.288.
² Accuracy for units was significantly higher than weight, t(126)=3.152, p=.002, d=.28, and duration,
t(126)=3.539, p=.001, d=.31. Accuracy for weight and duration did not differ, t(126)=.385, p=.701.
Figure 4: Mean accuracy (proportion of correct answers) across conditions for each feature in the (A)
Training and (B) Testing phases of the study. Error bars are standard error of the mean; dashed line
represents chance accuracy.

2.2.2. Analysis of the Subjective Measures: Satisfaction and Trust
All groups completed the DARPA Explanation Satisfaction and Trust scales after completing the two
main phases in the experiment (see Figure 5).




Figure 5: Summed judgments for Explanation Satisfaction and Trust scales. Error bars are standard
error of the mean.
    Satisfaction Measure. A one-way ANOVA was carried out on the summed judgments for the
Explanation Satisfaction scale to examine group differences in satisfaction levels for the explanations
provided. Significant differences between the three groups were identified, F(2, 126)=6.104, p=.003,
ηp²=.09. Post hoc Tukey HSD tests showed that the counterfactual group (M=27.83, SD=6.12) gave
significantly higher satisfaction judgments than the causal group (M=22.79, SD=6.63), p=.002, d=0.76.
The control group (M=25.86, SD=7.19) did not differ significantly from either the counterfactual
(p=.369) or the causal (p=.087) groups. A reliable trend was identified when rank-ordering judgments
for each item in the order: Counterfactual > Control > Causal, Page’s L(8)=111.0, p<.001, suggesting
that counterfactual explanations were somewhat more satisfying than descriptions, and descriptions
were slightly more satisfying than causal explanations. People were less satisfied with causal
explanations than with counterfactual explanations or even with no explanation at all.
    Trust Measure. A one-way ANOVA was carried out on the summed judgments for the Trust Scale
to examine group differences in trust levels for the explanations provided. Significant differences
between the groups were identified, F(2, 126)=8.184, p<.001, ηp²=.117. Post hoc Tukey HSD tests
showed that the counterfactual group (M=26.15, SD=6.14) gave significantly higher trust judgments
than the causal group (M=20.21, SD=6.27), p<.001, d=.88. The control group (M=23.12, SD=7.63) did
not differ significantly from either the counterfactual (p=.101) or causal groups (p=.115). A reliable
trend was identified when rank-ordering judgments for each item in the order: Counterfactual > Control
> Causal, L(8)=112.0, p<.001. Similar to the satisfaction judgments, these results suggest that
counterfactual explanations were somewhat more trustworthy than descriptions, and descriptions were
slightly more trustworthy than causal explanations. People placed less trust in causal explanations
than in counterfactual explanations or even in no explanation at all.
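
For reference, analyses of this kind (a one-way between-groups ANOVA followed by post hoc Tukey HSD comparisons) can be run with SciPy and statsmodels as sketched below. The scores are simulated from the Trust-scale group means, standard deviations and sizes reported above rather than taken from the real data (https://osf.io/wqdtn/), so the exact statistics will differ.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)

# Simulated summed Trust-scale judgments, using the reported group statistics.
counterfactual = rng.normal(26.15, 6.14, size=41)
causal = rng.normal(20.21, 6.27, size=43)
control = rng.normal(23.12, 7.63, size=43)

# One-way between-groups ANOVA on the summed judgments.
F, p = f_oneway(counterfactual, causal, control)
print(f"F = {F:.3f}, p = {p:.4f}")

# Post hoc Tukey HSD comparisons between the three explanation groups.
scores = np.concatenate([counterfactual, causal, control])
groups = ["counterfactual"] * 41 + ["causal"] * 43 + ["control"] * 43
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```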

3. General Discussion, Conclusions and Future Directions
The present study shows that a knowledge representation distinction between abstract feature-types –
continuous versus categorical – is cognitively significant, shaping people’s understanding of
explanations in XAI, yet this distinction is not recognised by current counterfactual methods.
The experiment showed that users’ accuracy in predicting a system’s decisions improved when they
were provided with explanations compared to none at all, and when they were provided with
counterfactual explanations compared to causal ones; counterfactual explanations were also
subjectively preferred compared to causal ones. The experiment also showed that users’ accuracy in
predicting a system’s decisions improved when they relied on categorical features rather than
continuous features, with improvements over time between the training and test phases of the study. In
the following sub-sections, we discuss the implications of these findings for (i) the significance of
categorical and continuous features in explanations, and (ii) the role of explanations in XAI and the
relative differences between counterfactual and causal explanations (and descriptions).

3.1.    The Primacy of Categorical Over Continuous Features
The results described in this paper indicate that users were more accurate in making predictions based
on categorical features than continuous features within each phase of the experiment. User accuracy
increased in the testing phase relative to the training one, but this rise was mainly due to improvement
in making predictions about categorical features (gender and stomach-fullness), an improvement that
did not occur for continuous features (units, duration, weight). We cannot attribute this effect to the
provision of explanations (the three-way interaction was not significant); instead it is an improvement
that emerges as people gain more experience throughout the training phase with the categorical features.
Current counterfactual methods in XAI do not recognise any functional benefits for categorical features
over continuous ones. These counterfactual methods transform categorical features to allow them to be
processed similarly to continuous ones, using one-hot encoding or by mapping to ordinal feature spaces.
Hence, no current model recognises that one feature-type might be more psychologically beneficial
than another. Remarkably, given the 100+ methods in the counterfactual XAI literature, no current
algorithm gives primacy to categorical features over continuous ones for explanations of the predictions
of an AI system. Many models treat mutability and actionability as important to the provided
counterfactual explanations, but neither of these concepts accounts for the results found here. Recall that the
results showed improved performance for the gender and stomach-fullness features even though the
former is immutable and non-actionable (in the context of blood alcohol decisions) and the latter is
mutable and actionable. Moreover, the results showed less improved performance for the units,
duration, and weight features, even though they are mutable and actionable (albeit weight in the context
of blood alcohol decisions is immutable in the short term). Hence, the improvement in accuracy for the
gender and stomach-fullness features over the course of the experiment (from training to testing phase)
is more plausibly due to their simplicity (both have just two possible feature values) compared to the
more complex continuous features (which have many possible feature values). There are clear
implications of these results for counterfactual approaches in algorithmic recourse; namely, that it
would be better to focus on categorical features than on continuous ones when the predictive outcomes
are equivalent.

3.2.    What Is It That (Counterfactual) Explanations Do?
The results also have a bearing on cognitive aspects of explanations. There is an increasing recognition
that explanations can play one of several roles in XAI. One major role is to improve the user’s
understanding of the domain, the AI system, or both, manifested by objective performance
improvements in the task domain when explanations are provided. Measuring effects of explanation on
objective performance is the guiding proposal in Hoffman et al.’s [16] conceptual framework for XAI
and is a repeated theme in XAI user studies [13,16]. However, a number of studies indicate that
explanations, especially counterfactual explanations, may not improve objective performance on the
task [14,19]. Moreover, many studies show that explanations are more likely to impact subjective
assessments than objective performance; that is, people tend to self-report higher understanding [20] or
judge decisions to be more fair [1] or appropriate [2]. These considerations, combined with the findings
of the present study, raise potentially serious ethical concerns about the use of explanations. They
suggest that some explanations may cause people to “feel better” about the AI system, without gaining
any insight into why it made a prediction or how it works. Explanations may lead the recipient to a
somewhat false assessment of the value of the system, akin to the “illusion of explanatory depth”,
wherein people overestimate their understanding of causal mechanisms underlying common
phenomena [26], potentially leading to inappropriate trust in a system and its decisions. The present
results help to clarify the role that explanations may take. Overall, the counterfactual group was more
accurate than the control group, and the causal group’s accuracy lay in-between the other groups. This
observation suggests that counterfactuals help people reason about the causal importance of the features
used in the system’s decisions more effectively than mere descriptions of an outcome, and slightly better
than causal explanations. Moreover, counterfactual explanations improved people’s accuracy in both
phases, without depending on transfer or learning from the training to the testing phase (i.e., there was
an effect of Explanation, but this factor did not interact with any other factor). This conclusion is highly
consistent with key findings in the psychological literature that counterfactuals elicit causal reasoning
and enable people to understand causal relations [24,32]. Indeed, these findings also support proposals
for the use of counterfactuals in algorithmic recourse [3,8], as they seem to better prompt an
understanding of the predictions made by the system.

3.3.    Future Directions
The present work emphasises how AI needs to consider the cognitive aspects of knowledge
representation; it shows that a cognitively-blind AI will miss functional aspects of proposed algorithms
that have major cognitive effects. Several issues need to be addressed further in future work. First, the
categorical features examined here were limited to binary values. Although these kinds of features
commonly occur in many datasets (such as gender, ethnicity, or Boolean true/false features), categorical
features can, in theory, have as many potential values as continuous ones. Hence, it is necessary to
establish whether there is a limit to the number of categorical values that humans can keep track of
without compromising accuracy (that is, before the categorical features become as challenging as
continuous features). The differences in accuracy observed between the different types of features
suggest that users may be able to keep track of at least four categories (given that accuracy for units
was higher than that for weight and duration), but further investigation is needed to test this hypothesis.
A further question is whether people find categorical features easier to reason about because of feature-
value diversity or some other property of categorical features. Overall, the findings motivate a more
psychologically-grounded approach to counterfactuals in XAI, to design methods that reflect the
demonstrated cognitive benefits of categorical features, based on experimentally corroborated
hypotheses rather than on untested conjectures.

4. Acknowledgements
This paper resulted from research funded by (i) the UCD Foundation and (ii) Science Foundation
Ireland (SFI) to the Insight Centre for Data Analytics (12/RC/2289-P2).

5. References
[1] J. Dodge, Q. Vera Liao, Y. Zhang, R. Bellamy, C. Dugan, Explaining models: An empirical
     study of how explanations impact fairness judgment, in: Int. Conf. Intell. User Interfaces,
     Proc. IUI, volume Part F1476, 2019, pp. 275–285. doi:10.1145/3301275.3302310.
[2] R. Binns, M. Van Kleek, M. Veale, U. Lyngs, J. Zhao, N. Shadbolt, It’s reducing a human being
     to a percentage: Perceptions of justice in algorithmic decisions, in: Proc. CHI 2018, 2018, pp. 1–14. doi:10.1145/3173574.3173951.
[3] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the
     black box: Automated decisions and the GDPR, Harv J Law Technol 31 (2018).
[4] A. H. Karimi, B. Schölkopf, I. Valera, Algorithmic recourse: From counterfactual expla-
     nations to interventions, in: FAccT 2021 - Proc 2021 ACM Conf Fairness, Accountability,
     Transpar, 2021.
[5] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, Artif
     Intell 267 (2019) 1–38. doi:10.1016/j.artint.2018.07.007.
[6] R. M. J. Byrne, Counterfactuals in explainable artificial intelligence (xai): Evidence from
     human reasoning, IJCAI Int. Jt. Conf. Artif. Intell (2019) 6276–82. doi:10.24963/ijcai.
     2019/876.
[7] M. T. Keane, E. M. Kenny, E. Delaney, B. Smyth, If only we had better counterfactual
     explanations: Five key deficits to rectify in the evaluation of counterfactual xai techniques, in:
     IJCAI-21, 2021.
[8] A. H. Karimi, G. Barthe, B. Schölkopf, I. Valera, A survey of algorithmic recourse: con-
     trastive explanations and consequential recommendations, ACM Comput Surv 1 (2021)
     1–26.
[9] R. Mothilal, A. Sharma, C. Tan, Explaining machine learning classifiers through diverse
     counterfactual explanations, FAT* 2020 (2020) 607–17. doi:10.1145/3351095.3372850.
[10] A. H. Karimi, G. Barthe, B. Balle, I. Valera, Model-agnostic counterfactual explanations for
     consequential decisions, in: Proc. AISTATS 2020, PMLR 108, 2020.
[11] E. M. Kenny, M. T. Keane, On generating plausible counterfactual and semi-factual
     explanations for deep learning, in: Proc. AAAI-21, 2021, pp. 11575–11585.
[12] M. T. Keane, B. Smyth, Good counterfactuals and where to find them: A case-based technique for
     generating counterfactuals for explainable AI (XAI), in: Int. Conf. Case-Based Reason. (ICCBR), Springer, Cham, 2020.
[13] F. Doshi-Velez, B. Kim, Towards a rigorous science of interpretable machine learning, 2017.
[14] J. van der Waa, E. Nieuwburg, A. Cremers, M. Neerincx, Evaluating XAI: A comparison of rule-based
     and example-based explanations, Artif Intell 291 (2021). doi:10.1016/j.artint.2020.103404.
[15] F. C. Keil, Explanation and understanding, Annu Rev Psychol 57 (2006) 227–54. doi:10.
     1146/annurev.psych.57.102904.190100.
[16] R. R. Hoffman, S. T. Mueller, G. Klein, J. Litman, Metrics for explainable ai: Challenges and
     prospects, 2018.
[17] B. Y. Lim, A. K. Dey, D. Avrahami, Why and why not explanations improve the intelligibility
     of context-aware intelligent systems, Conf. Hum. Factors Comput. Syst. - Proc (2009)
     2119–28. doi:10.1145/1518701.1519023.
[18] Y. Goyal, Z. Wu, J. Ernst, D. Batra, D. Parikh, S. Lee, Counterfactual visual explanations,
     Int. Conf. Mach. Learn (2019) 2376–84.
[19] I. Lage, E. Chen, J. He, M. Narayanan, B. Kim, S. Gershman, Human evaluation of models
     built for interpretability, Proc. AAAI Conf. Hum. Comput. Crowdsourcing (2019) 59–67.
[20] A. Lucic, H. Haned, M. de Rijke, Why does my model fail? Contrastive local explanations for
     retail forecasting, in: FAT* 2020, 2020, pp. 90–98. doi:10.1145/3351095.3372824.
[21] R. M. J. Byrne, Counterfactual thought, Annu Rev Psychol 67 (2016) 135–57. doi:10.1146/
     annurev-psych-122414-033249.
[22] R. M. J. Byrne, The Rational Imagination, MIT Press, Cambridge, MA, 2005.
[23] D. R. Mandel, D. R. Lehman, Counterfactual thinking and ascriptions of cause and pre-
     ventability, J Pers Soc Psychol 71 (1996) 450–63. doi:10.1037/0022-3514.71.3.450.
[24] N. J. Roese, K. Epstude, The functional theory of counterfactual thinking: New evidence, new
     challenges, new insights, in: Advances in Experimental Social Psychology, volume 56, Academic Press, 2017, pp. 1–79.
[25] K. D. Markman, M. N. McMullen, R. A. Elizaga, Counterfactual thinking, persistence, and
     performance: A test of the reflection and evaluation model. Journal of Experimental Social
     Psychology (2008), 44(2), 421-428.
[26] L. Rozenblit, F. C. Keil, The misunderstood limits of folk science: An illusion of explanatory
     depth, Cogn Sci 26 (2002) 521–62. doi:10.1016/S0364-0213(02)00078-2.
[27] D. Kahneman, D. T. Miller, Norm theory: Comparing reality to its alternatives, Psychol
     Rev 93 (1986) 136–53. doi:10.1037/0033-295X.93.2.136.
[28] L. Kirfel, A. Liefgreen, What if (and how...)? Actionability shapes people’s perceptions of
     counterfactual explanations in automated decision-making, in: ICML Workshop on Algorithmic
     Recourse, 2021.
[29] V. Girotto, D. Ferrante, S. Pighin, M. Gonzalez, Postdecisional counterfactual thinking by actors
     and readers. Psychological Science 18(6) (2007) 510-515.
[30] S. Pighin, R. M. Byrne, D. Ferrante, M. Gonzalez, V. Girotto, Counterfactual thoughts about
     experienced, observed, and narrated events, Think Reason 17(2) (2011) 197–211.
[31] D. Kahneman, A. Tversky, The simulation heuristic, in: D. Kahneman, P. Slovic, A. Tversky
     (Eds.), Judgment Under Uncertainty: Heuristics and Biases, CUP, New York, 1982, pp. 201–8.
[32] B. A. Spellman, D. R. Mandel, When possibility informs reality: Counterfactual
     thinking as a cue to causality, Curr Dir Psychol Sci 8(4) (1999) 120–123.
[33] J. Y. Halpern, J. Pearl, Causes and explanations: A structural-model approach, Part I:
     Causes, Br J Philos Sci 56 (2005) 843–887. doi:10.1093/bjps/axi147.
[34] B. G. Buchanan, E. H. Shortliffe, Rule-based expert systems: the MYCIN experiments of
     the Stanford Heuristic Programming Project, CUMINCAD, 1984.
[35] J. Huysmans, K. Dejaeger, C. Mues, J. Vanthienen, B. Baesens, An empirical evaluation
     of the comprehensibility of decision table, tree and rule based predictive models, Decis Support
     Syst 51 (2011) 141–154.
[36] A. McEleney, R. M. J. Byrne, Spontaneous counterfactual thoughts and causal explanations,
     Think Reason 12 (2006) 235–55. doi:10.1080/13546780500317897.
[37] D. A. Lagnado, T. Gerstenberg, R. I. Zultan, Causal responsibility and counterfactuals. Cognitive
     science 37(6) (2013) 1036-1073.
[38] C. R. Walsh, R. M. Byrne, How people think “if only…” about reasons for actions. Thinking &
     Reasoning, 13(4), (2007) 461-483.
[39] R. M. J. Byrne, A. Tasso, Deductive reasoning with factual, possible, and counterfactual
     conditionals, Mem Cogn 27 (1999) 726–40. doi:10.3758/BF03211565.
[40] E. M. P. Widmark, Die theoretischen Grundlagen und die praktische Verwendbarkeit der
     gerichtlich-medizinischen Alkoholbestimmung, Urban Schwarzenberg, Berlin, 1932.
[41] F. Faul, E. Erdfelder, A. Buchner, A.-G. Lang, Statistical power analyses using G*Power 3.1: Tests
     for correlation and regression analyses, Behavior Research Methods 41(4) (2009) 1149–1160.