<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Center for Cognitive Interaction Technology (CITEC), Inspiration</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Influence of prior and task-generated emotions on XAI explanation retention and understanding</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Birte Richter</string-name>
          <email>birte.richter@uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Schütze</string-name>
          <email>christian.schuetze@uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Aksonova</string-name>
          <email>anna.aksonova@uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julian Leichert</string-name>
          <email>jleichert@uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Britta Wrede</string-name>
          <email>bwrede3@uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Medical Assistance Systems, Medical School OWL, Bielefeld University Morgenbreede 2</institution>
          ,
          <addr-line>33615 Bielefeld</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <volume>1</volume>
      <issue>33619</issue>
      <abstract>
<p>The explanation of an AI decision and how it is received by users is an increasingly active research field. However, there is a surprising lack of knowledge about how social factors such as emotions affect the process of explanation by a decision support system (DSS). While previous research has shown the effects of emotions on DSS-supported decision-making, it remains unknown to what extent emotions affect cognitive processing during an explanation. In this study, we therefore investigated the influence of prior emotions and task-related arousal on the retention and understanding of explained feature relevance. To investigate the influence of prior emotions, we induced happiness and fear before the interaction with the explainable decision support system. Before emotion induction, user characteristics to assess their risk type were collected via a questionnaire. To identify emotional reactions to the explanations of the relevance of different features, we observed heart rate variability (HRV) and facial expressions of the explainee while they were observing and listening to the explanation, and assessed their retention of the features as well as their understanding of the influence of each feature on the outcome of the decision task. Results indicate that (1) task-unrelated prior emotions do not affect the retention or the understanding of the relevance of certain features when no further arousing events occur, (2) certain feature explanations related to personal attitudes yielded arousal in individual participants, and (3) this arousal affected the understanding of these variables. More specifically, when participants perceived an error in the system's explanation, the task-unrelated emotion “Fear”, which was associated with higher reported levels of arousal, led to significantly less understanding than the “Happy” condition. In other words, task-unrelated emotions alone did not affect retention or understanding.
However, when task-generated emotions occur, understanding can be affected. This may be due to an excessively high level of arousal.</p>
      </abstract>
      <kwd-group>
<kwd>Social XAI</kwd>
        <kwd>Co-Construction</kwd>
        <kwd>HAI</kwd>
        <kwd>understanding</kwd>
        <kwd>emotions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Related Work</title>
      <p>As artificial intelligence (AI) systems increasingly support human decision-making across diverse
domains, from healthcare to finance and beyond, there is a growing demand for these systems to be not
only accurate but also explainable. Explainable AI (XAI) aims to make machine-generated decisions
transparent and interpretable, allowing users to understand and potentially trust the reasoning behind
automated recommendations. A key goal of XAI research is to ensure that users can recall and understand
the explanations provided by decision support systems (DSSs), particularly when those explanations
concern the relevance of specific input features.</p>
      <p>
While early work has focused on providing mathematical explanations for AI researchers themselves,
lay users as well as domain experts (e.g., medical experts) have been identified as an important target
group. This shift in focus has led to a re-evaluation of existing research, identifying the need for more
interactive approaches [1, 2]. However, the underlying assumption in this research has mostly been
that interaction takes place with a rational decision maker who follows purely logical considerations.
Thus, support has been intended to (1) provide the human decision maker with relevant information
about certain features, and (2) to avoid cognitive biases such as confirmation bias [3, 4]. Yet, it is well
known that human decision-making is heavily influenced by emotions [5].
      </p>
      <p>More recently, the influence of emotions on the outcome of AI explanations has come into the focus
of research. For example, it was shown that task-unrelated prior emotions affected the advice-taking
behavior of participants given different explanation strategies [6]. More specifically, participants with
high arousal were more likely to follow AI advice with a (guided) explanation than participants with
low arousal. The latter preferred AI advice without any explanation.</p>
<p>On the other hand, explanations can induce affective reactions. Explanations given by an AI for an
easy task were shown to yield negative affect in explainees, whereas explanations in a difficult task
caused positive affect. But also, the quality of advice can affect the emotional state of an explainee. In a
vignette study, it was found that wrong advice would lead to negative feelings, whereas correct advice
would lead to positive feelings [7].</p>
<p>Interestingly, an investigation of the effectiveness of an XAI intervention showed that an explicit
nudging strategy was successful in debiasing emotions [8]. However, it was not strong enough to debias
decision-making.</p>
      <p>[9] analyzed facial expressions during explanations and found that certain eyebrow movements were
correlated with later user behavior. This indicates that certain cognitive or emotional reactions during
the explanations influence the processing and possibly understanding of the explanations, affecting the
final decision behavior in specific ways.</p>
<p>However, clear guidelines regarding how emotions affect understanding in a decision-making
situation, and how an explaining system should take the explainee’s emotional state into account, are
still missing. This work contributes to the field by addressing this question. In this paper, we present
findings from an interaction study with a Decision Support System (DSS) and the effect of task-related
and task-unrelated emotions on the retention and understanding of the AI explanation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Research Questions</title>
<p>It is important to note that emotions can arise as contextual factors, i.e., from unrelated tasks, or from
the interaction with the system concerning the task at hand. While both types of emotion are likely to
influence the human’s processing, it is important to separate the effects of prior task-unrelated
emotions from those that are closely related to the task at hand.</p>
<p>In this research, we therefore investigate the following research questions:
• RQ1a: Do task-unrelated emotions influence the retention of explained feature relevance?
• RQ1b: Do task-unrelated emotions influence the understanding of explained feature relevance?
• RQ2: Which features trigger emotional reactions during explanation? (task-related emotions)
• RQ3a: Do task-related emotions influence retention of the explanation features?
• RQ3b: Do task-related emotions influence understanding of the explanation features?</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <sec id="sec-3-1">
        <title>3.1. Measurements</title>
        <sec id="sec-3-1-1">
          <title>3.1.1. Independent Variables</title>
<p>To investigate the influence of the different emotions on retention and understanding of explanations,
we conducted an interaction study with a decision support system.</p>
          <p>
Two types of emotional influences were considered: (1) prior task-unrelated emotions and (2) task-related
emotions, defined as emotional responses elicited directly by the explanation itself.
          </p>
<p>Prior task-unrelated emotions. The prior task-unrelated emotions are induced before the
explanation and are unrelated to its content. We chose a between-subjects design, with participants assigned to
either a fear or a happiness condition as an induced emotion. The emotion induction itself is described
in section 3.4.2.</p>
<p>Task-related emotions. We measure the emotional reactions by using the EmoNet [10] arousal
data during the feature presentation and explanation. Emotional reactions are defined as anomalies in
the arousal state, which are detected using a rolling z-score [11] with k = 2.5 and a window of 500 ms:</p>
          <p>z_t = (a_t − μ_t) / σ_t (1)</p>
          <p>r_t = 1 if z_t &gt; k, and r_t = 0 otherwise (2)</p>
          <p>where a_t is the arousal value at time t, and μ_t and σ_t are the mean and the standard deviation of the arousal values within the preceding window.</p>
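As a minimal sketch (our own illustration, not the study's code), the rolling z-score detector in equations (1) and (2) can be implemented as follows; the window length is given in samples, so 500 ms corresponds to half the frame rate of the video stream:

```python
import numpy as np

def rolling_zscore_anomalies(arousal, window, k=2.5):
    """Flag samples whose rolling z-score exceeds k (an 'emotional reaction').

    arousal: sequence of arousal values (e.g., per-frame EmoNet output).
    window:  number of preceding samples forming the rolling window
             (500 ms worth of frames in the study's setting).
    """
    arousal = np.asarray(arousal, dtype=float)
    flags = np.zeros(len(arousal), dtype=int)
    for t in range(window, len(arousal)):
        win = arousal[t - window:t]          # preceding samples only
        mu, sigma = win.mean(), win.std()
        if sigma == 0:                       # flat window: z-score undefined
            continue
        if (arousal[t] - mu) / sigma > k:    # equations (1) and (2)
            flags[t] = 1
    return flags
```

A sudden spike against a stable baseline is flagged, while ordinary fluctuations within the window's spread are not.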
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Dependent Variables</title>
          <p>Retention. Retention was measured based on the participant’s verbal recall of the features they
remembered as relevant to the AI’s decision, as conveyed through the explanation. Participants were
explicitly asked to provide verbal input, allowing them to articulate their reasoning processes. The
spoken responses were automatically transcribed using Whisper, a deep neural network-based automatic
speech recognition system [12]. The recognized words were then manually mapped to the ten predefined
variable names, accounting for minor inaccuracies in naming.</p>
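The mapping step might look like the following sketch, using stdlib fuzzy matching (the feature list and cutoff are illustrative, and the Whisper transcription step is omitted):

```python
from difflib import get_close_matches

# Illustrative subset of the predefined (German) variable names.
FEATURES = ["Geschlecht", "Politische Einstellung",
            "Lebenszufriedenheit gegenwaertig", "Risikobereitschaft Beruf"]

def map_recall(transcribed_terms, cutoff=0.7):
    """Map possibly misrecognized terms to the predefined variable names."""
    recalled = set()
    for term in transcribed_terms:
        # Best fuzzy match above the similarity cutoff, if any.
        match = get_close_matches(term, FEATURES, n=1, cutoff=cutoff)
        if match:
            recalled.add(match[0])
    return recalled
```

In the study this mapping was done manually; an automatic pass like this would only be a pre-sorting aid.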
          <p>Understanding. To assess the extent to which participants understood the meaning of the variables,
they were asked—via a graphical user interface (see Fig. 4)—to indicate the contribution of each variable
to the overall decision of the decision support system (DSS). Specifically, participants were instructed to
indicate whether a given variable contributed to a higher or lower risk estimate. This was implemented
by allowing users to move each variable to the left (indicating lower risk), to the right (indicating higher
risk), or down (indicating a lack of memory). We interpreted this recalled information as a cue for
(retained) understanding of the explanation.</p>
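The left/right/down responses can be scored against the explained direction along these lines (a sketch; the string encoding is ours, not the study's implementation):

```python
# Hypothetical encoding of the three GUI moves: left = lower risk,
# right = higher risk, down = not remembered.
def score_understanding(response, explained_direction):
    """Return 'match', 'miss', or None (feature not remembered)."""
    coding = {"left": "lower", "right": "higher", "down": None}
    recalled = coding[response]
    if recalled is None:
        return None
    return "match" if recalled == explained_direction else "miss"
```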
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Control Variables and Additional Measurements</title>
          <p>In addition to the primary measures, several supplementary variables were recorded for exploratory
and control purposes:
• Gender: Participant gender (male/female/diverse/not specified) was recorded.
• Emotional self-assessment:
– STAI: State-Trait Anxiety Inventory (STAI), providing a measure of participants’ baseline
anxiety disposition.
– mDES [13]: modified Differential Emotions Scale,
– SAM [14]: Self-assessment manikin, providing a measure of participants’ self-rated arousal
and valence
• Emotional observations:
– Heart Rate Variability (HRV), measured via the Polar H10 Sensor1
– EmoNet [10] Results: In addition to arousal scores, full output vectors from the EmoNet
model were stored for each facial frame, enabling detailed emotional state tracking over
time.
• System Events: System-level events (e.g., start of an explanation) were logged for quality control
and alignment of multimodal data streams.
• Videos: All participant sessions were video-recorded for potential qualitative analysis and
cross-validation of facial expression data.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
<title>3.2. Decision Support System</title>
        <p>In this study, the same decision task and advice scheme were utilized as in a prior study [15] but now
implemented as a live study by using RISE [16] to coordinate the process. The system centers around
an embodied conversational agent named Flobi [17], who provides a personalized assessment of an
individual’s risk profile based on a set of predefined input features. This risk assessment is subsequently
used in the context of a Holt and Laury lottery task, where participants make incentivized decisions
under risk [18]. The agent’s evaluation serves as a form of decision support, offering guidance while
allowing participants to ultimately make their own choices. An example interaction is depicted in Fig. 1.</p>
        <p>1: https://www.polar.com/de/sensors/h10-heart-rate-sensor</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Participants</title>
        <p>A total of 49 participants took part in the study. Six subjects had to be excluded from the evaluation
due to missing data. Of the others, 20 were randomly assigned to the fear condition and 23 to the happy
condition. The sample included 21 male and 22 female participants.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Experimental Setting — Procedure</title>
<p>The experiment consisted of six phases (cf. Fig. 3.4). We recorded the users’ heart rate variability, video
and audio, and their facial expressions as computed by EmoNet. In the first phase, a questionnaire was
administered that requested data from the user to assess the user’s risk type. The risk type classification
was achieved by a linear scoring scheme integrating the user’s numerical answers, yielding a risk type
value between 0 and 11. After an emotion induction sequence, the actual risk task was explained to
the user, and the user could provide his/her risk selection, i.e., selecting a high or low risk on a scale
from 0 to 9. After the user’s first risk decision, the system would present its risk suggestion to the user,
based on the evaluation of the user’s risk type. More specifically, a user yielding a high value for a
high-risk propensity would receive a suggestion of a high-risk choice and vice versa, also on a scale
from 0 to 11. This suggestion was followed by an explanation of all eleven variables and their relevance
to the estimated risk type of the user. For example, being female was an indicator towards less risk
propensity, whereas being male was an indicator for a high-risk propensity. We analyzed the HRV
and facial expressions during these episodes to detect arousal. After this explanation, the user could
revise his/her decision. In the last phase, the users were asked to (verbally) name the features that they
remembered from the explanation. After this, they were presented the features one after another and
were asked to determine whether a certain value of this feature was an indicator for higher or lower
risk propensity. We used this information as a proxy for understanding.</p>
        <sec id="sec-3-4-1">
          <title>3.4.1. Questionnaire</title>
<p>During the first phase of the interaction, the participants were asked to fill out an online questionnaire. A
total of eleven questions were asked. We refer to these questions as the “variables” or “features” that
the system uses to compute and explain the risk type of the participant.</p>
          <p>Based on results from empirical studies reported in the literature for each variable, a scoring scheme
was developed that assigned a score for each answer indicating higher or lower risk propensity. Overall,
this resulted in a linear scoring scheme – the higher the total score, the higher the risk propensity. For each
feature, the answer was compared against a median value reported in the literature, relative to which a higher (+1) or lower
(0) risk score was assigned [19]. This resulted in a risk type value between 0 and 9 for each participant.</p>
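The linear scoring scheme can be sketched as follows; the reference medians below are invented for illustration only, since the actual thresholds come from the cited literature [19]:

```python
# Invented reference medians for illustration; the study derived its
# thresholds from empirical literature [19].
MEDIANS = {"age": 35, "life_satisfaction": 7, "risk_propensity_job": 5}

def risk_type(answers, medians=MEDIANS):
    """Linear scoring: +1 for each answer above its reference median."""
    return sum(1 for key, value in answers.items() if value > medians[key])
```

Because each feature contributes at most one point, the total is simply the count of high-risk answers.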
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Emotion Induction</title>
<p>To investigate the effect of prior non-task-related emotions on retention and understanding, we induced
emotions via a biographic event recall similar to the one used in [6]. The participants were randomly
assigned to one of two emotions: fear or happiness. For the induction, they were asked to
remember an actual event in which they experienced fear (or happiness). After a relaxation phase to
reduce unwanted existing emotions, they were given five minutes to mentally replay the situation to
induce the emotion.</p>
        </sec>
        <sec id="sec-3-4-3">
          <title>3.4.3. Risk decision: First choice</title>
<p>Directly after the emotion induction, the risk task was explained to the participants, who were asked to provide
their first decision. This first decision is important for determining whether the DSS’ suggestion and
explanation have an effect on the participant’s (second) decision. That is, if the participant changes his/her
selection in the direction of the system’s suggestion, this is an indicator of advice taking.</p>
        </sec>
        <sec id="sec-3-4-4">
          <title>3.4.4. XAI decision and explanation</title>
          <p>In the explanation phase (phase 4, see above), Flobi would explain for each variable whether the
explainee’s (self-rated) value (e.g., of her political orientation) was an indicator adding to a higher or
lower probability of the explainee being more risk-friendly. In this case, Flobi would say: “Because your
political orientation is rather left, you are presumably less risk-friendly.” This would be repeated for all
11 variables, revealing one after the other. Fig. 3.4.4 shows the GUI, where all variables are depicted
with their respective contributions to the AI system’s decision.</p>
<p>Subsequently, the user was asked to make a second decision in the lottery. They were free to
retain their first decision or to change it in any direction.</p>
        </sec>
        <sec id="sec-3-4-5">
          <title>3.4.5. Assessment of the user’s retention and understanding of the explanations and the task</title>
          <p>In the last phase, the users’ retention and understanding were assessed using the procedures described
in subsection 3.1.2. See Fig. 4 for the GUI used to sort the features according to the remembered influence they
had on the risk type classification.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Emotion Manipulation Check</title>
<p>To examine differences in affective responses between induction conditions, we compared SAM valence
and arousal scores across groups (fear vs. happy; cf. Fig. 5). On the valence scale, which ranges from 1
(“unpleasant feeling”) to 5 (“pleasant feeling”), participants in the happy condition (Mdn = 2) did not report higher
valence ratings than the fear group (Mdn = 2; U = 252, p = .647). However,
participants in the fear group had a significantly higher SAM arousal score (Mdn_fear = 2, Mdn_happy =
1; U = 306, p = .028). On the arousal scale (1 = “calm”, 5 = “aroused”), the fear group thus reported
higher arousal levels than the happy group, suggesting that the fear induction elicited more
physiologically activating emotional responses. These trends align with theoretical expectations – fear
typically evokes high arousal and negative valence, while happiness is associated with positive valence
and lower arousal.</p>
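A group comparison of this kind can be sketched with SciPy; the ratings below are synthetic, not the study's data, and we use a Mann-Whitney U test as a standard nonparametric choice for ordinal SAM ratings:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic SAM arousal ratings on the 1-5 scale (not the study's data).
fear  = np.array([3, 2, 4, 3, 2, 3, 4, 2])
happy = np.array([1, 2, 1, 2, 1, 1, 2, 2])

# Nonparametric two-sided comparison of the two independent groups.
u_stat, p_value = mannwhitneyu(fear, happy, alternative="two-sided")
```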
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Influence of task-unrelated prior emotion on retention</title>
<p>To answer RQ1a (Do task-unrelated emotions influence the retention of explained feature
relevance?), the mean recall of the verbally and visually explained features was measured for each condition
by analyzing the verbal answer to the question of which features the user remembered. Note that this
was an open question, requiring the users to actively retrieve and formulate the names of the features
while ignoring the influence they had on the outcome. Since we recorded the participant’s voice during the
whole experiment, we were also able to capture their comments during the explanation of the features.
As noted above, due to a programming error, one feature (“Einstellung bzgl. Zukunft” – attitude towards the
future) was reported wrongly for the majority of participants. In these cases, some participants would
comment on the mistake spontaneously through a verbal utterance. However, some participants also
commented on features that were communicated correctly. For the participants, there was no difference
between these cases – they experienced both cases as a mistake by the system. As these comments
indicated an epistemic reaction (surprise), we also investigated their effect on retention.</p>
<p>Figure 6 shows the retention for each task-unrelated induced emotion (left). The right side shows
the average verbal labeling of features that participants explicitly identified as incorrect.</p>
        <p>A one-way ANOVA was conducted to compare the mean retention between the two induced prior emotions (fear
and happiness). The analysis revealed no significant difference, F(1, 37) = 0.104, p = .749. Thus,
the retention of features is not influenced by a prior emotion.</p>
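Such a one-way ANOVA can be run with SciPy; a sketch on invented per-participant retention scores (the group sizes match the study's 20/23 split, the values do not):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Invented per-participant retention proportions for the two conditions.
fear_retention  = rng.uniform(0.2, 0.5, size=20)
happy_retention = rng.uniform(0.2, 0.5, size=23)

# One-way ANOVA across the two induction groups.
f_stat, p_value = f_oneway(fear_retention, happy_retention)
```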
        <p>
However, those feature explanations that were perceived as erroneous by some participants may
have yielded better retention, as they attracted attention. To investigate whether the induced emotion had
an effect on the perceived incorrect features, a one-way ANOVA was conducted to examine the effect
of the induction condition on the number of recalls of the perceived incorrect features. The analysis
revealed no significant main effect of induction, F(1, 38) = 0.98, p = .330. However, as can be seen
in Fig. 6, participants in the “Happy” condition – who reported significantly lower levels of arousal –
remembered almost twice as many features they had commented on as wrong as those in the “Fear”
condition. This might indicate that arousal due to an explanation perceived as “wrong”, on top
of an initially high level of arousal, may (in some cases) lead to an excessive level of arousal, causing a
decrease in retention.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Influence of task-unrelated prior emotion on understanding</title>
<p>Addressing RQ1b (Do task-unrelated emotions influence the understanding of explained feature
relevance?), the explainees were asked to indicate, for each variable, what influence it had in their own
case on their risk propensity as estimated by the AI system (see Fig. 7 on the left).</p>
<p>An ANOVA was conducted to examine the effects of task-unrelated emotion (emotion induction)
and explained feature, as well as their interaction, on the outcome variable understanding.</p>
        <p>
There was a significant main effect of the explained feature, F(9, 410) = 7.71, p &lt; .001, suggesting
that the feature categories differed significantly in their association with the outcome understanding.
The main effect of emotion induction was not significant, F(1, 410) = 1.88, p = .171. The
emotion induction × explained feature interaction was also not statistically significant,
F(9, 410) = 0.92, p = .507.
        </p>
        <p>
To assess whether task-unrelated emotions had an effect on understanding (see Fig. 7 on the right) in
those cases where participants perceived an error of the system, we carried out an ANOVA. There was
a significant main effect of the emotion induction on the outcome variable understanding, F(1, 158) =
4.30, p = .040. Recall that participants in the fear condition reported a higher level of arousal. It can be
assumed that the perception of an error increased the arousal to an excessive level, which impaired
understanding.
        </p>
        <p>For a more qualitative analysis, we visualized the matching (“match”) vs. non-matching (“miss”)
answers of the participants with the explanation they had received in the previous phase from Flobi
(see Fig. 7). Interestingly, we see that the feature that was explained wrongly to most of the participants
(“Einstellung bzgl. Zukunft”) was remembered differently by the participants with different induced
emotions. While 80% of the participants of the fear condition agreed with Flobi’s (wrong!) explanation,
only 57% of the participants of the happy condition did so. This is somewhat surprising, as one would
expect that fear is generally associated with suspicion, leading to scrutiny of offered information and
suggestions. Yet, in this case, it appears that participants in the happy condition remembered their own
decision better.</p>
        <p>
A different interpretation might be that there are two effects of emotion on understanding: (1) users
do not remember their answer to these specific questions about an uncertain future correctly, as they
were given in a neutral state. Rather, they “remember” their current emotionally tainted attitude towards
the future: in the condition of fear, this would be negative; in the condition of happiness, this would be
positive. Indeed, the Appraisal Tendency Framework (ATF) argues that happiness is associated with appraisals
of high certainty and fear with appraisals of low certainty. (2) Users judge the impact of these variables according to their current emotion:
users in the fear condition believe that being uncertain about the future reduces the risk propensity
(although research shows a diferent relationship: high uncertainty about the future increases risk
propensity, low uncertainty decreases it), whereas users in the happy condition believe that being
uncertain about the future increases risk propensity. This might be seen as some kind of confirmation
bias or transfer, as this estimated influence on risk propensity corresponds to their own risk propensity
at that time: being happy (rather than being certain about the future) increases risk propensity, whereas
being fearful (rather than being uncertain about the future) decreases risk propensity.
        </p>
<p>Although it is not certain what exactly causes these different judgment results, it is clear that the
emotional state can affect how the influence of certain variables on the outcome is remembered or
judged. Yet, it remains unclear how such an effect could be detected in interaction with an intelligent,
explainable AI system.</p>
      </sec>
      <sec id="sec-4-4">
<title>4.4. Effect of explanations on arousal</title>
        <p>
To determine the effect of explanations on arousal, we defined the emotional reactions – or arousal
– by using the EmoNet arousal data during the feature presentation in combination with the rolling
z-score with k = 2.5 and a window of 500 ms (cf. equations (1) and (2)). In this way, we determined the
peaks of arousal that stood out from the preceding arousal values, indicating emotional reactions. Fig. 8
shows the arousal values as computed by EmoNet as a red line, plotted over time, for participant 25.
The feature names (rotated vertically) denote the beginning of the verbal (and visual) explanation of the
corresponding feature. For example, the explanation of the feature “Geschlecht” (“gender”) starts at
second 0.
        </p>
<p>Vertical black lines indicate a positive z-score and therefore an emotional reaction. For example, the
black line at second 4 indicates an arousal bout right at the beginning of the feature explanation for
“Politische Einstellung” (“political attitude”). The yellow lines indicate a rapid drop in the measured
arousal.</p>
        <p>If a feature explanation segment contained one (or more) bouts of arousal – as indicated by the black
lines – it was counted as a feature explanation causing an emotional reaction (or arousal).</p>
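Given this counting rule, the per-feature proportions shown in Fig. 9 reduce to a column mean over a participants × features matrix of arousal flags (a sketch on toy data, not the study's measurements):

```python
import numpy as np

# Toy matrix: flags[p, f] = 1 if participant p showed at least one bout
# of arousal during the explanation of feature f.
flags = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [1, 0, 1]])

# Fraction of participants with an emotional reaction, per feature.
proportion_per_feature = flags.mean(axis=0)
```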
        <p>Fig. 9 shows the proportion of participants per feature for whom arousal was measured. As can
be seen, there are differences in the frequency of arousal for the different features. The features “current
life satisfaction” (“Lebenszufriedenheit gegenwaertig”) and “even temper of the last 4 weeks”
(“Ausgeglichenheit...”) yielded the most frequent bouts of arousal with almost 60% of the participants, whereas
“risk propensity job” (“Risikobereitschaft Beruf”), and “attitude towards the future” (“Einstellung bezgl.
Zukunft”) yielded the least frequent arousal, with about 30%. This indicates that features differ in their
potential to evoke arousal. What the underlying reasons for the arousal are remains unclear, so far.
But there may be intrinsic (e.g., the personal relevance of this feature for each participant) as well as
extrinsic (e.g., recency efect, length of word / ease of word) reasons.</p>
        <p>In addition to these diferences, we also see diferent frequencies of arousal between participants in
the fear and the happy groups. Most striking is the difference in the feature “current life satisfaction”
(“Lebenszufriedenheit gegenwaertig”) which raised arousal in about 30% of the participants in the Fear
group compared to about 60% in the Happy group.</p>
        <p>Thus, overall, we see that explanations can induce arousal. However, so far, no clear pattern as to
what factors actually cause the arousal is recognizable.</p>
      </sec>
      <sec id="sec-4-5">
<title>4.5. Effect of explanation-induced arousal on retention</title>
<p>Figure 10 visualizes the mean retention as a function of the number of emotional reactions that a feature
explanation evoked. The size of the bullet visualizes the number of occurrences. Here, the explanations
of all 10 features for all 43 participants (10 x 43 = 430) have been considered. For example, in the left
figure, the largest dot, at zero emotional reactions, indicates that the mean retention for those
200 (given by the size of the dot) feature explanations that yielded 0 emotional reactions (i.e., bouts of
arousal) was 30%. On the right side, the graph is split into the reactions of the participants from the fear
and the happy conditions.</p>
        <p>To examine whether emotional reactions were associated with participants’ retention of individual
features, we fitted a generalized linear mixed model (GLMM) with a binomial distribution and logit
link to predict the binary outcome of retention. The fixed effects included emotional reactions (i.e.,
arousal) and the explained feature, while a random intercept was included for participant ID (N = 430
observations, 43 participants). The binary dependent variable indicated whether a feature was verbally
recalled (retention = 1) or not (retention = 0). Model estimation was performed using the glmer()
function from the lme4 package in R.</p>
        <p>The model fit was acceptable, AIC = 396.1, BIC = 443.6, log-likelihood = −186.0. The random
intercept variance was 0.79 (SD = 0.89), indicating variability between participants (43 groups, 430
observations).</p>
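        <p>As a reading aid, the model structure can be sketched in a few lines. The following is a minimal, illustrative Python version of the GLMM’s logit link (not the authors’ fitted lme4 model): it uses the reported arousal slope (−0.52) and random-intercept variance (0.79), while the fixed intercept is a hypothetical placeholder.</p>

```python
import math

# Illustrative sketch of the binomial GLMM's logit link (NOT the authors'
# fitted lme4 model): predicted recall probability for one observation,
# given the number of arousal bouts and a participant-specific random
# intercept. The arousal slope (-0.52) and random-intercept variance (0.79)
# are the values reported in the text; the fixed intercept is made up.
B_INTERCEPT = -0.85        # hypothetical baseline log-odds of recall
B_AROUSAL = -0.52          # reported slope for emotional reactions
RANDOM_INTERCEPT_SD = math.sqrt(0.79)  # variance 0.79 -> SD ~ 0.89

def recall_probability(n_arousal_bouts: float,
                       participant_intercept: float = 0.0) -> float:
    """Inverse-logit of the linear predictor for one observation."""
    eta = B_INTERCEPT + B_AROUSAL * n_arousal_bouts + participant_intercept
    return 1.0 / (1.0 + math.exp(-eta))

# The reported SD (0.89) is simply the square root of the variance (0.79):
print(round(RANDOM_INTERCEPT_SD, 2))                  # -> 0.89
# More bouts of arousal lower the predicted recall probability:
print(recall_probability(0) > recall_probability(3))  # -> True
```

        <p>The participant-specific intercept shifts the whole curve up or down per person, which is exactly the between-participant variability the reported variance quantifies.</p>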
        <p>There was a marginal trend for the predictor emotional reactions suggesting that a high emotional
reaction reduced the likelihood of recall, β = −0.52, SE = 0.31, z = −1.70, p = .090.</p>
        <p>Among the fixed effects, the following explained feature variables were significant predictors and
more likely to be recalled verbally:
• Gender: β = 3.38, SE = 0.83, z = 4.09, p &lt; .001
• Current health status: β = 3.22, SE = 0.82, z = 3.91, p &lt; .001
• Political orientation: β = 3.07, SE = 0.82, z = 3.73, p &lt; .001
• Concerns about the economic situation: β = 2.48, SE = 0.83, z = 3.00, p = .003
This means that the features gender, current health status, political orientation, and concerns about
the economic situation are predictors for retention. Other feature labels were not significant predictors
(all p &gt; .10). This is an interesting result, as gender was the feature with the most frequent bouts of
arousal, whereas current health status yielded the least frequent bouts of arousal, which indicates that
arousal alone may not be a good predictor of retention.</p>
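        <p>Because these estimates are on the log-odds scale, a quick conversion to odds ratios (a standard reading of logit coefficients, not an additional analysis from the paper) makes their size easier to grasp.</p>

```python
import math

# Reported log-odds coefficients for the features that significantly
# predicted verbal recall, converted to odds ratios via exp(beta).
# Each value gives the multiplicative change in recall odds relative
# to the model's reference feature.
feature_betas = {
    "gender": 3.38,
    "current health status": 3.22,
    "political orientation": 3.07,
    "concerns about the economic situation": 2.48,
}
odds_ratios = {name: round(math.exp(b), 1) for name, b in feature_betas.items()}
print(odds_ratios["gender"])  # -> 29.4 (vs. the reference feature)
```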
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Effect of explanation-induced arousal on understanding</title>
        <p>While we were unable to show a significant effect of arousal on retention, arousal might affect
understanding. We applied the same approach as for retention and fitted a generalized linear mixed model
(GLMM) with a binomial distribution and logit link to predict the binary outcome understanding. The
fixed effects included emotional reactions, emotion induction, and the explained feature, while ID was
modeled as a random intercept to account for individual variability. Model estimation was performed
using the glmer() function from the lme4 package in R.</p>
        <p>The emotional reaction is significantly negatively associated with the outcome (β = −0.31, SE =
0.14, z = −2.30, p = .022), suggesting that higher levels of emotional reactions were associated with
lower understanding. Thus, too much arousal or too many bouts of arousal hinder understanding.</p>
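        <p>The size of this negative effect is easier to read as an odds ratio (again a standard logit-model conversion, not an additional result from the paper).</p>

```python
import math

# Reported coefficient for emotional reactions in the understanding model:
# each additional bout of arousal multiplies the odds of understanding
# by exp(beta).
beta_arousal = -0.31
odds_ratio = math.exp(beta_arousal)
print(round(odds_ratio, 2))  # -> 0.73, i.e. ~27% lower odds per bout
```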
        <p>Among the feature labels, political orientation (β = 1.76, SE = 0.63, z = 2.78, p = .005), risk-taking
in trusting strangers (β = 1.16, SE = 0.54, z = 2.12, p = .034), and concerns about one’s economic
situation (β = −1.87, SE = 0.50, z = −3.76, p &lt; .001) were significant predictors. Gender showed a
marginal trend (β = 0.93, SE = 0.52, z = 1.77, p = .077). Other feature labels were not significant
predictors (all p &gt; .10).</p>
        <p>The random intercept for ID had a variance of 0.36 (SD = 0.60), indicating some variability in baseline
response tendencies across participants. Thus, we see individual effects on the capability to understand
the meaning of the effect of a feature on the outcome of the DSS.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusion</title>
      <p>In the following, we will discuss our results regarding our initial research questions.
RQ1a: Do task-unrelated emotions influence the retention of explained feature relevance?
RQ3a: Do task-related emotions influence retention of the explanation features?</p>
      <p>Our results indicate that neither task-unrelated prior emotions nor task-generated emotional reactions
(or arousal) were significant predictors of retention. That is, although there is evidence that a certain
amount of arousal in general improves retention of information, this was not the case in the current
study. There are many possible explanations for this. One explanation might be that the complex
explanation situation was novel and required a high cognitive load due to the new way of explaining
the relevance of features. Another explanation might be that the feature variables themselves may not
have been understood by the participants. Some variable names are very long and might be difficult to
remember, so that participants were unable to map certain variable or feature names to the questions
they had been asked in the initial questionnaire. This would require an additional explanation layer
that allows the participant to ask for an explanation concerning the different (or globally most relevant)
features.</p>
      <p>RQ1b: Do task-unrelated emotions influence the understanding of explained feature relevance?
We found a main effect of the explained feature on understanding but no effect of task-unrelated
emotions on understanding.</p>
      <p>We found an effect of task-unrelated emotions on understanding in those cases where participants
reported an “error” of the system. In these cases, participants in the “Fear” condition, where a higher level
of arousal was reported, showed significantly less understanding than those in the “Happy” condition.
This indicates that task-unrelated and task-generated emotions may work together in the sense of
increasing the level of arousal to such a degree that understanding is affected.</p>
      <p>RQ2: Which features trigger emotional reactions during explanation? (task-related emotions)
Our results indicate that certain individual characteristics – such as current life satisfaction and
even temper over the last 4 weeks – are significant predictors for the recall of explained features. Further
research needs to investigate what characteristics render features so salient that they are remembered
better than others.</p>
      <p>RQ3b: Do task-related emotions influence understanding of the explanation features?
Most interestingly, we found that the emotional reaction, i.e., arousal, is significantly negatively
associated with understanding, suggesting that higher levels of positive emotional reactions were
associated with lower understanding. This is in accordance with other findings that indicate that too
much arousal (or too many bouts of arousal, in our case) hinders understanding. Thus, it is an important
goal to find the right amount of arousal to foster understanding in a DSS scenario.</p>
      <p>Taken together, our results indicate that task-generated emotions (as induced by a perceived error
of the system or possibly other causes) can affect understanding in cases where the baseline arousal
is already high. Our findings for RQ3b showed that the decrease in understanding is indeed related
to a higher degree of measured arousal. Thus, to address the effects of emotions or arousal on the
understanding of explanations, XAI systems need to be sensitive to the history of the interaction, as
prior states or events may hinder understanding. These kinds of effects cannot be found when
looking only locally at certain events.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation):
TRR 318/1 2021-438445824 “Constructing Explainability”.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used LanguageTool for grammar and spelling
checking. After using this tool, the author(s) reviewed and edited the content as needed and take(s)
full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
  </back>
</article>