<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Coherence in Explainable AI: Strategies for Consistency Across Time and Interaction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alan Dix</string-name>
          <email>alan@hcibook.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommaso Turchi</string-name>
          <email>tommaso.turchi@unipi.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ben Wilson</string-name>
          <email>b.j.m.wilson@swansea.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Malizia</string-name>
          <email>alessio.malizia@unipi.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Monreale</string-name>
          <email>anna.monreale@unipi.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matt Roach</string-name>
          <email>m.j.roach@swansea.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cardiff Metropolitan University</institution>
          ,
          <addr-line>Wales</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computational Foundry, Swansea University</institution>
          ,
          <addr-line>Wales</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Computer Science, University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Molde University College</institution>
          ,
          <addr-line>Molde</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Can we create explanations of artificial intelligence and machine learning that have some level of consistency over time, as we might expect of a human explanation? This paper explores this issue, and offers several strategies for either maintaining a level of consistency or highlighting when and why past explanations might appear inconsistent with current decisions. Myers and Chater argue that a human explanation is not just an atomic utterance, but that we expect a level of coherence over time [13, 14]; that is, future statements and explanations should be consistent with previous ones. Indeed, this is part of the implicit contract between the parties that enables mutual trust, effective communication and collaboration. For example, if Alan explains a food choice by saying “I prefer sausages to poultry”, you would expect him to subsequently choose sausages if given a choice. Myers and Chater extend their argument from the realm of human explanation to highlight ‘what it would really mean for AI systems to be explainable’. They argue that AI explanations equally should have some level of consistency. Myers and Chater build their position on extensive theoretical and empirical literature from psychology, sociology and XAI; we do not repeat this here beyond motivating examples. In this paper, we take a next step, exploring the different ways that this consistency can occur within XAI settings and some potential algorithmic strategies to ensure it in practice.</p>
      </abstract>
      <kwd-group>
        <kwd>adaptive interfaces</kwd>
        <kwd>user experience</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This paper considers how XAI systems can behave in ways that are coherent over time, mirroring the
expectations of consistency for human explanations.</p>
      <p>
        It is widely believed that there are advantages to having AI systems that are comprehensible to
human users. This has been part of the literature since the early 1990s [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], in particular highlighting the
potential for ethnic, socio-economic and gender bias in black-box ML and the way that explanation as a
form of transparency can help expose this. However, over recent years the issue has become a major
area of both research and practical development, with numerous algorithms [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ], frameworks and reviews [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9 ref10 ref11 ref12">6, 7, 8, 9, 10, 11, 12</xref>
        ].
      </p>
      <p>In the next section we’ll look at the different ways (in)coherence may manifest in AI systems, and
then move on to consider ways this can be managed, such as explaining incoherence or avoiding it
happening. Notions of nearness, closeness or local neighbourhoods are crucial to both.</p>
      <p>
        Note this paper will deal principally with single point explanations: “the system made this decision
about input X because ...”. Contrastive explanations may also be very powerful, that is answering
questions of the form, “why are the decisions about inputs X and Y different (or the same)?” We also
principally focus on coherence between explanations and decisions for different inputs or models, that
is inter-response consistency. In addition, for complex explanations (particularly LLMs), we can ask
whether the parts of the explanation are coherent, that is intra-response consistency. There are also
important issues regarding instability of explanations [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] to the same input (in the case of stochastic
algorithms or ongoing learning) and of explanations of the same decision given to different people
(where there is personalisation, for example in LLMs). We leave a detailed discussion of these issues to
a future paper.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Types of Incoherence</title>
      <p>Within human–human interaction there are many different meanings of coherence or consistency, with no
single clear definition. In general, the term ‘coherence’ seems to be used more for internal consistency
within an argument (intra-response consistency) and ‘consistency’ more for the relationship between
multiple utterances or between utterance and action (inter-response consistency). Here we are looking
predominantly at the latter, especially in relation to AI explanations, that is the extent to which the
decisions/outputs and explanations given by an AI at different times appear to agree or make sense
relative to one another. However, there are several ways in which an AI or ML system may exhibit
behaviour apparently (in)consistent with previous explanations. We will attempt to be more precise
than the fairly open definition above, but ultimately this is about human judgement or impressions of
what seems to be coherent.</p>
      <p>We will first look at situations where different inputs to the same model give rise to apparent
incoherence; for example, an AI medical advice system that said grapefruit was good to eat for one kind of
cancer, but not for another. We will then consider cases where the model has changed, say, owing to
new training data; perhaps analogous to a doctor changing their opinion based on a new article in
The Lancet.</p>
      <sec id="sec-2-1">
        <title>2.1. Notation</title>
        <sec id="sec-2-1-1">
          <title>We will use the following semi-formal notation for the AI cases:</title>
        <p>
          • X, dX, eX – previous input X, decision and explanation for a model M
          • Y, dY, eY – current input Y, decision and explanation for the same model
          • dX′, eX′ – decision for input X and explanation for this following a model change to M′
The precise meaning of these differs depending on the kind of input data (e.g. images, medical test
results, user interface logs), the kinds of output (e.g. medical diagnosis, classification, automated action)
and explanation (e.g. SHAP-style feature importance, linear discriminant, decision tree).
Inconsistency
We will use the symbol ≁ to mean ‘apparently contradicts’ for different kinds of comparisons. In
some cases this is effectively ‘not equal’, but in others, for example looking at the relation between a
feature-importance explanation and a particular decision, this is a more complex relationship.
Explanation as function
In many cases explanations are expected to be local [
          <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
          ], that is only operative in a neighbourhood
of the particular input. However, the explanation can often be applied in relation to other inputs; we
will write eX[Y] for the explanation given for input X interpreted in the context of input Y. Crucially,
some explanations can be treated as functions that give a decision for a particular input; in these cases
we can think of eX[Y] as the decision that would be taken given input Y, treating eX as a function.
        </p>
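        <p>As a concrete illustration of an explanation treated as a function, the following minimal Python sketch (our own illustration; the class and interface are assumptions, not a prescribed API) represents a LIME-style local explanation as a linear function that can be evaluated at another input to obtain eX[Y]:</p>
        <p>from dataclasses import dataclass
import numpy as np

@dataclass
class LinearExplanation:
    """A LIME-style local explanation: weights over features plus an intercept."""
    weights: np.ndarray
    intercept: float

    def __call__(self, y):
        # eX[Y]: the decision the explanation implies at input y
        score = float(np.dot(self.weights, y) + self.intercept)
        return 1 if score &gt; 0 else 0

eX = LinearExplanation(weights=np.array([0.8, -0.2]), intercept=-0.1)
print(eX(np.array([0.5, 0.3])))   # the decision eX[Y] implied for a nearby input Y</p>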
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Fixed model</title>
        <p>First let’s consider the case of a fixed model that has been trained or constructed beforehand and does
not learn further during the period of use (see Figure 1, upper). We have two main cases:
Inconsistency of decisions – Is the current decision inconsistent with past explanation(s): dY ≁ eX[Y]?
Is the past decision consistent with the current explanation: dX ≁ eY[X]?
Inconsistency of explanations – Do the explanations agree in terms of decisions on the inputs, but
with different reasoning: dY ∼ eX[Y] while eX ≁ eY?</p>
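        <p>As a minimal sketch (our own illustration, assuming explanations are callable and carry a weights vector as in the earlier sketch; the 0.9 cut-off is an arbitrary assumption), these two checks might look like:</p>
        <p>import numpy as np

def decision_incoherent(d_y, e_x, y):
    # dY ≁ eX[Y]: the current decision disagrees with the past explanation applied at Y
    return d_y != e_x(y)

def explanation_incoherent(e_x, e_y, threshold=0.9):
    # eX ≁ eY: the decisions may agree, yet the feature weightings differ markedly
    wx, wy = e_x.weights, e_y.weights
    cos = float(np.dot(wx, wy) / (np.linalg.norm(wx) * np.linalg.norm(wy)))
    return cos &lt; threshold</p>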
        <p>A human example of the first case would be if Tommaso said that a Fiat 500 was a good car because it
was small and then later said he would like to have a Humvee. An example of the second case would be
if he said he liked a blue Fiat 500 because it was small and then later said he liked a blue Mini because it
was blue.</p>
        <p>This incoherence might be for valid reasons. For example, in the first, Tommaso might prefer a small
car for ease of parking at work, but if not for that would really like the idea of driving the Humvee
– that is, they are local explanations. In the second case, it might be that the explanation of the Fiat
500 had been made in comparison to an SUV whereas that for the (all) blue Mini was in contrast to a
red, white and blue striped Mini. Note that the latter, contrastive explanations, need special treatment,
which, as noted, we leave for a future paper.</p>
        <p>As with Tommaso’s reasoning, an AI model might be working well and the inconsistency justified,
albeit initially appearing incoherent. Alternatively, the incoherence may represent a genuine problem
in reasoning:
• one or other decision or explanation was simply wrong as the model generalises poorly;
• the act of finding the later explanation effectively opened up ways of looking at the data that
would have been better applied to the first input (related to model change);
• the explanation-finding mechanism has stochastic elements or stability issues in certain areas
and simply gives different explanations by chance; in contrast to one being wrong, each might
have validity.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Changed model</title>
        <p>Now consider cases where the model has changed due to new training data (see Figure 1, lower). All
the above apply, that is we might have presented X to the old model M and Y to the new model M′, and
found apparent incoherence. These are not pictured for reasons of space, but would be represented
by dY′ ≁ eX[Y] and eY′ ≁ eX, etc. In practice, this may result from new training examples ‘near’ the old
decision points.
In addition, we may see non-monotonic reasoning, that is issues of consistency with the same input X
in different models M and M′:
Inconsistency of decisions – Has the decision changed? dX′ ≠ dX
Inconsistency of explanations – Has the explanation for the same decision changed? dX′ = dX, but
eX′ ≁ eX</p>
        <p>Again similar issues can arise for human–human interactions. The earlier example of a doctor
changing their diagnosis or treatment based on a new Lancet article is an example of the first case. A
health-related example of the second would be a nutritionist who has always recommended a varied
diet in order to ensure a broad range of vitamins and nutrients, but based on recent studies, now makes
the same recommendation but emphasising the way a varied diet encourages a diverse gut biome with
an ensuing wide impact on mental and physical health.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Strategies to improve Coherence</title>
      <p>There are several different ways in which we can ensure coherence between decisions and explanations:
• highlight inconsistency with previous explanations: “I know I said A before, but this is a different
kind of situation”. This doesn’t ensure consistency, but it maintains a claim to coherence.
• explain inconsistency with previous explanations: “I know I said A before, but this is different
because of B”. This justifies the claim to coherence.
• constrain consistency with previous explanations by adding each previous explanation eX as a
constraint when making future decisions. This continually uses past explanations to update, or
manage, the model, but may run into limits and may only be possible with some kinds of machine
learning algorithms.
• ensure consistency by using each previous explanation eX as a local decision rule when the current
situation is sufficiently close to the input that gave rise to the explanation. That is, completely
replace the model rule locally.</p>
      <sec id="sec-3-1">
        <title>We’ll look at each of these in a little more detail.</title>
        <sec id="sec-3-1-1">
          <title>3.1. Highlight Inconsistency</title>
          <p>
            Here the system needs to keep track of previous decisions and explanations and simply detect that
there is an apparent inconsistency. The exact form of this detection will vary depending on the form of
ML and XAI. As an example, the FRANK system [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] is used during interactive human training, but
adopts a mechanism that applies rules based on previous decisions to verify new user input:
“At first Frank applies the Ideal Rule Check (IRC), checking if the record is covered by one of the given
rules” [<xref ref-type="bibr" rid="ref18">18</xref>, p.17]
          </p>
          <p>This approach is being used to monitor the consistency of human training examples in a process
of ‘skeptical learning’ (Fig. 2). However, the underlying mechanisms for checking aspects of training
data are similar to those that would be required to monitor future model decisions/advice. Rather than a
human labelling, we would instead check a new AI decision against previous rules.</p>
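          <p>As an indicative sketch (our own illustration, not FRANK’s implementation), such a check needs only a store of rules, each pairing a coverage predicate with its expected decision:</p>
          <p>def check_against_rules(x, d_x, rules):
    """Return the rules whose locality covers x but whose decision disagrees with d_x.

    rules: a list of (covers, decision) pairs, where covers(x) tests rule coverage.
    """
    conflicts = [(covers, decision) for covers, decision in rules
                 if covers(x) and decision != d_x]
    return conflicts   # non-empty: highlight the apparent inconsistency to the user</p>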
          <p>Detecting and highlighting inconsistency is a fairly minimal strategy, but may help retain user
confidence.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. Explaining Inconsistency</title>
          <p>Where there is justified inconsistency of any of the types discussed in Section 2.2, we ideally need to
explain why this is occurring.</p>
          <p>
            Counterfactual-style explanations are already being used in some XAI contexts [
            <xref ref-type="bibr" rid="ref15 ref19">15, 19</xref>
            ] where
decisions dX and dY differ. For example, given two inputs X and Y that look similar, but are given
different decisions dX and dY, we can try to locate training examples TX and TY with labels dX and dY, and
close to X and Y respectively (ideally also both ‘between’ X and Y), thus justifying the different decision.
          </p>
          <p>In a similar way, if dY ≁ eX[Y], we can find a training example TY that is close to Y or ‘between’ X and
Y, but where the label on TY is not what one would expect from eX[TY], thus justifying the limits of the
explanation eX. The same technique can be used with multiple training examples to justify eY ≁ eX.</p>
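          <p>A minimal sketch of locating such an example, under the assumption of a Euclidean feature space: pick the labelled training point minimising its combined distance to the two inputs, a crude but serviceable notion of ‘between’:</p>
          <p>import numpy as np

def between_example(x, y, train_X, train_labels):
    """Find the training example that lies most nearly 'between' inputs x and y."""
    d = np.linalg.norm(train_X - x, axis=1) + np.linalg.norm(train_X - y, axis=1)
    i = int(np.argmin(d))
    return train_X[i], train_labels[i]   # candidate justifying example and its label</p>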
          <p>In many ways, if the model changes, as described in Section 2.3, less justification is needed as the new
training examples quite reasonably will have changed the model. However, if the new examples appear
to be very different to an input X, it may still seem odd that the decision dX or explanation eX changes,
so that change-oriented explanations are needed. In some cases we may be able to find new training
examples that are close to the past example, that is a new TX that is close to X, but with a different
label or incompatible with the old explanation for X. In others there may be non-local changes; for
example, in a CNN (convolutional neural network) new training data might change low-level features
that have impacts on very different input data. This highlights the general XAI challenge of in some
way surfacing these intermediate emergent features.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.3. Constrain Inconsistency</title>
          <p>
In some cases the underlying algorithm can be constrained to remain consistent with
a previous explanation. For example, the Query-by-Browsing system uses a variant of ID3 to build
decision trees and SQL queries [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], which can be thought of as a single global ‘explanation’. However,
the top-down nature of ID3 means that small changes in training data may give rise to a completely
different decision tree. For this reason, one variant of Query-by-Browsing used genetic algorithms to
evolve the decision tree [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ]; thus favouring smaller changes to the tree where these are consistent
with previous data.
          </p>
          <p>
            A more model-agnostic method would be to generate synthetic training examples that are distant
from existing training data, but close to a previous example X. If each new training example is labelled
to be consistent with eX, this effectively cements the explanation for the locality. This is similar to the
techniques used to generate privacy-preserving synthetic data in [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ].
          </p>
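          <p>A sketch of this approach, under assumptions of our own (Gaussian perturbation as the notion of ‘close to X’, explanations callable as functions, and no filtering against existing training data, which a fuller version would add):</p>
          <p>import numpy as np

def cement_explanation(x, e_x, n=50, scale=0.05, rng=None):
    """Generate synthetic examples near x, labelled to agree with explanation eX."""
    if rng is None:
        rng = np.random.default_rng(0)
    synth = x + rng.normal(0.0, scale, size=(n, len(x)))   # points 'close to' x
    labels = np.array([e_x(s) for s in synth])             # labels consistent with eX
    return synth, labels   # append to the training data before retraining</p>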
        </sec>
        <sec id="sec-3-1-4">
          <title>3.4. Ensure Consistency</title>
          <p>
            As noted, in some cases we can interpret a decision dX and explanation eX as a rule ‘WHEN in locality
LX APPLY rule RX’. For example, LIME creates a linear discriminant model by looking at training examples in the
region of the input [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]; this both incorporates an existing idea of locality of the explanation (LX) and an
executable rule (RX).
          </p>
          <p>This collection of locality–rule pairs (L, R) can then be used in a two-stage process as illustrated
in Figure 3: given a new unseen input X, we first check if it is within a patch, and if so return the
result of the rule; if there is no matching patch, the original model is used to generate the decision and
explanation.</p>
          <p>Initially a model M, and an empty set of patches P.
for each new example X:
   1. look for (L[i], R[i]) in P such that X in L[i]
   2. if found:
      2.1 give decision and explanation by applying R[i] to X
   3. if not found:
      3.1 let dX = decision of M at X
      3.2 let eX = explanation of M at X
      3.3 let L = a locality of X
      3.4 let R = eX interpreted as a rule
      3.5 add (L, R) to P
      3.6 give decision dX and explanation eX</p>
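          <p>The same two-stage loop as a runnable Python sketch (the model, explainer and locality interfaces are assumptions for illustration, not part of any published method):</p>
          <p>def decide_with_patches(x, model, explain, locality_of, patches):
    """Two-stage patch model: reuse a stored explanation-rule when x falls in its locality."""
    for locality, rule in patches:                 # 1. look for a covering patch
        if locality(x):
            return rule(x), rule                   # 2. decision and explanation from the patch
    d_x = model(x)                                 # 3. otherwise fall back to the model
    e_x = explain(model, x)
    patches.append((locality_of(x, e_x), e_x))     # store (L, R) for future coherence
    return d_x, e_x</p>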
          <p>This is rather like ensemble methods, where one has multiple models and then meta-learning to create
a decision rule to determine which is to be used. In this case the rule set consists of the original model
M and a series of example–explanation pairs, (X, eX), (Y, eY), etc. Effectively one is doing ensemble
learning on M, eX, eY, but where we have the anchor points X, Y to help.</p>
          <p>We can think of these local rules as comprising a partial patch model, as depicted in Figure 4. The
figure also highlights several issues of patch models:
• varying density of patches – Some areas of the input space may be densely covered in patches,
others relatively empty. For the purposes of coherence, this is not a problem, merely reflecting
the distribution of previous user inputs and associated explanations.
• varying size and shape of patches – The localities defining the patches may differ in size and (highly
multidimensional) shape. The consequent issues of what to regard as ‘near’ will be discussed in
Section 4.
• overlapping patches – If two localities overlap and the decisions implied by their rules differ, then
there clearly needs to be a meta-decision rule or adjustment of the localities to disambiguate the
decision. However, even if the rules agree in the intersection, the explanations will be different,
so there still needs to be some adjustment to one or both localities.</p>
          <p>
            The method above is iterative, building a secondary model patch-by-patch. It is also a partial patch
model, as the patches (initially at least) do not fully cover all inputs and the original model is still used
in the gaps. The GLocalX method [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] is similar, but performs this whole process ‘upfront’, that is
by exploring the entire space, creating local explanations everywhere and then using this to create a
complete patch model (global explanation) all before ever encountering any unseen examples.
          </p>
          <p>This method can also be used when there is underlying model change. If new training examples are
not ‘close’ to a patch, the patch can be retained between models, thus also dealing with between-models
coherence at a single input point for both decisions and explanations. However, if the underlying model
change has not also been limited using some form of ‘constrain inconsistency’ approach, then this could
lead to increased instability at patch boundaries.</p>
        </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Nearness and locality</title>
      <p>
        In multiple places we have needed to think about some form of nearness or locality. In Section 2, which
considered the ways in which incoherence may occur, explanations for inputs X and Y would only be
seen to be in conflict if X and Y are sufficiently close. Similarly, the patch models in Section 3 depend on
defining a locality over which each rule operates, typically defined in terms of closeness to a defining
example. Indeed ‘local’ explanation methods such as SHAP and LIME [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] have to have some measure
of what is close to a particular input in order to perform perturbations.
      </p>
      <p>
        Note there are three senses of closeness one might want to consider:
• closeness of input vectors – For binary features this might be Hamming distance or, for
continuous features, some form of Euclidean distance in feature space normalised by individual feature
variance.
• closeness of outputs/classifications/decisions – This might be a binary agree/disagree, but
could be a more complex metric of the output, such as a set of classifications with weightings.
• coherence of explanations – This is the metric that is critical for instability in XAI [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. For
feature-importance explanations this might be a Euclidean distance, cosine similarity, or a Spearman
or Kendall rank correlation coefficient. For symbolic explanations, this may be some form of
inter-formulae edit distance.
      </p>
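      <p>For the third sense, a sketch of one plausible coherence measure for feature-importance explanations (rank correlation via SciPy; the 0.5 cut-off is an arbitrary assumption of ours):</p>
      <p>from scipy.stats import spearmanr

def explanations_cohere(imp_a, imp_b, min_rho=0.5):
    """Compare two feature-importance vectors by Spearman rank correlation."""
    rho, _ = spearmanr(imp_a, imp_b)
    return rho &gt;= min_rho

print(explanations_cohere([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))   # True: same ranking</p>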
      <p>It is the first that we are dealing with in this section, but all are important in diferent circumstances.</p>
      <sec id="sec-4-1">
        <title>4.1. Localised feature importance</title>
        <p>Initially, closeness can be based on a global metric of closeness of input vectors. However, once we
have local explanations these can be used to help define more localised metrics. For example, as noted
previously, local explanations are often created by looking at training data close to the input; to be
‘local’ these will have adopted a measure of nearness, which can then be used to create the locality for a
patch model.</p>
        <p>In addition, the explanation will often create some form of feature importance, which can be used to
create localised nearness metrics. In the case of perturbation and hotspot methods this is very direct,
as each feature is given a direct measure of importance, which can then be used to weight the feature
differences in a local Euclidean metric; that is:
d(x, y) = √( Σi wi (xi − yi)² )
where the wi are weights based on the feature importance vector. Note that smaller differences are
considered significant where they have higher feature importance, whereas even quite large differences
in unimportant features may still be considered ‘close’.</p>
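        <p>As a sketch, with the feature importances serving directly as the weights (assumed non-negative and already normalised):</p>
        <p>import numpy as np

def local_distance(x, y, importance):
    """Weighted Euclidean metric: important features count more towards distance."""
    w = np.asarray(importance)
    diff = np.asarray(x) - np.asarray(y)
    return float(np.sqrt(np.sum(w * diff ** 2)))</p>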
        <p>In the case of more algorithmic explanations such as decision trees, the fact that a feature is mentioned
can be used as a measure of feature importance, weighting these features more highly than others. If the explanation
includes derived features (e.g. the Boolean ‘SALARY &gt; 50000’), then these can be used to give a scale to the
feature. If the SALARY in the input that gave rise to this explanation is 60000, then we would expect
the locality of the rule to extend at least to some inputs that are otherwise similar, but with SALARY
less than 50000, so that the locality defines a region within which the rule is meaningfully applicable.</p>
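        <p>A sketch of deriving such a locality from a mentioned threshold (the symmetric two-margin extent is an arbitrary assumption of ours):</p>
        <p>def locality_from_threshold(value, threshold):
    """Locality on a feature mentioned in a rule 'FEATURE &gt; threshold'.

    Extends around the input value far enough to cross the threshold, so that
    the rule is meaningfully applicable (i.e. discriminates) within the locality.
    """
    margin = abs(value - threshold)
    lo, hi = value - 2 * margin, value + 2 * margin
    def covers(v):
        return lo &lt;= v &lt;= hi
    return covers

covers = locality_from_threshold(60000.0, 50000.0)   # input SALARY = 60000
print(covers(45000.0), covers(95000.0))              # True False</p>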
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Explaining using measures of nearness</title>
        <p>As well as being important for constraining inconsistency or creating patch models, measures of
nearness can be used as part of explanations themselves. This might be vague, something like, “X and Y
are similar in many ways, but differ in ways which are particularly important for the decision making”.
More convincing explanations could give the precise metrics being used to make the distinction, for
example, “while employees X and Y have similar experience and skills, their jobs differ in terms of risk”.</p>
        <p>Local measures of nearness could also be used in counterfactual generation. If we are looking for a
training example T to explain the difference between decisions and/or explanations for X and Y, then T
should be ‘between’ X and Y. The weighted locality metrics for X and Y are likely to be better measures
of ‘between’ than global feature distances.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Explanations of measures of nearness</title>
        <p>Of course, if metrics of nearness contribute to the coherence of explanations and decisions, they must
themselves be explainable to end users. For example, rather than simply saying “while X and Y differ
substantially in feature F, this is considered unimportant”, the explanation could instead be “while X
and Y differ substantially in feature F, this feature does not appear in the explanations for X and Y and is
therefore considered unimportant”.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusion</title>
      <p>This paper has outlined several strategies for achieving coherent explanations in AI systems, particularly
in response to temporal or contextual shifts. It has identified several promising directions for future work,
including the development of mechanisms for explaining changes in reasoning; the use of patch models
that retain and reuse prior explanations; and the exploration of nearness metrics, which determine
when explanations can be meaningfully applied to new inputs. More broadly, the diversity of model
architectures and explanation types suggests a rich design space for experimenting with coherence
strategies, from algorithmic constraints to user interface representations and feedback mechanisms.</p>
      <p>From a human-centred AI perspective, coherence in explanations is not merely a technical attribute
but a vital social and cognitive affordance that underpins trust and mutual understanding. Just as
people rely on consistent reasoning to interpret intentions and anticipate actions, AI systems — all of
which operate within socio-technical settings — should treat coherence as a primary design objective
alongside accuracy and fairness. For instance, if an AI system revises its reasoning, a transparent shift in
rationale (e.g., “Given the user’s recent preferences, I now recommend slower but more scenic routes”)
can maintain user confidence even amidst change. Thus, explanation strategies supporting temporal
narrative continuity — where decisions are justified in the moment and situated within a comprehensible
arc of system behaviour — are key to fostering durable human-AI collaboration.</p>
      <p>While much of this paper focuses on model-specific XAI techniques, large language models (LLMs)
increasingly act as explanation interfaces — either as direct decision-makers or as natural language
layers over other AI systems. In these contexts, their internal consistency over a series of decisions
becomes a crucial dimension of explanation quality.</p>
      <p>
        Recent work has shown that LLMs often exhibit sycophantic behaviour —
a tendency to align their outputs with user biases, personas, or perceived preferences — at the cost of
logical consistency and rational argumentation. This phenomenon introduces both intra-response and
inter-response inconsistencies, undermining the expectation of coherent explanatory reasoning over
time. For example, models have been shown to shift or abandon previously correct reasoning chains
when faced with user disagreement or subtle framing changes [
        <xref ref-type="bibr" rid="ref22 ref23 ref24 ref25 ref26">22, 23, 24, 25, 26</xref>
        ]. Benchmarks such as
SycEval [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] and BeHonest [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] quantify how sycophancy can persist across turns, leading to regressive
reasoning where models rationalize incorrect answers to maintain agreement. This behaviour poses a
direct challenge to explanation consistency, particularly in human-AI collaboration where users expect
stable, accountable rationales for system behaviour. Techniques such as bias-augmented training or
pinpoint tuning have shown promise in mitigating these effects [
        <xref ref-type="bibr" rid="ref22 ref26">22, 26</xref>
        ], but the deeper issue remains:
without mechanisms to preserve a coherent explanatory stance, models risk eroding user trust even
when individual outputs appear plausible. Addressing sycophancy is thus central to ensuring that
explanations remain reliable not just in the moment, but across the evolving arc of user interaction.
      </p>
      <p>
        In related work we have been looking at the way humans can explain their decisions/labels to AI in
order to improve ML and XAI [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. A key way in which these human explanations can be used is to
constrain the ML system to create decision systems that respect the human explanation as well as the
decision/label. This differs from the work in this paper in that the human explanations are effectively
additional input to the model, whereas the coherent use of XAI explanations is in some way a feedback
loop based on the current model. However, the algorithmic requirements for ensuring XAI consistency
turn out to be very similar to those for the use of human explanations as input.
      </p>
      <p>Coherence in AI explanations — whether provided through structured XAI methods or natural
language interfaces — remains an open challenge with significant implications for trust, usability,
and long-term human-AI hybridity. As AI systems become increasingly embedded in interactive
settings, the ability to provide stable, transparent, and revisitable justifications will be as important
as the correctness of individual decisions. This paper has presented potential strategies to implement
coherence within XAI, but it is not intended to ofer a final solution; our aim is to provide clarification of
the area and a partial roadmap for future research. We hope this initial exploration encourages further
work on algorithms, interaction strategies, and evaluation frameworks that treat coherence not as an
afterthought, but as a central goal of explainable AI.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <sec id="sec-6-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been supported by the HORIZON Europe projects TANGO - Grant Agreement n.
101120763 and SoBigData++ Grant Agreement n. 871042. Views and opinions expressed are however
those of the author(s) only and do not necessarily reflect those of the European Union or the European
Health and Digital Executive Agency (HaDEA). Neither the European Union nor the granting authority
can be held responsible for them.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dix</surname>
          </string-name>
          ,
          <article-title>Human issues in the use of pattern recognition techniques</article-title>
          , in: R.
          <string-name>
            <surname>Beale</surname>
          </string-name>
          , J. Finlay (Eds.),
          <article-title>Neural Networks and Pattern Recognition in Human Computer Interaction</article-title>
          , Ellis Horwood,
          <year>1992</year>
          , pp.
          <fpage>429</fpage>
          -
          <lpage>451</lpage>
          . URL: https://alandix.com/academic/papers/neuro92/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Deep inside convolutional networks: Visualising image classification models and saliency maps</article-title>
          ,
          <source>arXiv preprint arXiv:1312.6034</source>
          (
          <year>2013</year>
          ).
          (ICLR 2014, Workshop Poster).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>in: Proceedings of the 31st International Conference on Neural Information Processing Systems</source>
          , NIPS'17, Curran Associates Inc.,
          Red Hook, NY, USA,
          <year>2017</year>
          , p.
          <fpage>4768</fpage>
          -
          <lpage>4777</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          , “
          <article-title>why should I trust you?”: Explaining the predictions of any classifier</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16)</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2016</year>
          , p.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          . URL: https://doi.org/10.1145/2939672.2939778. doi:10.1145/2939672.2939778.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Setzu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <article-title>GLocalX - from local to global explanations of black box AI models</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>294</volume>
          (
          <year>2021</year>
          )
          <fpage>103457</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Doshi-Velez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Towards a rigorous science of interpretable machine learning</article-title>
          ,
          <source>arXiv preprint arXiv:1702.08608</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <article-title>A survey of methods for explaining black box models</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>51</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vermeulen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Y.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kankanhalli</surname>
          </string-name>
          ,
          <article-title>Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda</article-title>
          ,
          <source>in: Proceedings of the 2018 CHI conference on human factors in computing systems</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Angelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Soares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. I.</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Atkinson</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence: an analytical review</article-title>
          ,
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>11</volume>
          (
          <year>2021</year>
          )
          <article-title>e1424</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Are explanations helpful? a comparative study of the efects of explanations in ai-assisted decision-making</article-title>
          ,
          <source>in: Proceedings of the 26th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>318</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hassija</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chamola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mahapatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Scardapane</surname>
          </string-name>
          , I. Spinelli,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mahmud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <article-title>Interpreting black-box models: a review on explainable artificial intelligence</article-title>
          ,
          <source>Cognitive Computation 16</source>
          (
          <year>2024</year>
          )
          <fpage>45</fpage>
          -
          <lpage>74</lpage>
          . URL: https://doi.org/10.1007/s12559-023-10179-8.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Vainio-Pekka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.-O.</given-names>
            <surname>Agbese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jantunen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vakkuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikkonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rousi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abrahamsson</surname>
          </string-name>
          ,
          <article-title>The role of explainable AI in the research field of ai ethics</article-title>
          ,
          <source>ACM Transactions on Interactive Intelligent Systems</source>
          <volume>13</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          . URL: https://dl.acm.org/doi/10.1145/3599974.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chater</surname>
          </string-name>
          ,
          <article-title>Mutual understanding initial theory</article-title>
          ,
          <source>TANGO Deliverable D1.1</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chater</surname>
          </string-name>
          ,
          <article-title>Interactive explainability: Black boxes, mutual understanding and what it would really mean for AI systems to be as explainable as people</article-title>
          ,
          <year>2024</year>
          . URL: https://osf.io/preprints/psyarxiv/ha37x_v1. doi:10.31234/osf.io/ha37x.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Naretto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <article-title>Stable and actionable explanations of black-box models through factual and counterfactual rules</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>38</volume>
          (
          <year>2024</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2862</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gawantka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Just</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Savelyeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wappler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lässig</surname>
          </string-name>
          ,
          <article-title>A novel metric for evaluating the stability of XAI explanations</article-title>
          ,
          <source>Advances in Science, Technology and Engineering Systems Journal</source>
          <volume>9</volume>
          (
          <year>2024</year>
          )
          <fpage>133</fpage>
          -
          <lpage>142</lpage>
          . doi:10.25046/aj090113.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mazzoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Malizia</surname>
          </string-name>
          ,
          <article-title>A Frank System for Co-Evolutionary Hybrid Decision-Making</article-title>
          ,
          <source>in: International Symposium on Intelligent Data Analysis</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Teso</surname>
          </string-name>
          ,
          <article-title>Cognition-aware explanations for HML</article-title>
          ,
          <source>TANGO Deliverable D2.1</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanations and how to find them: literature review and benchmarking</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>38</volume>
          (
          <year>2024</year>
          )
          <fpage>2770</fpage>
          -
          <lpage>2824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dix</surname>
          </string-name>
          ,
          <article-title>Interactive querying-locating and discovering information</article-title>
          , in: Second Workshop on Information Retrieval and Human Computer Interaction, Glasgow, 11th September
          <year>1998</year>
          . URL: https://www.alandix.com/academic/papers/IQ98/.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>F.</given-names>
            <surname>Naretto</surname>
          </string-name>
          ,
          <article-title>Explainable AI methods and their interplay with privacy protection</article-title>
          ,
          <source>Ph.D. thesis, Scuola Normale Superiore</source>
          ,
          <year>2023</year>
          . URL: https://ricerca.sns.it/handle/11384/133984.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chua</surname>
          </string-name>
          , E. Rees,
          <string-name>
            <given-names>H.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          , E. Perez,
          <string-name>
            <given-names>M.</given-names>
            <surname>Turpin</surname>
          </string-name>
          ,
          <article-title>Bias-augmented consistency training reduces biased reasoning in chain-of-thought</article-title>
          ,
          <year>2024</year>
          . doi:10.48550/arXiv.2403.05518.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Takuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vege</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Akalin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>O'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>TRUTH DECAY: Quantifying multi-turn sycophancy in language models</article-title>
          ,
          <year>2025</year>
          . doi:10.48550/arXiv.2503.11656.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <article-title>Ask again, then fail: Large language models' vacillations in judgment</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2310.02174. arXiv:2310.02174.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Can ChatGPT defend its belief in truth? Evaluating LLM reasoning via debate</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.13160. arXiv:2305.13160.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <article-title>From yes-men to truth-tellers: Addressing sycophancy in large language models with pinpoint tuning</article-title>
          ,
          <year>2024</year>
          . doi:10.48550/arXiv.2409.01658.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fanous</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Daneshjou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Koyejo</surname>
          </string-name>
          ,
          <article-title>SycEval: Evaluating LLM sycophancy</article-title>
          ,
          <year>2025</year>
          . doi:10.48550/arXiv.2502.08177.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>BeHonest: Benchmarking honesty in large language models</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2406.13261. arXiv:2406.13261.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Turchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roach</surname>
          </string-name>
          ,
          <article-title>Talking Back - human input and explanations to interactive AI systems</article-title>
          , in:
          <source>Workshop on Adaptive eXplainable AI (AXAI), IUI 2025</source>
          , Cagliari, Italy, 24th March 2025,
          <year>2025</year>
          . URL: https://alandix.com/academic/papers/AXAI2025-talking-back/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>