On Monotonicity of Dispute Trees as Explanations for
Case-Based Reasoning with Abstract Argumentation
Guilherme Paulino-Passos1 , Francesca Toni1
1 Imperial College London, Department of Computing, London, United Kingdom


Abstract
Recent work on explainability raises the question of what different types of explanations actually mean. One idea is that explanations can reveal information about the behaviour of the model on a subset of the input space. When this way of interpreting explanations is thought of as an interactive process, inferences from explanations can be seen as a form of reasoning. In the case of case-based reasoning with abstract argumentation (AA-CBR), previous work has used arbitrated dispute trees as a methodology for explanation. These are dispute trees where nodes are seen as losing or winning depending on the outcome for the new case under consideration. In this work we show how arbitrated dispute trees can be readapted for different inputs, which allows a broader interpretation of them, capturing more of the input-output behaviour of the model. We show that this readaptation is correct by construction, so the resulting reasoning based on this reuse is monotonic and therefore necessarily a faithful explanation.

Keywords
Explainable AI, Interactive Explanations, Argumentation, Case-Based Reasoning




1. Introduction
Recent work on explainability raises the question of what different types of explanations actually
mean. In particular, one general idea is that explanations can reveal information about model
behaviour on at least some other inputs. That is, an explanation can, at least in principle,
suggest some input-output behaviour of the model. Previous work has modelled this as a form of reasoning, which can possibly be seen as non-monotonic when this use of the explanation for extrapolating model behaviour can be revised by further interactions [1].
   Dispute trees have been proposed as explanations for argumentation methods, in particular
for classification with 𝐴𝐴-𝐢𝐡𝑅, a model based on abstract argumentation (AA) for case-based
reasoning (CBR) [2, 3]. In the more recent version, arbitrated dispute trees (ADTs) are shown as
trees reflecting whether an argument is winning or losing, depending on the new case being
classified [3].
   Here we show how an ADT for classifying a new case can possibly be reused for other new
cases. This shows that an ADT can be seen as β€œcommitting” to outcomes for other possible
cases, instead of simply the case to be explained (and previous cases occurring in the ADT itself).

1st International Workshop on Argumentation for eXplainable AI (ArgXAI, co-located with COMMA ’22), September 12,
2022, Cardiff, UK
g.passos18@imperial.ac.uk (G. Paulino-Passos); f.toni@imperial.ac.uk (F. Toni)
https://clarg.doc.ic.ac.uk (G. Paulino-Passos); https://www.doc.ic.ac.uk/~ft/ (F. Toni)
ORCID: 0000-0003-3089-1660 (G. Paulino-Passos); 0000-0001-8194-1459 (F. Toni)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Indeed, this reuse yields another arbitrated dispute tree, and is therefore a faithful explanation, by construction.
   In this work we instantiate our previous model of interactive explanations with dispute trees in AA-CBR, via a method for readapting a dispute tree that explains a prediction. Such a method is not always applicable, but when it is, it results in a provably faithful explanation. When modelling this as reasoning, we show that it results in a form of monotonic inference.


2. Background
2.1. Abstract Argumentation
An abstract argumentation framework (AF) [4] is a pair (Args, ⇝), where Args is a set (of
arguments) and β‡βŠ†Args Γ—Args is a binary relation on Args. For 𝛼, 𝛽 ∈ Args, if 𝛼 ⇝ 𝛽, then
we say that 𝛼 attacks 𝛽 and that 𝛼 is an attacker of 𝛽. For a set of arguments 𝐸 βŠ† Args and
an argument α ∈ Args, E defends α if for all β ⇝ α there exists γ ∈ E such that γ ⇝ β. Then, the grounded extension of (Args, ⇝) can be constructed as G = ⋃_{i⩾0} G_i, where G_0 is the set of all unattacked arguments and, for all i ⩾ 0, G_{i+1} is the set of arguments that G_i defends. For
any (Args, ⇝), the grounded extension G always exists and is unique and, if (Args, ⇝) is
well-founded [4], extensions under other semantics (e.g. stable extensions) are equal to G. For
finite AFs, (Args, ⇝) is well-founded iff it is acyclic. Given (Args, ⇝), we will sometimes use
𝛼 ∈ (Args, ⇝) to stand for 𝛼 ∈ Args.
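To make the construction above concrete, the following is a minimal Python sketch (names are ours, not from the literature) of the grounded extension of a finite AF, given as a set of arguments and a set of attack pairs (attacker, attacked).

    def grounded(args, attacks):
        attackers = {a: {b for (b, c) in attacks if c == a} for a in args}
        G = {a for a in args if not attackers[a]}          # G0: the unattacked arguments
        while True:
            # arguments defended by G: each of their attackers is attacked by G
            defended = {a for a in args
                        if all(any((g, b) in attacks for g in G)
                               for b in attackers[a])}
            if defended == G:
                return G
            G = defended

    # a attacks b and b attacks c: the grounded extension is {a, c}
    assert grounded({"a", "b", "c"}, {("a", "b"), ("b", "c")}) == {"a", "c"}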

2.2. Abstract Argumentation for Case-Based Reasoning
We will present 𝐴𝐴-𝐢𝐡𝑅 following the presentation of Paulino-Passos and Toni [5], referring
to the model as 𝐴𝐴-𝐢𝐡𝑅βͺ° . 𝐴𝐴-𝐢𝐡𝑅βͺ° maps a dataset 𝐷 of examples labelled with an outcome
and an unlabelled example (with unknown outcome) into an AF. The dataset may be understood
as a casebase, the labelled examples as past cases and the unlabelled example as a new case: we
will use these terminologies interchangeably throughout. Cases have a characterisation, and
outcomes are chosen from two available ones, one of which is selected up-front as the default
outcome. Finally we assume that the set of characterisations of (past and new) cases is equipped
with a partial order βͺ― (whereby 𝛼 β‰Ί 𝛽 holds if 𝛼 βͺ― 𝛽 and 𝛼 ΜΈ= 𝛽 and is read β€œπ›Ό is less specific than
𝛽”) and with a relation ̸∼ (whereby 𝛼 ̸∼ 𝛽 is read as β€œπ›½ is irrelevant to 𝛼”). Previous literature
shows examples of partial orders for specific applications [6]. Formally:

Definition 1. Let 𝑋 be a set of characterisations, equipped with partial order β‰Ί and binary
relation ̸∼. Let π‘Œ = {π›Ώπ‘œ , π›ΏΒ―π‘œ } be the set of (all possible) outcomes, with π›Ώπ‘œ the default outcome.
Then, a casebase 𝐷 is a finite set such that 𝐷 βŠ† 𝑋 Γ—π‘Œ (thus a past case 𝛼 ∈ 𝐷 is of the form
(𝛼𝐢 , π›Όπ‘œ ) for 𝛼𝐢 ∈ 𝑋, π›Όπ‘œ ∈ π‘Œ ) and a new case is of the form (𝑁𝐢 , ?) for 𝑁𝐢 ∈ 𝑋. We also
discriminate a particular element 𝛿𝐢 ∈ 𝑋 and define the default argument (𝛿𝐢 , π›Ώπ‘œ ) ∈ 𝑋 Γ—π‘Œ .
   A casebase 𝐷 is coherent iff there are no two cases (𝛼𝐢 , π›Όπ‘œ ), (𝛽𝐢 , π›½π‘œ ) ∈ 𝐷 such that 𝛼𝐢 = 𝛽𝐢
but π›Όπ‘œ ΜΈ= π›½π‘œ , and it is incoherent otherwise.

   For simplicity of notation, we sometimes extend the definition of βͺ° and ≁ to 𝑋 Γ— π‘Œ , by
setting (𝛼𝑐 , π›Όπ‘œ ) βͺ° (𝛽𝑐 , π›½π‘œ ) iff 𝛼𝑐 βͺ° 𝛽𝑐 .
Definition 2. The AF mined from a dataset 𝐷 and a new case (𝑁𝐢 , ?) is (Args, ⇝), in which:

     β€’ Args = 𝐷 βˆͺ {(𝛿𝐢 , π›Ώπ‘œ )} βˆͺ {(𝑁𝐢 , ?)};
     β€’ for (𝛼𝐢 , π›Όπ‘œ ), (𝛽𝐢 , π›½π‘œ ) ∈ 𝐷 βˆͺ {(𝛿𝐢 , π›Ώπ‘œ )}, it holds that (𝛼𝐢 , π›Όπ‘œ ) ⇝ (𝛽𝐢 , π›½π‘œ ) iff

          1. π›Όπ‘œ ΜΈ= π›½π‘œ ,
          2. 𝛼𝐢 βͺ° 𝛽𝐢 , and
          3. βˆ„(𝛾𝐢 , π›Ύπ‘œ ) ∈ 𝐷 βˆͺ {(𝛿𝐢 , π›Ώπ‘œ )} with 𝛼𝐢 ≻ 𝛾𝐢 ≻ 𝛽𝐢 and π›Ύπ‘œ = π›Όπ‘œ ;

     β€’ for (𝛽𝐢 ,π›½π‘œ )∈𝐷 βˆͺ{(𝛿𝐢 , π›Ώπ‘œ )}, it holds that (𝑁𝐢 , ?)⇝(𝛽𝐢 ,π›½π‘œ ) iff (𝑁𝐢 , ?)̸∼(𝛽𝐢 ,π›½π‘œ ).

The AF mined from a dataset 𝐷 alone is (Args β€² , ⇝′ ), with Args β€² = Args βˆ– {(𝑁𝐢 , ?)} and
⇝′ =⇝ ∩(Args β€² Γ— Args β€² ).

Definition 3. Let G be the grounded extension of the AF mined from 𝐷 and (𝑁𝐢 , ?), with default
argument (𝛿𝐢 , π›Ώπ‘œ ). The outcome for 𝑁𝐢 is π›Ώπ‘œ if (𝛿𝐢 , π›Ώπ‘œ ) is in G, and π›ΏΒ―π‘œ otherwise.

Definition 4. The AF mined from 𝐷 alone and the AF mined from 𝐷 and (𝑁𝐢 , ?), with default
argument (𝛿𝐢 , π›Ώπ‘œ ), are regular when the following holds:
   1. the irrelevance relation ̸∼ is defined as: π‘₯1 ̸∼ π‘₯2 iff π‘₯1 ΜΈβͺ° π‘₯2 , and
   2. 𝛿𝐢 is the least element of 𝑋.1

   In this paper we will restrict our attention to regular mined AFs. We will refer to the (regular)
AF mined from 𝐷 and (𝑁𝐢 , ?), with default argument (𝛿𝐢 , π›Ώπ‘œ ), as 𝐴𝐹βͺ° (𝐷, 𝑁𝐢 ), and to the
(regular) AF mined from 𝐷 alone as 𝐴𝐹βͺ° (𝐷).
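As an illustration of Definitions 2–4, the following is a naive Python sketch (names are ours) that mines the regular AF from a casebase, a default argument and a new case, and reads off the outcome of Definition 3 from the grounded extension; the function grounded from the sketch in Section 2.1 can be passed in, and outcomes are assumed to be 0 and 1.

    def mine_af(D, default, new_c, leq):
        # leq(a, b) decides a ⪯ b on characterisations
        past = list(D) + [default]
        attacks = set()
        for (ac, ao) in past:                              # Definition 2, past cases
            for (bc, bo) in past:
                if ao != bo and leq(bc, ac) and not any(
                        go == ao and leq(gc, ac) and gc != ac
                        and leq(bc, gc) and gc != bc
                        for (gc, go) in past):
                    attacks.add(((ac, ao), (bc, bo)))
        for (bc, bo) in past:                              # regular irrelevance (Definition 4)
            if not leq(bc, new_c):
                attacks.add(((new_c, "?"), (bc, bo)))
        return set(past) | {(new_c, "?")}, attacks

    def aa_cbr_outcome(D, default, new_c, leq, grounded):
        args, attacks = mine_af(D, default, new_c, leq)
        delta_c, delta_o = default
        return delta_o if default in grounded(args, attacks) else 1 - delta_o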

Example 1. For concreteness, let us consider a simple example of credit applications. Applicants are described by two features: age and income. The system developers decided, from background knowledge and previous data, that the default case of an applicant is {age:30; income:25} (age in years, income in £1000 per year) and that by default the application is denied (y = 0). The casebase is formed of the following cases:
    D = {({age:20; income:100}, 1), ({age:40; income:30}, 1), ({age:32; income:40}, 1), ({age:60; income:42}, 0), ({age:86; income:150}, 1)}
    More formally, we could say inputs are elements of X = ℝ², and outputs of Y = {0, 1}. We interpret the partial order in the following way: {age:a; income:b} ⪯ {age:a2; income:b2} iff i) 30 ≤ a ≤ a2 or a2 ≤ a ≤ 30; and ii) 25 ≤ b ≤ b2 or b2 ≤ b ≤ 25. Essentially, a case α is smaller than β when, for both dimensions (features), the value of the dimension for α lies between the value for β and the default value. Notice this means that the notion of atypicality represented by the partial order ⪯ is not simply being too big or being too small, but deviating from the default in either direction. The corresponding framework is presented in Figure 1.
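A small executable rendering of this partial order may help (our encoding: a characterisation is an (age, income) pair, income in £1000 per year, default at (30, 25)); the function leq below can also be plugged into the mining sketch of Section 2.2.

    DEFAULT_C = (30, 25)

    def leq(c1, c2):
        # c1 ⪯ c2: each feature of c1 lies between the default value and c2's
        return all(min(DEFAULT_C[i], c2[i]) <= c1[i] <= max(DEFAULT_C[i], c2[i])
                   for i in range(2))

    assert leq(DEFAULT_C, (40, 30))           # the default is the least element
    assert leq((40, 30), (60, 42))            # (60, 42) is more atypical than (40, 30)
    assert not leq((20, 100), (40, 30)) and not leq((40, 30), (20, 100))   # incomparable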



    1
     Indeed this is not a strong condition, since it can be proved that if π‘₯𝛼 ΜΈβͺ° 𝛿𝐢 then all cases (π‘₯𝛼 , 𝑦𝛼 ) in the
casebase could be removed, as they would never change an outcome. On the other hand, assuming also the first
condition in Definition 4, if (π‘₯𝛼 , ?) is the new case and π‘₯𝛼 ΜΈβͺ° 𝛿𝐢 , then the outcome is π›ΏΒ―π‘œ necessarily.
[Figure: arguments ((30, 25), 0) (default), ((20, 100), 1), ((40, 30), 1), ((32, 40), 1), ((60, 42), 0) and ((86, 150), 1); each of ((20, 100), 1), ((40, 30), 1) and ((32, 40), 1) attacks the default argument ((30, 25), 0); ((60, 42), 0) attacks ((40, 30), 1) and ((32, 40), 1); ((86, 150), 1) attacks ((60, 42), 0).]
Figure 1: Case base 𝐷 from Example 1 structured as an AA framework (i.e. without a focus case).


3. The interactive process
A more in-depth discussion of interactive explanations as reasoning is in previous literature [1].
Here, we make a more concise presentation, so the key concepts can be applied later.

3.1. Intuition
The motivation for interactive explanations comes from scenarios where a user evaluates the behaviour of a model via multiple, sequential queries. They may ask for the output for a specific input, as well as for an explanation. The explanation may then motivate a new query, resulting in a new output and explanation. One may thus ask what we should a priori expect of such a process, or of such a sequence of explanations. We thus consider an interactive process (as overviewed in Fig. 2), where the user queries for an output and an explanation thereof, given an input, the system returns them, and then the user may query again. We see the AI system as including a classifier and an explainer, both considered black boxes by our conceptual model.
   We assume that the explanation method (although not the classifier) can keep track of the
history of inputs it received and their outputs. This allows supporting two scenarios: i) of an
explainer that tries to improve its explanations by knowing what it explained before; and ii) of
a malicious explainer that, trying to manipulate or mislead the user and to avoid being detected,
keeps track of what was explained before.
   Histories give a snapshot of the process as finite sequences of inputs and their outputs. Note that we assume the explainer to be a function, thus ignoring any randomness of the explanation method itself. This assumption implies that no information about the computed explanations needs to be stored in the history, as awareness of earlier inputs and outputs already gives all the information needed to infer which explanations were previously returned.

Example 2. Let us illustrate the framework with a model of counterfactual explanations. We
will continue Example 1 and assume the classifier is a regular 𝐴𝐴-𝐢𝐡𝑅βͺ° model mined from 𝐷.
For illustrative reasons, we will assume this classifier is treated as a black-box model. As for the
counterfactual explanations, we will not assume a particular algorithm, but we will assume it
[Figure: the user sends an input x to the system; the classifier computes y and passes (x, y) to the explainer, which also holds the history ((x0, y0), . . . , (xn, yn)); the system returns (x, y) and an explanation E to the user.]
Figure 2: Overview of the interactive explanation process. 1. The user queries the system with an input
π‘₯, which goes to the classifier. 2. The classifier produces output 𝑦, and sends the pair (π‘₯, 𝑦) to the
explainer. 3. The explainer produces an explanation 𝐸 for (π‘₯, 𝑦), using information from the history of
inputs/outputs ((π‘₯0 , 𝑦0 ), . . . , (π‘₯𝑛 , 𝑦𝑛 )), and sends (π‘₯, 𝑦) and 𝐸 back to the user, who may then stop or
further query the system.


finds a counterfactual by heuristic search [7, 8]. Besides, we will interpret it with a minimality assumption: a counterfactual is taken to be minimally different from the input, corresponding to the idea that it is as close as possible [7].
   This way, for a single input-output pair (x, y), the method returns a new input-output pair (x_cf, y_cf). The intended interpretation is as follows: let (Δ_a, Δ_b) = x_cf − x, and let (Δ′_a, Δ′_b) be such that Δ′_a lies between 0 and Δ_a and Δ′_b lies between 0 and Δ_b. For any such (Δ′_a, Δ′_b), x′ = x + (Δ′_a, Δ′_b) is intuitively a smaller change to x than x_cf, and is thus expected to be classified as y, and not as y_cf.
   For example, for a case x0 = {age:50; income:50} the output is y0 = 1. If the counterfactual is e0 = ({age:75; income:43}, 0), then for input x1 = {age:65; income:45} the expected output would still be 1.
   Now suppose a user interacts with the system querying for π‘₯0 , receives 𝑦0 as output and 𝑒0 as
an explanation. What would be expected if the user queries for π‘₯1 , but instead receives 𝑦1 = 0 and
𝑒1 = ({age:59; income:43}, 1)? Is this an inconsistency of the method? Could it be solved?


3.2. A formal model of the interactive process
We assume an input set 𝑋 and an output set π‘Œ as well as a set of possible explanations β„°, that we
keep abstract. Regarding notation, for a set S, we denote the set of finite sequences of elements of S as Seq(S), i.e., Seq(S) = ⋃_{i∈ℕ} S^i, where S^i is the set of sequences of i elements of S. Given n ∈ ℕ, we use the notation [n] = {m ∈ ℕ | m ≤ n}. Thus a sequence (s_0, s_1, . . . , s_n) ∈ Seq(S) can be written as (s_i)_{i∈[n]}.
   We consider that the system is composed of a classifier C : 𝑋 β†’ π‘Œ and an explanation method
E : π‘†π‘’π‘ž(𝑋 Γ— π‘Œ ) β†’ π‘†π‘’π‘ž(β„°), mapping from a sequence of input-output pairs (π‘₯𝑖 , 𝑦𝑖 )π‘–βˆˆ[𝑛] to a
sequence of explanations (𝐸𝑖 )π‘–βˆˆ[𝑛] , of the same length as the sequence of pairs. Notice that the
explainer uses information on the entire past of inputs-outputs, as we discussed in Section 3.1.
We can think of this sequence of pairs as a history. We consider that at each time step 𝑑 ∈ N, the
user queries for an input π‘₯𝑑 ∈ 𝑋, which receives a classification C(π‘₯𝑑 ) = 𝑦𝑑 and an explanation
𝐸𝑑 . In this way, the explainer provides an explanation motivated by a specific input-output,
while considering the history ((π‘₯𝑖 , 𝑦𝑖 ))π‘–βˆˆ[π‘‘βˆ’1] .
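A minimal typing sketch of this process (names are ours, not from the paper) is the following: the classifier is a plain function from inputs to outputs, the explainer maps the whole history of input-output pairs to a same-length sequence of explanations, and the loop below plays the role of the system of Fig. 2.

    def interact(classifier, explainer, queries):
        # for each queried input x_t: classify it, extend the history, and return
        # the explanation the explainer currently associates with (x_t, y_t)
        history = []
        for x in queries:
            y = classifier(x)
            history.append((x, y))
            yield x, y, explainer(list(history))[-1]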
[Figure: E maps Seq(X × Y) to Seq(ℰ); ⊢ relates Seq(X × Y) to X × Y; |= relates Seq(ℰ) (and single explanations in ℰ) to X × Y; ⊢_e relates Seq(ℰ) to ℰ.]
Figure 3: Diagram illustrating the relations between sequences of input-output pairs, sequences of
explanations, single pairs and single explanations. This diagram does not always commute (indicated by
dashed lines) since for a sequence of explanations ⊒-entailing an input-output pair, there may not exist
a single explanation βŠ’π‘’ -entailed by this sequence which |=-entails this pair.


   An important particular case is when there is a function Eβˆ™ : 𝑋 Γ— π‘Œ β†’ β„°, mapping from a
single example (π‘₯, 𝑦) to an explanation 𝐸. In this case, the explainer function E can be defined
as applying Eβˆ™ to each element of the sequence: formally, E(((π‘₯𝑖 , 𝑦𝑖 ))π‘–βˆˆ[𝑛] ) = (Eβˆ™ (π‘₯𝑖 , 𝑦𝑖 ))π‘–βˆˆ[𝑛] .
In this particular case the history is disregarded when explanations are computed.
   This view of the interactive explanation process does not enforce that previously exhibited explanations are kept, that is, that E_i is unchanged for all i ∈ [t] when x_{t+1} is queried. The property that past explanations are unretractable, in the sense that the system cannot replace the explanation of any previous query, is captured as follows:

Definition 5 (Interaction-stability). An explainer E is said to be interaction-stable whenever,
for every sequence of input-output pairs (π‘₯𝑖 , 𝑦𝑖 )π‘–βˆˆ[𝑛] and for every π‘š < 𝑛, if (𝐸𝑖 )π‘–βˆˆ[𝑛] =
E((π‘₯𝑖 , 𝑦𝑖 )π‘–βˆˆ[𝑛] ) and (𝐸𝑖′ )π‘–βˆˆ[π‘š] = E((π‘₯𝑖 , 𝑦𝑖 )π‘–βˆˆ[π‘š] ) then 𝐸𝑖 = 𝐸𝑖′ for any 𝑖 ∈ [π‘š].

   That is, an interaction-stable explainer will always keep the explanation 𝐸𝑖 associated to the
pair (π‘₯𝑖 , 𝑦𝑖 ), even as the interaction moves on. It is straightforward to see that an explainer E
derived from a function Eβˆ™ is always interaction-stable.
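This particular case can be written directly as a Python sketch (the name lift is ours): the i-th explanation depends only on the i-th pair, so replaying any prefix of the history returns the same prefix of explanations, which is exactly interaction-stability.

    def lift(explain_one):
        # explain_one plays the role of E∙: it maps a single (x, y) pair to an explanation
        def explainer(history):
            return [explain_one(x, y) for (x, y) in history]
        return explainer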
   With this setup, inference can be defined. We assume that a sequence of explanations (𝐸𝑖 )π‘–βˆˆ[𝑛]
β€œcommits” to some model behaviour. We model this by an entailment relation |= between π‘†π‘’π‘ž(β„°)
and 𝑋 Γ—π‘Œ , in such a way that (𝐸𝑖 )π‘–βˆˆ[𝑛] |= (π‘₯, 𝑦) means that (𝐸𝑖 )π‘–βˆˆ[𝑛] β€œcommits” to the outcome
𝑦, given the input π‘₯. We will abuse notation and define 𝐸 |= (π‘₯, 𝑦) to mean (𝐸) |= (π‘₯, 𝑦) (for
(E) the sequence with just one explanation, E). We keep this entailment relation abstract and application-dependent. What is important is that it captures how a user would interpret the explanation, or which plausible inferences they would draw from it, including about the input-output pair being explained. One example is explanations as sufficient reasons: a sufficient reason is exactly a rule that guarantees the output for a part of the input space, including the given input. An
important particular case of entailment is when it does not depend on the order of the elements
of the sequence. In this case, a set-based representation would be enough, and it is in this sense
that sequences generalise sets.
   From this core notion of |=, relating explanations to input-output, we can derive two β€œho-
mogeneous” notions of β€œentailment”, that is, from sequences of elements of a set to elements
of the same set. This makes that notion more analogous to the notion of entailment in logic,
which is defined from sets of formulas to a single formula. One such notion is at input-output
level, and the other at explanation level. For the former, we say (π‘₯𝑖 , 𝑦𝑖 )π‘–βˆˆ[𝑛] ⊒ (π‘₯, 𝑦) iff
E((π‘₯𝑖 , 𝑦𝑖 )π‘–βˆˆ[𝑛] ) |= (π‘₯, 𝑦). For the latter, (𝐸𝑖 )π‘–βˆˆ[𝑛] βŠ’π‘’ 𝐸 iff βˆ€(π‘₯, 𝑦) ∈ 𝑋 Γ— π‘Œ , if 𝐸 |= (π‘₯, 𝑦) then
(𝐸𝑖 )π‘–βˆˆ[𝑛] |= (π‘₯, 𝑦) (summarised in Fig. 3).
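The two derived relations can also be phrased operationally; in the sketch below (names are ours), models stands for the base entailment |= between a sequence of explanations and a pair, E is the explainer, and candidates stands in for X × Y, which here has to be a finite set of pairs to iterate over.

    def derives(E, models, history, pair):
        # (x_i, y_i)_{i in [n]} ⊢ (x, y)  iff  E((x_i, y_i)_{i in [n]}) |= (x, y)
        return models(E(history), pair)

    def derives_e(models, explanations, e, candidates):
        # (E_i)_{i in [n]} ⊢_e E iff every pair |=-entailed by E alone is also
        # |=-entailed by the whole sequence (E_i)_{i in [n]}
        return all(models(explanations, p) for p in candidates if models([e], p))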

3.3. Consistency and non-monotonicity
We shall now define properties of explainers, in particular of consistency and non-monotonicity
for the relations associated with explainers.

Definition 6 (Consistency). A sequence of explanations (𝐸𝑖 )π‘–βˆˆ[𝑛] is said to be consistent iff there
does not exist π‘₯ ∈ 𝑋, 𝑦, 𝑦 β€² ∈ π‘Œ , with 𝑦 ΜΈ= 𝑦 β€² , such that (𝐸𝑖 )π‘–βˆˆ[𝑛] |= (π‘₯, 𝑦) and (𝐸𝑖 )π‘–βˆˆ[𝑛] |= (π‘₯, 𝑦 β€² ).
An entailment relation |= is said to be consistent iff every sequence of explanations is consistent.
A relation ⊢ is said to be consistent iff there does not exist x ∈ X, y, y′ ∈ Y, with y ≠ y′, and ((x_i, y_i))_{i∈[n]} such that ((x_i, y_i))_{i∈[n]} ⊢ (x, y) and ((x_i, y_i))_{i∈[n]} ⊢ (x, y′).

   Since the relations ⊢ and ⊢_e, derived from the base notion of |=, are "homogeneous", we can define properties borrowed from the literature on non-monotonic reasoning [9, 10], which would not be possible for the relation |=. We only generalise them to sequences, instead of sets (as is typical). Formally, some properties are:

Definition 7 (Non-monotonicity). The relation |= is said to be non-monotonic iff there is
(𝐸𝑖 )π‘–βˆˆ[𝑛] , 𝐸𝑛+1 and (π‘₯, 𝑦) such that (𝐸𝑖 )π‘–βˆˆ[𝑛] |= (π‘₯, 𝑦) and (𝐸𝑖 )π‘–βˆˆ[𝑛+1] ΜΈ|= (π‘₯, 𝑦).
   Also, given a set 𝑆, a relation βŠ’β€² from π‘†π‘’π‘ž(𝑆) to 𝑆, and 𝑠, 𝑠𝑖 ∈ 𝑆, for 𝑖 ∈ N, the relation βŠ’β€² is
said to satisfy non-monotonicity iff there is (𝑠𝑖 )π‘–βˆˆ[𝑛] , 𝑠𝑛+1 , 𝑠 s.t. (𝑠𝑖 )π‘–βˆˆ[𝑛] βŠ’β€² 𝑠 and (𝑠𝑖 )π‘–βˆˆ[𝑛+1] ΜΈβŠ’β€² 𝑠.

  Which concrete methods satisfy which of those properties and what is the impact on user
experience are open questions.

Example 2 (continued). An instantiation for counterfactuals as presented in this example is that an explanation is a tuple of the original input, its output, the counterfactual and the counterfactual output, (x, y, x_cf, y_cf). Thus the explainer E is interaction-stable, based on an E∙ that, given a single input-output pair, returns this tuple. The entailment relation is then defined as: for a single explanation E = (x, y, x_cf, y_cf), E |= (x′, y′) iff:
    1. x′ = x_cf and y′ = y_cf;
    2. x′ = x and y′ = y; or
    3. with (Δ_a, Δ_b) = x − x_cf and (Δ′_a, Δ′_b) = x − x′, Δ′_a lies between 0 and Δ_a, Δ′_b lies between 0 and Δ_b, and y′ = y.
   For a sequence of explanations, a naΓ―ve aggregation can be used: (𝐸𝑖 )π‘–βˆˆ[𝑛] |= (π‘₯β€² , 𝑦 β€² ) iff there is
𝑖 such that 𝐸𝑖 |= (π‘₯β€² , 𝑦 β€² ). Using this definition, the behaviour in Example 2 is inconsistent.
   An alternative method could aggregate by restricting with a specificity criterion: say E covers x′ iff there is y′ such that E |= (x′, y′). We say that E1 is more specific than E2 if the set of inputs covered by E1 is a subset of the set of inputs covered by E2. Then redefine for sequences in the following way: (E_i)_{i∈[n]} |= (x′, y′) iff there is i such that E_i |= (x′, y′) and there is no E_j more specific than E_i such that E_j |= (x′, y′′), where y′ ≠ y′′. By this definition, even though e0 |= (x1, 1), we have (e0, e1) |= (x1, 0) and (e0, e1) ⊭ (x1, 1). This would make such a relation consistent and non-monotonic.
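To make the two aggregations concrete, here is a minimal executable sketch (helper names are ours), assuming two numerical features and outcomes in {0, 1}. With the single-explanation entailment below, the inputs covered by an explanation form exactly the axis-aligned box spanned by x and x_cf, so the specificity check reduces to box containment; the assertions reproduce the inconsistency of the naïve aggregation and the non-monotonicity of the specificity-based one on e0 and e1 above.

    def entails_single(e, query):
        (x, y, x_cf, y_cf) = e
        (xq, yq) = query
        if xq == x_cf:
            return yq == y_cf
        if xq == x:
            return yq == y
        inside = all(min(x[i], x_cf[i]) <= xq[i] <= max(x[i], x_cf[i])
                     for i in range(len(x)))
        return inside and yq == y

    def entails_naive(es, query):                  # some E_i entails the pair
        return any(entails_single(e, query) for e in es)

    def box(e):
        (x, _, x_cf, _) = e
        return [(min(x[i], x_cf[i]), max(x[i], x_cf[i])) for i in range(len(x))]

    def more_specific(e1, e2):                     # coverage of e1 inside coverage of e2
        return all(lo2 <= lo1 and hi1 <= hi2
                   for (lo1, hi1), (lo2, hi2) in zip(box(e1), box(e2)))

    def entails_specific(es, query):
        # some E_i entails the pair, and no more specific E_j commits the same
        # input to the other outcome
        (xq, yq) = query
        return any(entails_single(e, query) and not any(
                       more_specific(f, e) and entails_single(f, (xq, 1 - yq))
                       for f in es)
                   for e in es)

    e0 = ((50, 50), 1, (75, 43), 0)                # explanation for x0 in Example 2
    e1 = ((65, 45), 0, (59, 43), 1)                # explanation for x1
    x1 = (65, 45)
    assert entails_naive([e0], (x1, 1)) and entails_naive([e0, e1], (x1, 0))
    assert entails_specific([e0], (x1, 1)) and not entails_specific([e0, e1], (x1, 1))
    assert entails_specific([e0, e1], (x1, 0))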
4. Arbitrated dispute trees
Dispute trees [11, 12] have been used as explanations for AA-CBR since Čyras et al. [2], and were further developed in the form of arbitrated dispute trees for an extended version of the method in Čyras et al. [3]. This latter form is the one of interest here, due to its symmetry regarding default and non-default outcomes. For compatibility with our presentation, we redefine the main concepts, but omit the results proven originally, since the adaptations are minimal.

Definition 8. Let 𝐴𝐹βͺ° (𝐷, π‘₯) = (Args, ⇝). An arbitrated dispute tree (ADT) is a tree π’Ÿπ’―
such that:
   1. every node of π’Ÿπ’― is of the form [N : 𝛼] for 𝑁 ∈ {π‘Š, 𝐿} and 𝛼 ∈ Args, any such node being
      called an 𝑁 -node labelled by argument 𝛼;
   2. the root of 𝒟𝒯 is labelled by the default argument (δ_C, δ_o), and is a W-node if AA-CBR⪰(D, x) = δ_o, and an L-node otherwise;
   3. for every W-node n labelled by α ∈ Args and for every attacker β of α in (Args, ⇝), there is a child m of n such that m is an L-node labelled by β;
   4. for every L-node n labelled by α ∈ Args, there is exactly one child of n, which is a W-node labelled by some attacker β of α; and
   5. there are no other nodes in π’Ÿπ’― .
We will also refer to π’Ÿπ’― as an arbitrated dispute tree for (the prediction for) π‘₯.
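Under an assumed data layout (a dict from node ids to (status, argument, children) triples, plus the mined AF as a set of arguments and a set of attack pairs), the structural part of Definition 8 can be checked with a short Python sketch; it covers conditions 1, 3 and 4, reading condition 5 strictly for W-nodes, while condition 2 (the status of the root) has to be checked against the AA-CBR⪰ prediction itself.

    def is_adt(tree, root, args, attacks):
        attackers = {a: {b for (b, c) in attacks if c == a} for a in args}
        for status, arg, children in tree.values():
            if status not in ("W", "L") or arg not in args:
                return False                       # condition 1
            child_nodes = [tree[c] for c in children]
            if status == "W":
                # conditions 3 and 5: the children are L-nodes labelled by
                # exactly the attackers of arg
                if any(s != "L" for (s, _, _) in child_nodes):
                    return False
                if {a for (_, a, _) in child_nodes} != attackers[arg]:
                    return False
            else:
                # condition 4: exactly one child, a W-node labelled by an attacker
                if len(child_nodes) != 1:
                    return False
                s, a, _ = child_nodes[0]
                if s != "W" or a not in attackers[arg]:
                    return False
        return root in tree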

   An example of a dispute tree is in Figure 4a. Dispute trees were originally defined as possibly infinite, while Čyras et al. [3] assume casebases are coherent, and one can thus prove that every arbitrated dispute tree in that context is finite². While we make no such assumption, we consider infinite dispute trees inappropriate for explanation, so we retain the original definition but explicitly define ADT explanations to restrict ourselves to the finite case.

Definition 9. An arbitrated dispute tree explanation is a finite arbitrated dispute tree.


5. Arbitrated dispute trees as interactive explanations
While arbitrated dispute trees have been used as explanations in previous literature [3], here we instantiate our framework for interactive explanations with ADTs. Our main question is: given an ADT, to what model behaviour does it "commit"? What should the |= relation be for them?
   ADTs are a way of presenting and explaining the behaviour of AA-CBR as a structure more succinct than the entire argumentation framework. Our idea here is simply that such a presentation can be generalised to other cases: intuitively, an ADT for a new case x may show a set of sufficient reasons justifying a decision for another new case x2. In a sense, x2 does not need more information than was already presented in that ADT in order to be decided, and therefore the rest of the AF is unnecessary. This can be verified simply from the structure of the dispute tree and by checking for relevance. Let us first define this formally.

   2
       This not being the case, some results from the original presentation are inapplicable, such as [3, Prop. 3.4].
[Figure: each panel shows a chain from the root: [L : ((30, 25), 0)] with child [W : ((40, 30), 1)], with child [L : ((60, 42), 0)], with child [W : ((86, 150), 1)]. Panels (a) and (b) show this chain (in (b) with the nodes irrelevant to x0 highlighted); in panel (c) the leaf is replaced by [W : ((50, 50), ?)].]
Figure 4: Example of ADT explanation in Example 4. Figure 4a shows the ADT explanation for input
π‘₯2 = {age:90; income:200}. Figures 4b and 4c show the process of adapting it to the new input
π‘₯0 = {age:50; income:50}. In Figure 4b, in red are the nodes labelled by arguments which are
irrelevant to π‘₯0 . There is a single leaf, which is a W node. Therefore this ADT can be adapted. This is
done in Figure 4c, with the (single) attacker of the leaf added back, but now attacked by the new case.


5.1. Readaptation of ADTs
Let 𝐷 be a dataset, and (Args, ⇝) = 𝐴𝐹βͺ° (𝐷) be its corresponding argumentation framework.
Let π‘₯ be a new case, and π’Ÿπ’― be an ADT explanation for its prediction. Now let π‘₯2 be a second
new case. Depending on the partial order relation, on the ADT and on the new cases, it might
be possible to reuse the ADT π’Ÿπ’― , without looking at (Args, ⇝) directly, to decide the outcome
for π‘₯2. Indeed, in that case another ADT would be generated as well.
   The two core ideas are: 1) if a past case is relevant to the new case, then every case smaller than it is also relevant; 2) every attacker of a W-node is in the ADT (as an L-node).
Theorem 10. Let 𝒟𝒯′ be 𝒟𝒯 with all nodes labelled by x removed. For every leaf l of 𝒟𝒯′, let max_l be the node in the path from the root to l which is maximally relevant to x2 (that is, there is no node in the path greater than it which is also relevant to x2).
   If all π‘šπ‘Žπ‘₯𝑙 are W-nodes, then the predicted outcome for π‘₯2 is the same as the predicted outcome
for π‘₯. Besides, let π’Ÿπ’― π‘₯2 be the tree constructed by the following process: start with the subtree of
π’Ÿπ’― β€² containing only π‘šπ‘Žπ‘₯𝑙 and their ancestors, add all L-nodes which are children of π‘šπ‘Žπ‘₯𝑙 in
π’Ÿπ’― β€² , and, finally, for each L-node added in this way, add as a child a new W-node labelled by π‘₯2.
Then π’Ÿπ’― π‘₯2 is an ADT for the prediction on π‘₯2.
  When this is the case for new cases π‘₯ and π‘₯2, we will say π’Ÿπ’― can be readapted to π‘₯2. Before
proving the theorem, let us illustrate the process:
Example 3. Given our casebase D, suppose we would like the prediction for x2 = {age:90; income:200} together with an ADT explanation. The outcome is y2 = 1, with the ADT in Figure 4a as explanation. The idea is that, if one would then like a prediction for x0 = {age:50; income:50}, this ADT could be readapted in a straightforward way. The process is illustrated in the remainder of Figure 4.
Proof of Theorem 10. For every l, max_l exists: by regularity, the default case (and thus the root) is relevant to x2. Let us see that 𝒟𝒯_x2 is an ADT. Clearly condition 1 is satisfied. Let us check conditions 3 and 4 for each node. For every branch, every node up to max_l satisfies the conditions in Definition 8, since 𝒟𝒯 is an ADT. Now consider max_l. By assumption, every max_l is a W-node. Again, since 𝒟𝒯 is an ADT for x, every attacker of (the argument labelling) max_l is included in 𝒟𝒯 as an L-node child of it. Since those children are also in 𝒟𝒯_x2, the condition is satisfied for each max_l. Next, each of those attackers requires exactly one W-node as a child, attacking it. This is satisfied by the added W-nodes labelled by x2, which are attackers since the arguments labelling such L-nodes are irrelevant to x2 (otherwise max_l would not be maximally relevant). Finally, since x2 has no attackers in AF⪰(D, x2), it satisfies the conditions of a leaf. Condition 5 is clearly satisfied.
   The last requirement to check is whether condition 2 is satisfied. Indeed it is since, given that the other conditions are satisfied, the set of arguments labelling W-nodes is in the grounded extension of AF⪰(D, x2), which can be verified by induction [3, Prop. 3.3]. Therefore, if the root is a W-node, it is in the grounded extension and thus the prediction is δ_o. Otherwise, it is an L-node and has a child which is a W-node, that is, the default argument is attacked by the grounded extension (and thus is not in it, since the grounded extension is conflict-free), and so the prediction for x2 is δ̄_o.

5.2. The |= relation for dispute trees
Now we can instantiate our framework with dispute trees. We will define E∙ as a function that receives a new case x and its predicted outcome AA-CBR⪰(D, x) = y, and returns an ADT explanation 𝒟𝒯 for it. While the choice of the dispute tree could matter for practical applications, for our purposes the only concern is that some dispute tree is returned. Regarding existence, ADT explanations require the argumentation frameworks to be acyclic. This can be achieved by restricting ourselves to coherent casebases, or instead by applying cAA-CBR⪰, a variation of the original method which guarantees acyclic AFs [5]. For the rest of the section we will assume these conditions.
   We can then easily instantiate the remainder of the framework, extending to sequences as previously suggested, E(((x_i, y_i))_{i∈[n]}) = (E∙(x_i, y_i))_{i∈[n]}, and defining |= as: (𝒟𝒯_i)_{i∈[n]} |= (x, y) iff there is i such that 𝒟𝒯_i can be readapted to x (with y the outcome given by the readapted tree). Indeed, this readapted tree 𝒟𝒯_x is precisely an explanation such that (𝒟𝒯_i)_{i∈[n]} ⊢_e 𝒟𝒯_x.
   The intuition for this relation is a possible answer to the question: what does an ADT explanation say about other inputs, beyond the original one being explained? Surely it also reveals how the cases which label its nodes are decided, but would that be all? What we show with Theorem 10 is precisely that, for a new input case, if, when filtering the tree for only nodes labelled by relevant arguments, all leaves are W-nodes, then not only can we guarantee what the output is, but we can also obtain an explanation for it via a transformation. Also notice that all of this is done without running the prediction algorithm from scratch. How this could be used to accelerate inference in practice is outside the scope of this paper.
   Another interesting aspect to notice is that, if even a single maximally relevant node is an L-node, then nothing can be said. This can be seen from Figure 4. For a new case {age:19; income:110}, the only relevant case appearing in the ADT is the root (the default case), which is an L-node. However, the prediction for this new case is still 1, since the case ({age:20; income:100}, 1) is relevant to it and would attack the default case in the grounded extension. The slightly different new case {age:21; income:110} would have no such attacker.
  We can then state the following consequences:

Theorem 11. Let E(((x_i, y_i))_{i∈[n]}) = (E_i)_{i∈[n]}, and let (x, y) ∈ X × Y. Then:
   1. E is interaction-stable;
   2. (𝐸𝑖 )π‘–βˆˆ[𝑛] |= (π‘₯, 𝑦) iff there is an explanation 𝐸 such that (𝐸𝑖 )π‘–βˆˆ[𝑛] βŠ’π‘’ 𝐸 and 𝐸 |= (π‘₯, 𝑦);
   3. (faithfulness) if (𝐸𝑖 )π‘–βˆˆ[𝑛] |= (π‘₯, 𝑦), then 𝐴𝐴-𝐢𝐡𝑅βͺ° (𝐷, π‘₯) = 𝑦.
   4. |= is consistent;
   5. |= is monotonic.

   These results follow straightforwardly from |= being defined via the existence of a readaptable explanation in the sequence, and from the fact that all arguments labelling W-nodes are contained in the grounded extension [3, Prop. 3.3].
   ADT explanations are simple but provably faithful explanations for AA-CBR. This should not be surprising, since the inference relation is conservative: a single L-node among the maximally relevant nodes in the tree makes the entire ADT unusable.


6. Discussion and conclusion
A possible point of discussion is whether the entire set of ADTs for an input could not be returned instead. While this is a possibility, in a practical application all this information would likely still have to be adapted and filtered so as not to overwhelm the end user. This is especially the case since many different dispute trees could overlap, being partly redundant. Depending on context, this could be less informative than having the entire argumentation framework which, although not a tree, contains no repetition of cases. Thus returning a single ADT seems a more effective strategy. The condition is, as previously mentioned, that either the casebase is coherent, or that cAA-CBR⪰ is used [5], since then there is also a correspondence guaranteeing that, given an outcome, there is always a dispute tree explaining that decision [3, Prop. 3.4].
   We leave for future work an exploration of how ADTs would look in real applications of AA-CBR, including how often ADTs can be reused, and whether this can yield gains in inference time. Another aspect left for future work is user studies evaluating the effectiveness of ADTs as explanations. An interesting question is whether ADTs with maximally relevant L-nodes could be readapted in some way, even if fallibly. An idea would be to consider those prima facie explanations, resulting in a non-monotonic entailment relation from sequences of explanations.


Acknowledgments
GPP was supported by Capes (Brazil, Ph.D. Scholarship 88881.174481/2018-01). FT was partially
funded by the European Research Council (ERC) under the European Union’s Horizon 2020
research and innovation programme (grant agreement No. 101020934) and by J.P. Morgan
and by the Royal Academy of Engineering under the Research Chairs and Senior Research
Fellowships scheme. Any views or opinions expressed herein are solely those of the authors.


References
 [1] G. Paulino-Passos, F. Toni, On interactive explanations as non-monotonic reasoning, in:
     Workshop on Explainable Artificial Intelligence (XAI) at IJCAI 2022, 2022.
 [2] K. Čyras, K. Satoh, F. Toni, Explanation for case-based reasoning via abstract argumenta-
      tion, in: P. Baroni, T. F. Gordon, T. Scheffler, M. Stede (Eds.), Computational Models of
     Argument - Proceedings of COMMA 2016, Potsdam, Germany, 12-16 September, 2016, vol-
      ume 287 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2016, pp. 243–254.
      doi:10.3233/978-1-61499-686-6-243.
 [3] K. Čyras, D. Birch, Y. Guo, F. Toni, R. Dulay, S. Turvey, D. Greenberg, T. Hapuarachchi,
      Explanations by arbitrated argumentative dispute, Expert Syst. Appl. 127 (2019) 141–156.
      doi:10.1016/j.eswa.2019.03.012.
 [4] P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic
      reasoning, logic programming and n-person games, Artificial Intelligence 77 (1995) 321 –
      357.
 [5] G. Paulino-Passos, F. Toni, Monotonicity and noise-tolerance in case-based reasoning with
      abstract argumentation, in: M. Bienvenu, G. Lakemeyer, E. Erdem (Eds.), Proceedings of the
     18th International Conference on Principles of Knowledge Representation and Reasoning,
      KR 2021, Online event, November 3-12, 2021, 2021, pp. 508–518. doi:10.24963/kr.2021/
     48.
 [6] O. Cocarascu, A. Stylianou, K. Čyras, F. Toni, Data-empowered argumentation for dialec-
      tically explainable predictions, in: ECAI 2020 - 24th European Conference on Artificial
      Intelligence, Santiago de Compostela, Spain, 10-12 June 2020, 2020.
 [7] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the
      black box: Automated decisions and the gdpr, Harvard Journal of Law & Technology 31
     (2018).
 [8] M. Hildebrandt, C. Castillo, L. E. Celis, S. Ruggieri, L. Taylor, G. Zanfir-Fortuna (Eds.), FAT*
     ’20: Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, January
      27-30, 2020, ACM, 2020. doi:10.1145/3351095.
 [9] D. Makinson, General patterns in nonmonotonic reasoning, in: D. M. Gabbay, C. J. Hogger,
     J. A. Robinson (Eds.), Handbook of Logic in Artificial Intelligence and Logic Programming
     - Volume 3 - Nonmonotonic Reasoning and Uncertain Reasoning, Oxford University Press,
     1994, pp. 35–110.
[10] S. Kraus, D. Lehmann, M. Magidor, Nonmonotonic reasoning, preferential models and cu-
      mulative logics, Artif. Intell. 44 (1990) 167–207. doi:10.1016/0004-3702(90)90101-5.
[11] P. M. Dung, R. A. Kowalski, F. Toni, Dialectic proof procedures for assumption-based,
      admissible argumentation, Artif. Intell. 170 (2006) 114–159. doi:10.1016/j.artint.
      2005.07.002.
[12] P. M. Dung, P. Mancarella, F. Toni, Computing ideal sceptical argumentation, Artif. Intell.
     171 (2007) 642–674. doi:10.1016/j.artint.2007.05.003.