=Paper=
{{Paper
|id=Vol-3209/8465
|storemode=property
|title=On Monotonicity of Dispute Trees as Explanations for Case-Based Reasoning with Abstract Argumentation
|pdfUrl=https://ceur-ws.org/Vol-3209/8465.pdf
|volume=Vol-3209
|authors=Guilherme Paulino-Passos, Francesca Toni
|dblpUrl=https://dblp.org/rec/conf/comma/Paulino-PassosT22
}}
==On Monotonicity of Dispute Trees as Explanations for Case-Based Reasoning with Abstract Argumentation==
Guilherme Paulino-Passos, Francesca Toni
Imperial College London, Department of Computing, London, United Kingdom

Abstract

Recent work on explainability raises the question of what different types of explanations actually mean. One idea is that explanations can reveal information about the behaviour of the model on a subset of the input space. When this way of interpreting explanations is thought of as an interactive process, inferences from explanations can be seen as a form of reasoning. In the case of case-based reasoning with abstract argumentation (AA-CBR), previous work has used arbitrated dispute trees as a methodology for explanation. These are dispute trees where nodes are seen as losing or winning depending on the outcome for the new case under consideration. In this work we show how arbitrated dispute trees can be readapted for different inputs, which allows a broader interpretation of them, capturing more of the input-output behaviour of the model. We show that this readaptation is correct by construction, so the resulting reasoning based on this reuse is monotonic and thus necessarily a faithful explanation.

Keywords: Explainable AI, Interactive Explanations, Argumentation, Case-Based Reasoning

1. Introduction

Recent work on explainability raises the question of what different types of explanations actually mean. In particular, one general idea is that explanations can reveal information about model behaviour on at least some other inputs. That is, an explanation can, at least in principle, suggest some input-output behaviour of the model. Previous work has modelled this as a form of reasoning, which can be seen as non-monotonic when the use of explanations for extrapolating model behaviour can be revised by further interactions [1].

Dispute trees have been proposed as explanations for argumentation methods, in particular for classification with AA-CBR, a model based on abstract argumentation (AA) for case-based reasoning (CBR) [2, 3]. In the more recent version, arbitrated dispute trees (ADTs) are shown as trees reflecting whether an argument is winning or losing, depending on the new case being classified [3]. Here we show how an ADT for classifying a new case can possibly be reused for other new cases. This shows that an ADT can be seen as "committing" to outcomes for other possible cases, instead of simply the case to be explained (and the previous cases occurring in the ADT itself).

Indeed, this reuse yields another admissible dispute tree, and is therefore a faithful explanation, by construction. In this work we instantiate our previous model of interactive explanations with dispute trees in AA-CBR, via a method of readapting a dispute tree providing an explanation for a prediction.
This method is not always applicable, but when it is, it results in a provably faithful explanation. When modelling this as reasoning, we show that it results in a form of monotonic inference.

2. Background

2.1. Abstract Argumentation

An abstract argumentation framework (AF) [4] is a pair (Args, ⇝), where Args is a set (of arguments) and ⇝ ⊆ Args × Args is a binary relation on Args. For α, β ∈ Args, if α ⇝ β, then we say that α attacks β and that α is an attacker of β. For a set of arguments E ⊆ Args and an argument α ∈ Args, E defends α if for all β ⇝ α there exists γ ∈ E such that γ ⇝ β. Then, the grounded extension of (Args, ⇝) can be constructed as G = ⋃_{i⩾0} G_i, where G_0 is the set of all unattacked arguments and, ∀i ⩾ 0, G_{i+1} is the set of arguments that G_i defends. For any (Args, ⇝), the grounded extension G always exists and is unique and, if (Args, ⇝) is well-founded [4], extensions under other semantics (e.g. stable extensions) are equal to G. For finite AFs, (Args, ⇝) is well-founded iff it is acyclic. Given (Args, ⇝), we will sometimes use α ∈ (Args, ⇝) to stand for α ∈ Args.

2.2. Abstract Argumentation for Case-Based Reasoning

We present AA-CBR following the presentation of Paulino-Passos and Toni [5], referring to the model as AA-CBR⪰. AA-CBR⪰ maps a dataset D of examples labelled with an outcome and an unlabelled example (with unknown outcome) into an AF. The dataset may be understood as a casebase, the labelled examples as past cases and the unlabelled example as a new case: we will use these terminologies interchangeably throughout. Cases have a characterisation, and outcomes are chosen from two available ones, one of which is selected up-front as the default outcome. Finally, we assume that the set of characterisations of (past and new) cases is equipped with a partial order ⪯ (whereby α ≺ β holds if α ⪯ β and α ≠ β, and is read "α is less specific than β") and with a relation ≁ (whereby α ≁ β is read as "β is irrelevant to α"). Previous literature shows examples of partial orders for specific applications [6]. Formally:

Definition 1. Let X be a set of characterisations, equipped with partial order ≺ and binary relation ≁. Let Y = {δ_o, δ̄_o} be the set of (all possible) outcomes, with δ_o the default outcome. Then a casebase D is a finite set such that D ⊆ X × Y (thus a past case α ∈ D is of the form (α_C, α_o) for α_C ∈ X, α_o ∈ Y) and a new case is of the form (N_C, ?) for N_C ∈ X. We also distinguish a particular element δ_C ∈ X and define the default argument (δ_C, δ_o) ∈ X × Y. A casebase D is coherent iff there are no two cases (α_C, α_o), (β_C, β_o) ∈ D such that α_C = β_C but α_o ≠ β_o, and it is incoherent otherwise.

For simplicity of notation, we sometimes extend the definition of ⪰ and ≁ to X × Y, by setting (α_C, α_o) ⪰ (β_C, β_o) iff α_C ⪰ β_C.

Definition 2. The AF mined from a dataset D and a new case (N_C, ?) is (Args, ⇝), in which:
- Args = D ∪ {(δ_C, δ_o)} ∪ {(N_C, ?)};
- for (α_C, α_o), (β_C, β_o) ∈ D ∪ {(δ_C, δ_o)}, it holds that (α_C, α_o) ⇝ (β_C, β_o) iff
  1. α_o ≠ β_o,
  2. α_C ⪰ β_C, and
  3. ∄(γ_C, γ_o) ∈ D ∪ {(δ_C, δ_o)} with α_C ≻ γ_C ≻ β_C and γ_o = α_o;
- for (β_C, β_o) ∈ D ∪ {(δ_C, δ_o)}, it holds that (N_C, ?) ⇝ (β_C, β_o) iff (N_C, ?) ≁ (β_C, β_o).
The AF mined from a dataset D alone is (Args′, ⇝′), with Args′ = Args ∖ {(N_C, ?)} and ⇝′ = ⇝ ∩ (Args′ × Args′).

Definition 3. Let G be the grounded extension of the AF mined from D and (N_C, ?), with default argument (δ_C, δ_o). The outcome for N_C is δ_o if (δ_C, δ_o) is in G, and δ̄_o otherwise.

Definition 4. The AF mined from D alone and the AF mined from D and (N_C, ?), with default argument (δ_C, δ_o), are regular when the following holds:
1. the irrelevance relation ≁ is defined as: x_1 ≁ x_2 iff x_1 ⋡ x_2, and
2. δ_C is the least element of X.¹

In this paper we restrict our attention to regular mined AFs. We will refer to the (regular) AF mined from D and (N_C, ?), with default argument (δ_C, δ_o), as AF⪰(D, N_C), and to the (regular) AF mined from D alone as AF⪰(D).

Example 1. For concreteness, let us consider a simple example of credit applications. Applicants are described by two features: age and income. System developers decided, from background knowledge and previous data, that the default applicant is {age:30; income:25} (age in years, income in £1000 per year) and that by default the application is denied (y = 0). The casebase is formed of the following cases:

D = {({age:20; income:100}, 1), ({age:40; income:30}, 1), ({age:32; income:40}, 1), ({age:60; income:42}, 0), ({age:86; income:150}, 1)}

More formally, we could say that inputs are elements of X = ℝ², and outputs of Y = {0, 1}. We interpret the partial order in the following way: {age:a; income:i} ⪯ {age:a′; income:i′} iff i) 30 ≤ a ≤ a′ or a′ ≤ a ≤ 30; and ii) 25 ≤ i ≤ i′ or i′ ≤ i ≤ 25. Essentially, a case α is smaller than β when, for both dimensions (features), the value of the dimension for α is between the value for β and the default value. Notice this means that the notion of atypicality, represented by the partial order ⪯, is neither being too big nor being too small. The corresponding framework is presented in Figure 1.

¹ Indeed this is not a strong condition, since it can be proved that if x_α ⋡ δ_C then all cases (x_α, y_α) in the casebase could be removed, as they would never change an outcome. On the other hand, assuming also the first condition in Definition 4, if (x_α, ?) is the new case and x_α ⋡ δ_C, then the outcome is necessarily δ̄_o.

[Figure 1: Case base D from Example 1 structured as an AA framework (i.e. without a focus case), with arguments ((30, 25), 0), ((20, 100), 1), ((40, 30), 1), ((32, 40), 1), ((60, 42), 0) and ((86, 150), 1).]
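To make Definitions 2-4 and Example 1 concrete, the following is a minimal Python sketch of regular AA-CBR⪰ on this casebase. It is our illustration, not the authors' implementation; all names (leq, mine_af, grounded, predict) are ours, and the order ⪯ is hard-coded to the two-feature order of Example 1.

```python
DEFAULT = ((30, 25), 0)   # the default argument (δ_C, δ_o) of Example 1
CASEBASE = [((20, 100), 1), ((40, 30), 1), ((32, 40), 1),
            ((60, 42), 0), ((86, 150), 1)]

def leq(c1, c2):
    """c1 ⪯ c2: each feature of c1 lies between the default value and c2's value."""
    return all(d <= a <= b or b <= a <= d
               for a, b, d in zip(c1, c2, DEFAULT[0]))

def mine_af(casebase, new_char):
    """AF⪰(D, N_C) of Definition 2: the arguments and the attack relation."""
    past = casebase + [DEFAULT]
    attacks = set()
    for (ac, ao) in past:
        for (bc, bo) in past:
            # condition 3 of Definition 2: no case with the same outcome strictly in between
            blocked = any(co == ao and gc not in (ac, bc)
                          and leq(gc, ac) and leq(bc, gc)
                          for (gc, co) in past)
            if ao != bo and leq(bc, ac) and not blocked:
                attacks.add(((ac, ao), (bc, bo)))
    new = (new_char, None)  # the new case (N_C, ?)
    attacks |= {(new, (bc, bo)) for (bc, bo) in past if not leq(bc, new_char)}
    return past + [new], attacks

def grounded(args, attacks):
    """The grounded extension G = ⋃_{i⩾0} G_i, computed iteratively."""
    g, defeated = set(), set()
    while True:
        new_g = {a for a in args if a not in g
                 and all(att in defeated for (att, tgt) in attacks if tgt == a)}
        if not new_g:
            return g
        g |= new_g
        defeated |= {tgt for (att, tgt) in attacks if att in g}

def predict(casebase, new_char):
    """Definition 3: default outcome iff the default argument is in the grounded extension."""
    args, attacks = mine_af(casebase, new_char)
    return DEFAULT[1] if DEFAULT in grounded(args, attacks) else 1 - DEFAULT[1]

print(predict(CASEBASE, (50, 50)))   # 1, the outcome y0 of Example 2 below
print(predict(CASEBASE, (90, 200)))  # 1, the outcome y2 of Example 3 below
```

The grounded-extension loop mirrors the G = ⋃_{i⩾0} G_i construction above; since this casebase is coherent, the mined AF is acyclic and the grounded extension is also the unique stable extension.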
3. The interactive process

A more in-depth discussion of interactive explanations as reasoning can be found in previous literature [1]. Here we give a more concise presentation, so that the key concepts can be applied later.

3.1. Intuition

The motivation for interactive explanations is scenarios where a user evaluates the behaviour of a model via multiple, sequential queries. They may ask for the output for a specific input, and also ask for an explanation. The explanation may then motivate a new query, resulting in a new output and explanation. One may thus ask what we should a priori expect of such a process, or of such a sequence of explanations. We thus consider an interactive process (as overviewed in Figure 2), where the user queries for an output and an explanation thereof, given an input, the system returns them, and then the user may query again. We see the AI system as including a classifier and an explainer, both considered black boxes by our conceptual model.

We assume that the explanation method (although not the classifier) can keep track of the history of inputs it received and their outputs. This allows supporting two scenarios: i) an explainer that tries to improve its explanations by knowing what it explained before; and ii) a malicious explainer that, trying to manipulate or mislead the user and to avoid being detected, keeps track of what was explained before. Histories give a snapshot of the process as finite sequences of inputs and their outputs. Note that we assume the explainer to be a function, thus ignoring randomness of the explanation method itself. This assumption implies that no information about the computed explanations needs to be stored in the history, as awareness of earlier inputs and outputs already gives all the information needed to infer which explanations were previously returned.

[Figure 2: Overview of the interactive explanation process. 1. The user queries the system with an input x, which goes to the classifier. 2. The classifier produces output y, and sends the pair (x, y) to the explainer. 3. The explainer produces an explanation E for (x, y), using information from the history of inputs/outputs ((x_0, y_0), ..., (x_n, y_n)), and sends (x, y) and E back to the user, who may then stop or further query the system.]

Example 2. Let us illustrate the framework with a model of counterfactual explanations. We continue Example 1 and assume the classifier is a regular AA-CBR⪰ model mined from D. For illustrative reasons, we assume this classifier is treated as a black-box model. As for the counterfactual explanations, we do not assume a particular algorithm, but we assume it finds a counterfactual by heuristic search [7, 8]. Besides, we interpret it with a minimality assumption: a counterfactual is taken to be minimally different from the input, corresponding to the idea that it is as close as possible [7]. This way, for a single input-output pair (x, y), the method returns a new input-output pair (x_cf, y_cf). The intended interpretation is as follows: let (Δ_a, Δ_i) = x_cf − x, and let (Δ′_a, Δ′_i) be such that Δ′_a lies between 0 and Δ_a and Δ′_i lies between 0 and Δ_i. For any such (Δ′_a, Δ′_i), x′ = x + (Δ′_a, Δ′_i) is intuitively a smaller change to x than x_cf, and thus is expected to be classified as y, and not as y_cf. For example, for a case x_0 = {age:50; income:50} the output is y_0 = 1; if the counterfactual is c_0 = ({age:75; income:43}, 0), then for input x_1 = {age:65; income:45} the expected output would still be 1. Now suppose a user interacts with the system, querying for x_0, and receives y_0 as output and c_0 as an explanation. What would be expected if the user then queries for x_1, but instead receives y_1 = 0 and c_1 = ({age:59; income:43}, 1)? Is this an inconsistency of the method? Could it be solved?
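As an illustration (ours, not from the paper) of this intended reading of a counterfactual explanation, the short sketch below checks whether a new input lies componentwise between the explained input and its counterfactual, and returns the output the explanation commits to, if any. The function names are ours.

```python
def between(v, lo, hi):
    """True if v lies between lo and hi, in either order."""
    return min(lo, hi) <= v <= max(lo, hi)

def expected_outputs(x, y, x_cf, y_cf, x_new):
    """Outputs that the explanation (x, y, x_cf, y_cf) commits to for x_new, if any."""
    if x_new == x_cf:
        return {y_cf}
    if x_new == x or all(between(v, a, b) for v, a, b in zip(x_new, x, x_cf)):
        return {y}
    return set()  # the explanation says nothing about x_new

# Example 2: x0 = (50, 50) with output 1 and counterfactual ((75, 43), 0).
print(expected_outputs((50, 50), 1, (75, 43), 0, (65, 45)))  # {1}: x1 lies "between"
print(expected_outputs((50, 50), 1, (75, 43), 0, (80, 60)))  # set(): no commitment
```

For x_1 = (65, 45) the check returns {1}, matching the expectation above, while for inputs outside the "between" region the explanation commits to nothing.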
3.2. A formal model of the interactive process

We assume an input set X and an output set Y, as well as a set of possible explanations ℰ, all of which we keep abstract. Regarding notation, for a set S we denote the set of finite sequences of elements of S as seq(S), i.e., seq(S) = ⋃_{n∈ℕ} Sⁿ, for Sⁿ the set of sequences of n elements of S. Given n ∈ ℕ, we use the notation [n] = {i ∈ ℕ | i ≤ n}. Thus a sequence (s_0, s_1, ..., s_n) ∈ seq(S) can be written as (s_i)_{i∈[n]}.

We consider that the system is composed of a classifier C : X → Y and an explanation method E : seq(X × Y) → seq(ℰ), mapping a sequence of input-output pairs (x_i, y_i)_{i∈[n]} to a sequence of explanations (E_i)_{i∈[n]} of the same length. Notice that the explainer uses information on the entire past of inputs and outputs, as discussed in Section 3.1. We can think of this sequence of pairs as a history. We consider that at each time step t ∈ ℕ the user queries for an input x_t ∈ X, and receives a classification C(x_t) = y_t and an explanation E_t. In this way, the explainer provides an explanation motivated by a specific input-output pair, while considering the history ((x_i, y_i))_{i∈[t−1]}.

[Figure 3: Diagram illustrating the relations between sequences of input-output pairs, sequences of explanations, single pairs and single explanations, via ⊢, E, ⊢_e and |=. This diagram does not always commute, since for a sequence of explanations ⊢-entailing an input-output pair, there may not exist a single explanation ⊢_e-entailed by this sequence which |=-entails this pair.]

An important particular case is when there is a function E* : X × Y → ℰ, mapping a single example (x, y) to an explanation E. In this case, the explainer function E can be defined by applying E* to each element of the sequence: formally, E(((x_i, y_i))_{i∈[n]}) = (E*(x_i, y_i))_{i∈[n]}. In this particular case the history is disregarded when explanations are computed.

This view of the interactive explanation process does not enforce that previously exhibited explanations are kept, that is, that E_i is unchanged for all i ∈ [t] when x_{t+1} is queried. Past explanations being unretractable, in the sense that the system cannot replace the explanation of any previous query, is captured by the following property:

Definition 5 (Interaction-stability). An explainer E is said to be interaction-stable whenever, for every sequence of input-output pairs (x_i, y_i)_{i∈[n]} and for every m < n, if (E_i)_{i∈[n]} = E((x_i, y_i)_{i∈[n]}) and (E′_i)_{i∈[m]} = E((x_i, y_i)_{i∈[m]}), then E_j = E′_j for any j ∈ [m].

That is, an interaction-stable explainer will always keep the explanation E_i associated to the pair (x_i, y_i), even as the interaction moves on. It is straightforward to see that an explainer E derived from a function E* is always interaction-stable.
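A small sketch (ours) of an explainer obtained by lifting a per-pair function E* elementwise, together with a brute-force check of Definition 5 on the prefixes of one concrete history; the helper names are ours and the toy E* is a stand-in for any concrete explanation method.

```python
def lift(e_star):
    """E(((x_i, y_i))_i) = (E*(x_i, y_i))_i: only each individual pair is used."""
    return lambda history: tuple(e_star(x, y) for (x, y) in history)

def interaction_stable_on(explainer, history):
    """Definition 5, checked by brute force on every prefix of one concrete history."""
    full = explainer(history)
    return all(explainer(history[:m + 1]) == full[:m + 1]
               for m in range(len(history)))

# Toy E*: wrap the pair itself as its "explanation" (a stand-in for an ADT or a counterfactual).
e_star = lambda x, y: ("reason for", x, y)
history = [((50, 50), 1), ((65, 45), 0), ((90, 200), 1)]
print(interaction_stable_on(lift(e_star), history))  # True: an explainer derived from E* is stable
```

Because lift ignores everything but each individual pair, the prefix check always succeeds, which is exactly the observation made above.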
With this setup, inference can be defined. We assume that a sequence of explanations (E_i)_{i∈[n]} "commits" to some model behaviour. We model this by an entailment relation |= between seq(ℰ) and X × Y, in such a way that (E_i)_{i∈[n]} |= (x, y) means that (E_i)_{i∈[n]} "commits" to the outcome y, given the input x. We abuse notation and define E |= (x, y) to mean (E) |= (x, y) (for (E) the sequence with just one explanation, E). This entailment relation we keep abstract and application-dependent. What is important is that it captures how a user would interpret the explanation, or plausible inferences therefrom, including as regards the input-output pair being explained. One example is explanations as sufficient reasons: a sufficient reason is exactly a rule that guarantees the output for a part of the input space, including the given input. An important particular case of entailment is when it does not depend on the order of the elements of the sequence. In this case, a set-based representation would be enough, and it is in this sense that sequences generalise sets.

From this core notion of |=, relating explanations to input-output pairs, we can derive two "homogeneous" notions of "entailment", that is, from sequences of elements of a set to elements of the same set. This makes the notion more analogous to the notion of entailment in logic, which is defined from sets of formulas to a single formula. One such notion is at the input-output level, and the other at the explanation level. For the former, we say (x_i, y_i)_{i∈[n]} ⊢ (x, y) iff E((x_i, y_i)_{i∈[n]}) |= (x, y). For the latter, (E_i)_{i∈[n]} ⊢_e E iff ∀(x, y) ∈ X × Y, if E |= (x, y) then (E_i)_{i∈[n]} |= (x, y) (summarised in Figure 3).

3.3. Consistency and non-monotonicity

We shall now define properties of explainers, in particular consistency and non-monotonicity of the relations associated with explainers.

Definition 6 (Consistency). A sequence of explanations (E_i)_{i∈[n]} is said to be consistent iff there do not exist x ∈ X and y, y′ ∈ Y, with y ≠ y′, such that (E_i)_{i∈[n]} |= (x, y) and (E_i)_{i∈[n]} |= (x, y′). An entailment relation |= is said to be consistent iff every sequence of explanations is consistent. A relation ⊢ is said to be consistent iff there do not exist x ∈ X, y, y′ ∈ Y, with y ≠ y′, and ((x_i, y_i))_{i∈[n]} such that ((x_i, y_i))_{i∈[n]} ⊢ (x, y) and ((x_i, y_i))_{i∈[n]} ⊢ (x, y′).

Since the relations ⊢ and ⊢_e, derived from the base notion of |=, are "homogeneous", we can define properties borrowed from the literature on non-monotonic reasoning [9, 10], which would not be possible for the relation |=. We only generalise them to sequences, instead of sets (as is typical). Formally, some properties are:

Definition 7 (Non-monotonicity). The relation |= is said to be non-monotonic iff there are (E_i)_{i∈[n]}, E_{n+1} and (x, y) such that (E_i)_{i∈[n]} |= (x, y) but not (E_i)_{i∈[n+1]} |= (x, y). Also, given a set S, a relation ⊢′ from seq(S) to S, and s, s_i ∈ S for i ∈ ℕ, the relation ⊢′ is said to satisfy non-monotonicity iff there are (s_i)_{i∈[n]}, s_{n+1} and s such that (s_i)_{i∈[n]} ⊢′ s but not (s_i)_{i∈[n+1]} ⊢′ s.

Which concrete methods satisfy which of these properties, and what the impact is on user experience, are open questions.

Example 2 (continued). An instantiation for counterfactuals as presented in this example is that an explanation is a tuple of the original input, its output, the counterfactual and the counterfactual output, (x, y, x_cf, y_cf). Thus the explainer E is interaction-stable, being based on an E* that, given a single input-output pair, returns this tuple. The entailment relation is then defined as: for a single explanation E = (x, y, x_cf, y_cf), E |= (x′, y′) iff:
1. x′ = x_cf and y′ = y_cf;
2. x′ = x and y′ = y; or
3. with (Δ_a, Δ_i) = x_cf − x and (Δ′_a, Δ′_i) = x′ − x, Δ′_a lies between 0 and Δ_a, Δ′_i lies between 0 and Δ_i, and y′ = y.

For a sequence of explanations, a naïve aggregation can be used: (E_i)_{i∈[n]} |= (x′, y′) iff there is j such that E_j |= (x′, y′). Using this definition, the behaviour in Example 2 is inconsistent. An alternative method could aggregate while restricting by a specificity criterion: say that E covers x′ iff there is y′ such that E |= (x′, y′). We say that E_1 is more specific than E_2 if the set of inputs covered by E_1 is a subset of the set of inputs covered by E_2. Then redefine entailment for sequences in the following way: (E_i)_{i∈[n]} |= (x′, y′) iff there is j such that E_j |= (x′, y′) and there is no E_k more specific than E_j such that E_k |= (x′, y′′) with y′ ≠ y′′. By this definition, even if c_0 |= (x_1, 1), we have (c_0, c_1) |= (x_1, 0) and not (c_0, c_1) |= (x_1, 1). This would make such a relation consistent and non-monotonic.
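A short continuation of the earlier counterfactual sketch (ours; it reuses expected_outputs and, as a simplification, stands in for the continuous input space with a finite grid when comparing specificity), contrasting the naïve aggregation with the specificity-restricted one:

```python
def covers(E, x_new):
    """E covers x_new iff it commits to some outcome for it."""
    return bool(expected_outputs(*E, x_new))

def more_specific(E1, E2, grid):
    """E1 more specific than E2: its covered inputs (sampled on a grid) are a subset of E2's."""
    return all(covers(E2, p) for p in grid if covers(E1, p))

def naive_entails(Es, x_new):
    """Naïve aggregation: union of single-explanation commitments (may be inconsistent)."""
    return set().union(*(expected_outputs(*E, x_new) for E in Es))

def specificity_entails(Es, x_new, grid):
    """Keep a commitment only if no more specific explanation commits to a different outcome."""
    out = set()
    for E in Es:
        for y in expected_outputs(*E, x_new):
            rivals = (F for F in Es if F is not E and more_specific(F, E, grid))
            if not any(y2 != y for F in rivals for y2 in expected_outputs(*F, x_new)):
                out.add(y)
    return out

c0 = ((50, 50), 1, (75, 43), 0)
c1 = ((65, 45), 0, (59, 43), 1)
grid = [(a, i) for a in range(0, 101) for i in range(0, 201)]
print(naive_entails([c0, c1], (65, 45)))              # {0, 1}: inconsistent
print(specificity_entails([c0, c1], (65, 45), grid))  # {0}: the more specific c1 wins
```

On the two explanations of Example 2, the naïve aggregation commits to both outcomes for x_1, while the restricted one keeps only the commitment of the more specific c_1, mirroring the consistent but non-monotonic behaviour described above.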
4. Arbitrated dispute trees

Dispute trees [11, 12] have been used as explanations for AA-CBR since Čyras et al. [2], and were further developed in the form of arbitrated dispute trees, for an extended version of the model, in Čyras et al. [3]. This last form is the one of interest here, due to its symmetry regarding default and non-default outcomes. For compatibility with our presentation, we redefine the main concepts, but omit the results proven originally, since the adaptations are minimal.

Definition 8. Let AF⪰(D, x) = (Args, ⇝). An arbitrated dispute tree (ADT) is a tree 𝒯 such that:
1. every node of 𝒯 is of the form [N : α], for N ∈ {W, L} and α ∈ Args, any such node being called an N-node labelled by argument α;
2. the root of 𝒯 is labelled by the default argument (δ_C, δ_o), and is a W-node if AA-CBR⪰(D, x) = δ_o, and an L-node otherwise;
3. for every W-node n labelled by α ∈ Args and for every attacker β of α in (Args, ⇝), there is a child n′ of n such that n′ is an L-node labelled by β;
4. for every L-node n labelled by α ∈ Args, there is exactly one child, which is a W-node labelled by some attacker β of α; and
5. there are no other nodes in 𝒯.

We will also refer to 𝒯 as an arbitrated dispute tree for (the prediction for) x. An example of a dispute tree is in Figure 4a.

Dispute trees were originally defined as being possibly infinite, while Čyras et al. [3] assume casebases are coherent, and one can thus prove that every arbitrated dispute tree in that context is finite.² While we make no such assumption, we consider infinite dispute trees inappropriate for explaining, so we retain the original definition but explicitly define ADT explanations so as to limit ourselves to the finite case.

² If this is not the case, some results from the original presentation are inapplicable, such as [3, Prop. 3.4].

Definition 9. An arbitrated dispute tree explanation is a finite arbitrated dispute tree.
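A sketch (ours) of how an ADT in the sense of Definition 8 can be built from an acyclic mined AF, reusing the mine_af, grounded, DEFAULT and CASEBASE helpers of the earlier AA-CBR⪰ sketch; trees are plain nested dictionaries, and the W-child chosen for an L-node is an arbitrary attacker taken from the grounded extension.

```python
def build_adt(node_arg, attacks, g, status):
    """Return a nested dict for the subtree rooted at node_arg, with the given W/L status."""
    attackers = [a for (a, b) in attacks if b == node_arg]
    if status == "W":
        # Definition 8, condition 3: every attacker becomes an L-child.
        children = [build_adt(a, attacks, g, "L") for a in attackers]
    else:
        # Definition 8, condition 4: exactly one W-child, taken from the grounded extension.
        winner = next(a for a in attackers if a in g)
        children = [build_adt(winner, attacks, g, "W")]
    return {"status": status, "arg": node_arg, "children": children}

def adt_for(casebase, new_char):
    """An ADT explanation for the prediction on new_char (assumes the mined AF is acyclic)."""
    args, attacks = mine_af(casebase, new_char)
    g = grounded(args, attacks)
    root_status = "W" if DEFAULT in g else "L"   # Definition 8, condition 2
    return build_adt(DEFAULT, attacks, g, root_status)

# An ADT for x2 = (90, 200) of Example 3: an L-rooted chain ending in a W-leaf (cf. Figure 4a).
print(adt_for(CASEBASE, (90, 200)))
```

For x2 = (90, 200) this yields an L-rooted chain ending in a W-leaf, of the shape shown in Figure 4a below.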
5. Arbitrated dispute trees as interactive explanations

While arbitrated dispute trees have been used as explanations in previous literature [3], here we instantiate our framework for interactive explanations with ADTs. Our main question is: given an ADT, to what model behaviour does it "commit"? What should the |= relation be for ADTs? ADTs are a way of presenting and explaining the behaviour of AA-CBR as a more succinct structure than the entire argumentation framework. Our idea here is simply that such a presentation can be generalised to other cases: intuitively, an ADT for a new case x may show a set of sufficient reasons for justifying a decision for another new case x2. In a sense, x2 does not need more information than was already presented in that ADT in order to be decided, therefore the rest of the AF is unnecessary. This can be verified simply from the structure of the dispute tree and by checking for relevance. Let us first define this formally.

[Figure 4: Example of the ADT explanation in Example 3. Figure 4a shows the ADT explanation for input x2 = {age:90; income:200}: the chain [L : ((30, 25), 0)] — [W : ((40, 30), 1)] — [L : ((60, 42), 0)] — [W : ((86, 150), 1)]. Figures 4b and 4c show the process of adapting it to the new input x0 = {age:50; income:50}. In Figure 4b, in red are the nodes labelled by arguments which are irrelevant to x0. There is a single leaf, which is a W-node. Therefore this ADT can be adapted. This is done in Figure 4c, with the (single) attacker of the leaf added back, but now attacked by the new case, giving [L : ((30, 25), 0)] — [W : ((40, 30), 1)] — [L : ((60, 42), 0)] — [W : ((50, 50), ?)].]

5.1. Readaptation of ADTs

Let D be a dataset, and (Args, ⇝) = AF⪰(D) its corresponding argumentation framework. Let x be a new case, and 𝒯 an ADT explanation for its prediction. Now let x2 be a second new case. Depending on the partial order, on the ADT and on the new cases, it might be possible to reuse the ADT 𝒯, without looking at (Args, ⇝) directly, to decide the outcome for x2. Indeed, in that case another ADT is generated as well. The two core ideas are: 1) if a past case is relevant to the new case, then every case smaller than it is also relevant; 2) every attacker of a W-node is in the ADT (as an L-node).

Theorem 10. Let 𝒯′ be 𝒯 with all nodes labelled by x removed. For every leaf l of 𝒯′, let n_max_l be the node on the path from the root to l which is maximally relevant to x2 (that is, there is no node on the path greater than it which is also relevant to x2). If all n_max_l are W-nodes, then the predicted outcome for x2 is the same as the predicted outcome for x. Besides, let 𝒯_x2 be the tree constructed by the following process: start with the subtree of 𝒯′ containing only the n_max_l and their ancestors, add all L-nodes which are children of some n_max_l in 𝒯′, and, finally, for each L-node added in this way, add as a child a new W-node labelled by x2. Then 𝒯_x2 is an ADT for the prediction on x2.

When this is the case for new cases x and x2, we say that 𝒯 can be readapted to x2. Before proving the theorem, let us illustrate the process:

Example 3. Given our casebase D, suppose we would like to have the prediction for x2 = {age:90; income:200} and an ADT explanation. One would notice that the outcome is y2 = 1, with the ADT in Figure 4a. The idea is that, if one would then like to predict for x0 = {age:50; income:50}, this ADT could be readapted in a straightforward way. The process is illustrated in the remainder of Figure 4.

Proof of Theorem 10. For every l, n_max_l exists: by regularity, the default case (and thus the root) is relevant to x2. Let us see that 𝒯_x2 is an ADT. Clearly condition 1 is satisfied. Let us check conditions 3 and 4 for each node. On every branch, every node up to n_max_l satisfies the conditions in Definition 8, since 𝒯 is an ADT. Now consider n_max_l. By assumption, every n_max_l is a W-node. Again, since 𝒯 is an ADT for x, for every n_max_l, every attacker of its labelling argument is included in 𝒯 as an L-node child of n_max_l. Since those are also in 𝒯_x2, condition 3 is satisfied for each n_max_l. Next, each of those attackers requires exactly one W-node as a child, attacking it. This is satisfied by each added W-node labelled by x2, which is indeed an attacker, since the arguments labelling such L-nodes are irrelevant to x2 (otherwise n_max_l would not be maximally relevant).
Finally, since x2 has no attackers in AF⪰(D, x2), it satisfies the conditions of a leaf. Condition 5 is clearly satisfied. The last requirement to check is whether condition 2 is satisfied. Indeed it is since, given that the other conditions are satisfied, the set of arguments labelling W-nodes is contained in the grounded extension of AF⪰(D, x2), which can be verified by induction [3, Prop. 3.3]. Therefore, if the root is a W-node, it is in the grounded extension and thus the prediction is δ_o. Otherwise, it is an L-node and then it has a child which is a W-node, that is, the default argument is attacked by the grounded extension (and thus is not in it, since the grounded extension is conflict-free), and so the prediction for x2 is δ̄_o.

5.2. The |= relation for dispute trees

Now we can instantiate our framework with dispute trees. We define E* as a function that receives a new case x and its predicted outcome AA-CBR⪰(D, x) = y, and returns an ADT explanation 𝒯 for it. While the choice of the dispute tree could matter for practical applications, for our purposes the only concern is that some dispute tree is returned. Regarding existence, it requires the argumentation frameworks to be acyclic. This can be achieved by restricting ourselves to coherent casebases, or instead by applying cAA-CBR⪰, a variation of the original method which guarantees acyclic AFs [5]. For the rest of the section we assume these conditions.

We can then easily instantiate the rest of the framework, extending to sequences as previously suggested, E(((x_i, y_i))_{i∈[n]}) = (E*(x_i, y_i))_{i∈[n]}, and defining |= as: (𝒯_i)_{i∈[n]} |= (x, y) iff there is j such that 𝒯_j can be readapted to x. Indeed, this readapted tree 𝒯_x is precisely an explanation such that (𝒯_j) ⊢_e 𝒯_x.

The intuition for this relation is a possible answer to the question: what does an ADT explanation say about other inputs, beyond the one originally being explained? Surely it also reveals how the cases which label its nodes are decided, but would that be all? What we show with Theorem 10 is precisely that, for a new input case, if, when filtering the tree to only the nodes labelled by relevant arguments, all leaves are W-nodes, then not only can we guarantee what the output is, but we also acquire an explanation for it via a transformation. Also notice that all of this is done without running the prediction algorithm from scratch. How this could be used to accelerate inference time in practice is outside the scope of this paper.

Another interesting aspect to notice is that, if there is a single L-node as a maximally relevant node, then nothing can be said. This can be seen from Figure 4. For a new case {age:19; income:110}, the only relevant case in the tree is the root (the default case), which is an L-node. However, the prediction for it is still 1, since the case ({age:20; income:100}, 1) is relevant, and would attack the default case in the grounded extension. The slightly different new case {age:21; income:110} would have no such attacker.
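For concreteness, the following sketch (ours, reusing leq, DEFAULT, CASEBASE and the adt_for trees from the previous sketches) implements the readaptation check of Theorem 10 on such nested-dictionary trees, and the resulting entailment from a sequence of ADTs:

```python
def readapt(node, x2_char):
    """Theorem 10, recursively: the ADT readapted to x2, or None if readaptation fails."""
    if node["status"] == "W":
        new_children = []
        for child in node["children"]:            # all attackers of a W-node, as L-nodes
            if leq(child["arg"][0], x2_char):     # attacker still relevant to x2: keep its subtree
                sub = readapt(child, x2_char)
                if sub is None:
                    return None
                new_children.append(sub)
            else:                                 # irrelevant attacker: now defeated by x2 itself
                new_children.append({"status": "L", "arg": child["arg"],
                                     "children": [{"status": "W", "arg": (x2_char, None),
                                                   "children": []}]})
        return {"status": "W", "arg": node["arg"], "children": new_children}
    # L-node: its single W-child must neither be the old new case nor be irrelevant to x2,
    # otherwise this L-node would be a maximally relevant L-node and the theorem does not apply.
    (child,) = node["children"]
    if child["arg"][1] is None or not leq(child["arg"][0], x2_char):
        return None
    sub = readapt(child, x2_char)
    return None if sub is None else {"status": "L", "arg": node["arg"], "children": [sub]}

def adt_sequence_entails(adts, x2_char):
    """(T_i)_i |= (x2, y) iff some T_i can be readapted to x2; returns the committed outcome, if any."""
    for adt in adts:
        readapted = readapt(adt, x2_char)
        if readapted is not None:
            return DEFAULT[1] if readapted["status"] == "W" else 1 - DEFAULT[1]
    return None

adt = adt_for(CASEBASE, (90, 200))               # an ADT for x2 of Example 3 (cf. Figure 4a)
print(adt_sequence_entails([adt], (50, 50)))     # 1: readaptable, as in Figure 4c
print(adt_sequence_entails([adt], (21, 110)))    # None: the only relevant node is an L-node
```

Readapting the ADT for (90, 200) to (50, 50) succeeds and commits to outcome 1, as in Figure 4c, while for (21, 110) the root is the only relevant node and is an L-node, so the tree commits to nothing.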
We can then state the following consequences:

Theorem 11. Let E(((x_i, y_i))_{i∈[n]}) = (E_i)_{i∈[n]}, and (x, y) ∈ X × Y. Then:
1. E is interaction-stable;
2. (E_i)_{i∈[n]} |= (x, y) iff there is an explanation E such that (E_i)_{i∈[n]} ⊢_e E and E |= (x, y);
3. (faithfulness) if (E_i)_{i∈[n]} |= (x, y), then AA-CBR⪰(D, x) = y;
4. |= is consistent;
5. |= is monotonic.

These are straightforward from |= being defined in terms of the existence of an explanation in the sequence, and from the fact that all arguments labelling W-nodes are contained in the grounded extension [3, Prop. 3.3]. ADT explanations are simple, but provably faithful, explanations for AA-CBR. This should not be surprising, since this inference relation is conservative: a single L-node as a maximally relevant node in the tree makes the entire ADT unusable.

6. Discussion and conclusion

A possible point of discussion is whether the entire set of ADTs for an input could not be returned instead. While this is a possibility, in a practical application all this information would likely be used, but still adapted and filtered so as not to overwhelm the final user. This is especially the case since many different dispute trees could overlap, and thus be partly redundant. Depending on the context, this could be less informative than having the entire argumentation framework, since, although the latter is not a tree, there is no repetition of cases in it. Thus having a single ADT seems a more effective strategy. The condition is, as previously mentioned, that either the casebase is coherent, or that cAA-CBR⪰ is used [5], since then there is also a correspondence in that, given an outcome, there is always a dispute tree explaining that decision [3, Prop. 3.4].

We leave for future work an exploration of how ADTs would fare in real applications of AA-CBR, including how often ADTs can be reused, and whether this can create gains in inference time. Another aspect left for future work is user studies for evaluating the effectiveness of ADTs as explanations. An interesting question is whether ADTs with maximally relevant L-nodes could be readapted in some way, even if fallibly. An idea would be to consider those prima facie explanations, resulting in a non-monotonic entailment relation from sequences of explanations.

Acknowledgments

GPP was supported by Capes (Brazil, Ph.D. Scholarship 88881.174481/2018-01). FT was partially funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 101020934), by J.P. Morgan and by the Royal Academy of Engineering under the Research Chairs and Senior Research Fellowships scheme. Any views or opinions expressed herein are solely those of the authors.

References

[1] G. Paulino-Passos, F. Toni, On interactive explanations as non-monotonic reasoning, in: Workshop on Explainable Artificial Intelligence (XAI) at IJCAI 2022, 2022.
[2] K. Čyras, K. Satoh, F. Toni, Explanation for case-based reasoning via abstract argumentation, in: P. Baroni, T. F. Gordon, T. Scheffler, M. Stede (Eds.), Computational Models of Argument - Proceedings of COMMA 2016, Potsdam, Germany, 12-16 September 2016, volume 287 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2016, pp. 243–254. doi:10.3233/978-1-61499-686-6-243.
[3] K. Čyras, D. Birch, Y. Guo, F. Toni, R. Dulay, S. Turvey, D. Greenberg, T. Hapuarachchi, Explanations by arbitrated argumentative dispute, Expert Syst. Appl. 127 (2019) 141–156. doi:10.1016/j.eswa.2019.03.012.
[4] P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games, Artificial Intelligence 77 (1995) 321–357.
[5] G. Paulino-Passos, F. Toni, Monotonicity and noise-tolerance in case-based reasoning with abstract argumentation, in: M. Bienvenu, G. Lakemeyer, E. Erdem (Eds.), Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, KR 2021, Online event, November 3-12, 2021, 2021, pp. 508–518. doi:10.24963/kr.2021/48.
[6] O. Cocarascu, A. Stylianou, K. Čyras, F. Toni, Data-empowered argumentation for dialectically explainable predictions, in: ECAI 2020 - 24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 10-12 June 2020, 2020.
[7] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the black box: Automated decisions and the GDPR, Harvard Journal of Law & Technology 31 (2018).
[8] M. Hildebrandt, C. Castillo, L. E. Celis, S. Ruggieri, L. Taylor, G. Zanfir-Fortuna (Eds.), FAT* '20: Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, January 27-30, 2020, ACM, 2020. doi:10.1145/3351095.
[9] D. Makinson, General patterns in nonmonotonic reasoning, in: D. M. Gabbay, C. J. Hogger, J. A. Robinson (Eds.), Handbook of Logic in Artificial Intelligence and Logic Programming - Volume 3 - Nonmonotonic Reasoning and Uncertain Reasoning, Oxford University Press, 1994, pp. 35–110.
[10] S. Kraus, D. Lehmann, M. Magidor, Nonmonotonic reasoning, preferential models and cumulative logics, Artif. Intell. 44 (1990) 167–207. doi:10.1016/0004-3702(90)90101-5.
[11] P. M. Dung, R. A. Kowalski, F. Toni, Dialectic proof procedures for assumption-based, admissible argumentation, Artif. Intell. 170 (2006) 114–159. doi:10.1016/j.artint.2005.07.002.
[12] P. M. Dung, P. Mancarella, F. Toni, Computing ideal sceptical argumentation, Artif. Intell. 171 (2007) 642–674. doi:10.1016/j.artint.2007.05.003.