Everyday Argumentative Explanations for Classification

Jowan van Lente1, AnneMarie Borg1 and Floris Bex1,2
1 Department of Information and Computing Sciences, Utrecht University, The Netherlands
2 Tilburg Institute for Law, Technology, and Society, Tilburg University, The Netherlands
ArgML'22: 1st International Workshop on Argumentation & Machine Learning, September 13, 2022, Cardiff, Wales, UK
jowanvanlente@gmail.com (J. van Lente); a.borg@uu.nl (A. Borg); f.j.bex@uu.nl (F. Bex)
ORCID: 0000-0002-7204-6046 (A. Borg); 0000-0002-5699-9656 (F. Bex)

Abstract
In this paper we study everyday explanations for classification tasks with formal argumentation. Everyday explanations describe how humans explain in day-to-day life, which is important when explaining decisions of AI systems to lay users. We introduce EVAX, a model-agnostic explanation method for classifiers with which contrastive, selected and social explanations can be generated. The resulting explanations can be adjusted in their size and retain high fidelity scores (an average of 0.95).

Keywords
Explainable artificial intelligence, Formal Argumentation, Classification

1. Introduction
A recent trend in explainable artificial intelligence (XAI) is hybrid (or neuro-symbolic) approaches, where the performance of learning-based systems is combined with the transparency of knowledge-based AI [1]. One such knowledge-based approach that seems suitable for this purpose is formal argumentation [2], see e.g., [3]. Formal argumentation is designed to model the argumentative nature and defeasible character of human reasoning, by means of argumentation frameworks: a set of arguments and an attack relation between these arguments. Although argumentative XAI is relatively new, several methods have been proposed, see [3] for a recent overview. In this paper we are interested in how formal argumentation can contribute to the modeling of explanations, as described in [4]: explanations of a specific event or decision for human (non-expert) end users. Specifically, we study:
β€’ everyday explanations: explanations as used by humans in day-to-day life. Unlike scientific explanations, these need not be based on general laws. We will focus on local explanations (i.e., explanations for a specific outcome) and assume a receiver who benefits from a smaller explanation.
β€’ contrastive, selected and social explanations: among the main findings in [4] is that explanations are: contrastive, i.e., when explaining 𝑃, humans often expect the explanation to highlight the difference between 𝑃 and something else (e.g., 𝑄), so that the explanation answers why 𝑃 rather than 𝑄?; selected, i.e., not all possible explanations are returned, but rather just one or two are selected based on a cognitive bias, such as abnormality or responsibility; and social, i.e., explanations (e.g., their size or content) are adjusted to the receiver.
β€’ faithfulness: when explaining the outcome of a black box, learning-based system, the explanation should be faithful to the system and its mechanisms.
In order to model the above properties in an argumentative explanation method, we introduce EVAX, an argumentative explanation method for everyday explanations of decisions derived with a classifier.
EVAX is a model-agnostic method, which only requires the input and output of a classifier and can then compute, faithfully, explanations which are contrastive, selected and social. The paper is structured as follows. Section 2 contains the preliminaries, after which EVAX is introduced (Section 3). We present a quantitative (Section 4) and qualitative (Section 5) evaluation. Related work is discussed in Section 6 and we conclude in Section 7.

2. Preliminaries
In this section we recall the necessary preliminaries on formal argumentation and classification tasks and present our definition of arguments and defeats.

2.1. Formal argumentation
An abstract argumentation framework (AF) [2] is a pair π’œβ„± = ⟨Args, Def⟩, where Args is a set of arguments and Def βŠ† Args Γ— Args is a defeat relation on these arguments. Given an argumentation framework π’œβ„±, Dung-style semantics [2] can be applied to it, to determine which combinations of arguments (called extensions) can collectively be accepted.

Definition 1. Let π’œβ„± = ⟨Args, Def⟩ be an AF, S βŠ† Args be a set of arguments and 𝐴 ∈ Args an argument. Then S defeats 𝐴 if there is an 𝐴′ ∈ S such that (𝐴′, 𝐴) ∈ Def; S defends 𝐴 if S defeats every defeater of 𝐴; S is conflict-free if there are no 𝐴1, 𝐴2 ∈ S such that (𝐴1, 𝐴2) ∈ Def. S is admissible if it is conflict-free and it defends all of its elements, and S is complete if it is admissible and it contains all the arguments it defends. The grounded extension of π’œβ„± is the minimal (w.r.t. βŠ†) complete extension, denoted by Grd(π’œβ„±).

In abstract argumentation, arguments are abstract entities and the attack relation is predetermined. In contrast, in structured argumentation [5], arguments are constructed from a knowledge base and a set of rules, and the attacks are based on the structure of the resulting arguments. In both cases, the strength of the arguments determines whether an attack is successful (e.g., an attack by a stronger argument is successful and therefore also a defeat, but an attack by a weaker argument is not). While there is a variety of approaches to structured argumentation, we will use a simple notion of an argument: a triple of a premise, a conclusion and the strength of the argument.

Definition 2. An argument 𝐴 is a triple (πœ“, πœ‘, 𝑝), where πœ“ is the premise (e.g., a feature), denoted by prem(𝐴) = πœ“, πœ‘ is the conclusion inferred from πœ“ (e.g., a class), denoted by conc(𝐴) = πœ‘, and 𝑝 is the strength value of the argument, denoted by str(𝐴) = 𝑝, where 0 ≀ 𝑝 ≀ 1.

Based on the structure of the arguments, the defeat relation is determined:

Definition 3. Let π’œβ„± = ⟨Args, Def⟩ be an AF and 𝐴, 𝐡 ∈ Args. Then (𝐴, 𝐡) ∈ Def iff conc(𝐴) β‰  conc(𝐡) and str(𝐴) β‰₯ str(𝐡).
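To make these notions concrete, the following is a minimal Python sketch of an argument triple (Definition 2), the defeat relation (Definition 3) and a naive fixed-point computation of the grounded extension (Definition 1). The naming is ours and hypothetical; it is not taken from the EVAX implementation. The sketch is instantiated with the three arguments that reappear in the toy example of Section 3.2 (Figure 3).

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Argument:
        premise: tuple       # a feature, e.g. ("passed test", "yes")
        conclusion: str      # a class label
        strength: float      # precision score, 0 <= p <= 1

    def defeats(a: Argument, b: Argument) -> bool:
        # Definition 3: A defeats B iff they conclude different classes
        # and A is at least as strong as B.
        return a.conclusion != b.conclusion and a.strength >= b.strength

    def grounded_extension(args):
        # Definition 1: least fixed point; repeatedly collect every argument
        # whose defeaters are all defeated by the current set.
        grd = set()
        while True:
            defended = {a for a in args
                        if all(any(defeats(d, b) for d in grd)
                               for b in args if defeats(b, a))}
            if defended == grd:
                return grd
            grd = defended

    # The three arguments of the toy example in Section 3.2 (Figure 3):
    A1 = Argument(("grade", "< 8"), "declined", 0.7)
    A2 = Argument(("passed test", "yes"), "accepted", 1.0)
    A3 = Argument(("motivated", "yes"), "accepted", 0.6)

    print({a.conclusion for a in grounded_extension([A1, A2, A3])})  # {'accepted'}

Here 𝐴2 defeats 𝐴1 and 𝐴1 defeats 𝐴3, so the grounded extension is {𝐴2, 𝐴3} and the only conclusion it supports is "accepted", matching Figure 3.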
Let π‘₯1 , . . . , π‘₯6 ∈ 𝒳 , where every π‘₯ ∈ 𝒳 is a student, and let π’ž = {0, 1}, where 1 represents a student being accepted to university, and 0 a rejection. The set of features consists of {𝑔, 𝑑, π‘š} ∈ β„±, where 𝑔 corresponds to the (rounded) average grade of the student, 𝑑 to whether or not the student passed the entry test and π‘š to whether or not they are motivated. Suppose that we are given the following input space: 𝒳 𝑔 𝑑 π‘š 𝑐 π‘₯1 8 1 0 1 π‘₯2 7 0 0 0 π‘₯3 6 1 1 1 π‘₯4 8 1 1 1 π‘₯5 7 0 1 0 π‘₯6 6 1 1 ? A classification task is then to determine whether student π‘₯6 is accepted or not. 3. EVAX: everyday argumentative explanations In this paper, we are interested in everyday explanations as described in [4], i.e., explanations of why a specific event/property/decision occurred for end users in a day-to-day setting. To ensure that our explanations fulfill these requirements, we follow the major findings in [4]: β€’ contrastive explanations provide reasons pro and con the outcome [3, 7]. In an argumen- tative setting, explanations are contrastive when arguments and counterarguments for the outcome are present in the explanation. β€’ selected explanations have a fixed maximum size, the elements of which are selected based on at least one cognitive bias. In an argumentative setting, explanations contain a maximum number of arguments. β€’ social explanations can be adjusted to the receiver, by varying the complexity or size of an explanation. Since explanations based on argumentation frameworks can be represented in a variety of ways [3], argumentative explanations are social by definition. Additionally, the explanations should remain faithful to the model (i.e., they explain the behavior of the model accurately). Our method EVAX takes as input a trained black box model and constructs a global set of arguments: for each feature 𝑓 and each class 𝑐 it determines the probability that input containing 𝑓 will be assigned 𝑐. For a specific input point, consisting of a set of features, a local argumentation framework is created (i.e., only containing the arguments corresponding to features from that input point and the defeats between them), from which the conclusion is predicted. This local argumentation framework can then be used to derive explanations, the size of which can be set by the user. We have implemented two ways to present explanations: based on abnormality and in a dialogue form. We start by describing the method of EVAX (Section 3.1), we will illustrate it with a toy example in Section 3.2. 3.1. Method outline EVAX takes as input a labeled dataset, a trained black box model BB and a threshold value 𝜏select that controls the size of the output. EVAX returns a set of predictions 𝒴pred and a set of local explanations β„°. The explanations 𝑒 ∈ β„° answer the question: β€œWhy did black box BB assign class 𝑐 to input instance π‘₯?” These explanations are deployments of an argumentation framework that represent the behavior of BB around a single datapoint in argumentative terms. This AF thus forms the basis for the explanations, and will, for every classified instance, be referred to as π’œβ„± 𝑙 . The size of π’œβ„± 𝑙 can be manually altered by 𝜏select . 
Algorithm 1 EVAX
1: procedure EVAX(BB, labeled_dataset, 𝜏select = 20)
2:   𝒳train, 𝒳test, 𝒴train, 𝒴test ← split_dataset(labeled_dataset, test_size = 0.2)
3:   global_arguments ← get_global_arguments(BB, 𝒳train)    ◁ step 1
4:   for π‘₯𝑖 in 𝒳test do
5:     π’œβ„± 𝑙 ← create_local_AF(π‘₯𝑖, global_arguments, 𝜏select)    ◁ step 2
6:     predict(π’œβ„± 𝑙)    ◁ step 3
7:     explain(π’œβ„± 𝑙)    ◁ step 4
8: results()
9:   get_results(BB, predictions, 𝒴test)

The procedure of EVAX is shown in Algorithm 1.ΒΉ First, EVAX divides the labeled dataset into a set of unlabeled datapoints 𝒳 (the input space) and a set of labels 𝒴 (the target space), which are then split up into a train set and a test set, respectively 𝒳train, 𝒳test and 𝒴train, 𝒴test. The default size of the test set is 0.2, and the default 𝜏select value is 20. Afterward, the method can be divided into four main steps, which are described below. The first step handles all datapoints and is executed just once, whereas the other three steps handle a single datapoint and may be repeated multiple times, up to a maximum of the size of the test set.

ΒΉ See https://github.com/jowanvanlente/EVAX for the implementation.

β€’ Step 1: Extract a global list of arguments, to represent the global behavior of BB.
– EVAX first iterates over all features (π‘Žπ‘˜, π‘£π‘˜) ∈ β„± of all (unlabeled) datapoints π‘₯𝑗 ∈ 𝒳train and all output classes 𝑐𝑖 ∈ π’ž, and computes for every feature-class pair a decision rule. These rules are accompanied by a precision score 𝑝(π‘˜,𝑖) that articulates the probability that BB will assign a datapoint with that particular feature to that particular class. Each score is saved in a triple ((π‘Žπ‘˜, π‘£π‘˜), 𝑐𝑖, 𝑝(π‘˜,𝑖)), which is added to a list of triples.
– Arguments are constructed based on the list of triples. For every triple an argument is constructed in which the feature (π‘Žπ‘˜, π‘£π‘˜) is the premise prem, the output class 𝑐𝑖 is the conclusion conc and the precision score 𝑝(π‘˜,𝑖) is set as the argument strength str. Together these arguments form the global list of arguments.
β€’ Step 2: Create a local AF, π’œβ„± 𝑙.
– EVAX creates a local AF π’œβ„± 𝑙 in every iteration of this step. This is an argumentation framework π’œβ„± 𝑙 = ⟨Args𝑙, Def𝑙⟩ that represents the classifier's behavior around one particular datapoint. Based on the values of that datapoint, it selects a set of relevant arguments (Args𝑙 βŠ† Args) from the global list of arguments (Args) and determines the defeats (Def𝑙).
– Argument selection is done by matching the features of the datapoint from 𝒳test with the premises of the arguments: given a datapoint π‘₯𝑖 ∈ 𝒳test and an argument 𝐴 ∈ Args, if prem(𝐴) is one of the features in π‘₯𝑖 then 𝐴 is added to the local AF (meaning 𝐴 ∈ Args𝑙). As a result, all arguments with a premise corresponding to one of the features of the datapoint are selected. To gain computational efficiency and maintain selectedness, a threshold 𝜏select can be defined, which ensures that only the 𝜏select strongest arguments are included in the list.
– The defeats are determined as in Definition 3.
β€’ Step 3: Predict the output class based on π’œβ„± 𝑙.
– First, the grounded extension of the local AF (Grd(π’œβ„± 𝑙)) is computed, after which the conclusion of the arguments in Grd(π’œβ„± 𝑙) is picked as the prediction. Formally, this means that prediction 𝑦𝑖 ∈ 𝒴 is equal to conc(𝐴) such that 𝐴 ∈ Grd(π’œβ„± 𝑙).
Since arguments in the grounded extension are non-conflicting, they always have the same conclusion. Therefore it does not matter which argument in Grd(π’œβ„± 𝑙) is picked. When Grd(π’œβ„± 𝑙) is empty, EVAX predicts the majority class.
β€’ Step 4: Explain.
– The current implementation allows for two variations: adding selectedness based on abnormality and presenting the explanation in a conversational form.
– Abnormality is one of the methods humans use when selecting an explanation and describes how people tend to choose a cause that is unusual [8]. We have defined the abnormality of an argument as 1 βˆ’ coverage. The coverage value refers to the fraction of datapoints that the decision rule, out of which the argument is constructed, β€˜rules over’. In other words, the coverage of argument 𝐴 refers to the fraction of input instances that have a feature equal to prem(𝐴). Since the coverage describes how often a feature is present in a dataset, it essentially describes how β€˜normal’ a feature is. A lower coverage therefore means that a feature is less normal, i.e., more abnormal. The deployment of π’œβ„± 𝑙 then amounts to selecting the argument with the highest abnormality score that argues for the predicted class. An example of the output is given in Figure 1.

a130: odor = 6 β†’ 1 (precision = 1.0, abnormality = 0.989)    β€˜π‘₯1 is poisonous because of its unusual pungent odor’
Figure 1: Example output of the most abnormal argument of π’œβ„± 𝑙 that explains why BB assigned π‘₯1 (a mushroom) to class 𝑐1 (poisonous). On the right, we see the same explanation in natural language.

– EVAX can also provide a dialectical representation of π’œβ„± 𝑙, similar to a dispute tree [9]. This representation has the form of a discussion between a proponent (P) and an opponent (O) about which class to assign to the datapoint in question. A threshold 𝜏explain allows the user to choose the number of arguments to include in the explanation. Arguments are divided into pro and con arguments and are put forward by P and O, who take turns. If the value of threshold 𝜏explain is even, O starts the dispute, and if it is odd, P starts. After the first argument is put forward, the strongest counterargument is put forward in reply.Β² Note that this threshold is different from 𝜏select, because it does not affect the size of π’œβ„± 𝑙, but merely the size of the dialectical representation of π’œβ„± 𝑙. See Figure 2 for an example with 𝜏explain = 4.

O: a93: gill-color = 4 β†’ 0 (0.84)            O: since gill color is black, it is likely π‘₯1 has class edible
P: a16: gill-size = 1 β†’ 1 (0.91)             P: since gill size is narrow, it is likely π‘₯1 has class poisonous
O: a143: spore-print-color = 2 β†’ 0 (0.87)    O: since spore print color is black, it is likely π‘₯1 has class edible
P: a130: odor = 6 β†’ 1 (1.0)                  P: since odor is pungent, it is certain π‘₯1 has class poisonous
Figure 2: The dialectical explanation of the assignment of π‘₯1 to 𝑐1 by BB, as in Figure 1. The values between brackets refer to the precision score. The dispute is read from top to bottom; the arrows solely indicate conflicts, not necessarily defeats.

Β² The only requirement of the counterargument is a conflicting conclusion, and not necessarily a higher strength value, i.e., it does not have to defeat the argument. This is to ensure that counterarguments are included in this explanation form.
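Under the same hypothetical naming as the sketch in Section 2 (again our own sketch, not the actual EVAX code, which is available at the repository in footnote 1), Steps 1 and 2 and the abnormality-based selection of Step 4 could be outlined as follows. Here bb_predict stands in for the black box's prediction function and each datapoint is assumed to be a set of attribute-value pairs.

    from collections import defaultdict

    def get_global_arguments(bb_predict, X_train):
        # Step 1 (sketch): one argument per feature-class pair, with
        # strength = precision of the rule "feature -> class" w.r.t. the
        # black box's own predictions, and abnormality = 1 - coverage.
        y_bb = [bb_predict(x) for x in X_train]          # labels given by BB, not the gold labels
        counts = defaultdict(lambda: defaultdict(int))   # feature -> class -> count
        for x, c in zip(X_train, y_bb):
            for feature in x:                            # x is a set of (attribute, value) pairs
                counts[feature][c] += 1
        arguments = []
        for feature, per_class in counts.items():
            n_feature = sum(per_class.values())
            coverage = n_feature / len(X_train)
            for c, n in per_class.items():
                arguments.append({"premise": feature, "conclusion": c,
                                  "strength": n / n_feature,
                                  "abnormality": 1.0 - coverage})
        return arguments

    def create_local_af(x, global_arguments, tau_select=20):
        # Step 2 (sketch): keep the tau_select strongest arguments whose premise
        # occurs in the datapoint, plus the defeats between them (Definition 3).
        local = [a for a in global_arguments if a["premise"] in x]
        local.sort(key=lambda a: a["strength"], reverse=True)
        local = local[:tau_select]
        defeats = [(a, b) for a in local for b in local
                   if a["conclusion"] != b["conclusion"] and a["strength"] >= b["strength"]]
        return local, defeats

    def most_abnormal_pro_argument(local, predicted_class):
        # Step 4, abnormality variant (sketch): explain with the argument for
        # the predicted class that has the highest abnormality score.
        pro = [a for a in local if a["conclusion"] == predicted_class]
        return max(pro, key=lambda a: a["abnormality"]) if pro else None

Step 3 then amounts to computing the grounded extension of the selected arguments (as sketched in Section 2) and returning its conclusion, falling back to the majority class when the extension is empty.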
3.2. Toy example
Recall the classification task described in Example 1, on students being accepted into university. In Figure 3 we present a similar case to illustrate EVAX. It represents one iteration of Steps 2, 3, and 4 and thus assumes that the global list of arguments has already been computed. In this example, a black box predicts that an input instance β€˜John’ will be accepted into university. The same input instance is used as input for EVAX. Based on that input, EVAX creates a local argumentation framework π’œβ„± 𝑙 by selecting three relevant arguments, based on the three different features, and defines defeats over them. It then calculates the grounded extension and predicts that John will be accepted into university. In addition, it computes an AF-based explanation, which in this case is a dialectical representation of π’œβ„± 𝑙, as described in Step 4. The threshold 𝜏explain has a value of 3. The arrows in the representation are the defeats. Note that the arrows between the different components of EVAX do not represent defeats, but indicate the information flow.

Input instance John: grade = 6, passed test = yes, motivated = yes.
Black box prediction: accepted. EVAX prediction: accepted (both predictions are compared by the fidelity calculator).
Local AF π’œβ„± 𝑙:
𝐴1: (grade < 8) β†’ declined (0.7)
𝐴2: (passed test = yes) β†’ accepted (1.0)
𝐴3: (motivated = yes) β†’ accepted (0.6)
Defeats: (𝐴1, 𝐴3), (𝐴2, 𝐴1)
Dialectical representation:
P (𝐴3): John is motivated, therefore he usually gets accepted
O (𝐴1): John has grade 6, therefore he usually gets declined
P (𝐴2): John passed the test, therefore it is certain he gets accepted
Figure 3: Illustration of EVAX applied to Example 1.

4. Quantitative evaluation
We have tested EVAX on four datasets and used five quantitative metrics for the evaluation. The (labeled) datasets are from the UCI Machine Learning Repository [10]:
β€’ With the Adult dataset one tries to predict whether or not a person makes more than 50,000 dollars a year. We removed all datapoints with unknown values and discretized the continuous features.
β€’ The Mushroom dataset includes instances of 23 different species of mushrooms. The task is to identify whether a mushroom is poisonous or edible. We did not perform any alterations on this dataset.
β€’ The task of the Iris dataset is to predict the type of iris plant. We discretized the continuous values.
β€’ With the Wine dataset one wants to predict the type of wine of an input instance. Again we discretized the continuous values.
The discretization of continuous variables is necessary to constrain the number of arguments that are added to the global list of arguments. We have used the cut method of pandas [11] with a bin value of 10. Since higher bin values tend to give better performance but reduce the computational efficiency, we have tuned this value by incrementally increasing it from 3 up to 20. We found that from a bin value of 10 and upwards, the fidelity did not significantly increase (sometimes it even decreased), while the computational efficiency consistently decreased with higher bin values.
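For instance, a single continuous column can be binned into 10 equal-width intervals as follows (the DataFrame and column name are hypothetical; the paper does not specify them):

    import pandas as pd

    # Hypothetical example: discretize a continuous feature into 10 bins,
    # as done for the Adult, Iris and Wine datasets.
    df = pd.DataFrame({"alcohol": [11.0, 12.4, 13.1, 13.9, 14.2]})
    df["alcohol"] = pd.cut(df["alcohol"], bins=10)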
For each of these datasets we have chosen four machine learning models of different complexity to test the performance and range of EVAX: logistic regression, support vector machines (SVM), random forest, and neural networks. All four models are initialized from the scikit-learn library [12]. Finally, we applied the following five metrics for the evaluation:
β€’ Fidelity indicates how well the explanation approximates the prediction of the black box model. It represents the fraction of datapoints that are assigned to the same output class by EVAX and BB.
β€’ Accuracy (BB) indicates how well our model performs on unseen data. It represents the fraction of correctly classified datapoints. The value between brackets refers to the original accuracy of BB.
β€’ Size measures the average minimum number of arguments necessary to retain the same prediction. In other words, it is the lowest possible 𝜏select value that does not affect the accuracy or fidelity. A consistently low size value indicates that the method can compute small explanations that remain faithful.
β€’ Empty Grd specifies the fraction of datapoints for which the grounded extension Grd(π’œβ„± 𝑙) is an empty set. When Grd(π’œβ„± 𝑙) is empty, EVAX relies on a default prediction. A higher Empty Grd value thus means that accuracy and fidelity scores are increasingly determined by the default prediction, and therefore become less reliable.
β€’ Time indicates the number of seconds needed to run the program.
The results in Table 1 show high fidelity (an average of 0.95) for all four ML models, which indicates a sufficient degree of faithfulness. Only the Adult dataset and the neural network on the Wine dataset have relatively low scores. This might be due to the relatively low accuracy of the BB in those cases. Since the argument with the highest argument strength is always in Grd(π’œβ„± 𝑙), the minimum size is always equal to 1. This indicates that the model is capable of computing small explanations without losing faithfulness. Moreover, we see that the method never computes an empty grounded extension Grd(π’œβ„± 𝑙), and hence requires no reliance on a default prediction. These results are obtained on a Windows 64-bit operating system with 16GB RAM and an Intel(R) Core(TM) i5-1145G7 @ 2.60GHz processor.

Table 1: Quantitative results of EVAX.
Dataset    Model                  Fidelity   Accuracy (BB)   Size   Empty Grd   Time (s)
Adult      Logistic regression    0.95       0.73 (0.72)     1      0.0         5.06
Adult      SVM                    0.93       0.75 (0.74)     1      0.0         13.61
Adult      Random forest          0.88       0.77 (0.78)     1      0.0         5.06
Adult      Neural network         0.91       0.75 (0.75)     1      0.0         5.28
Mushroom   Logistic regression    0.98       0.96 (0.95)     1      0.0         17.92
Mushroom   SVM                    0.99       0.98 (0.99)     1      0.0         13.45
Mushroom   Random forest          1.0        0.99 (1.0)      1      0.0         13.63
Mushroom   Neural network         1.0        0.95 (0.95)     1      0.0         17.87
Iris       Logistic regression    0.97       0.97 (0.9)      1      0.0         0.24
Iris       SVM                    1.0        0.97 (0.97)     1      0.0         0.25
Iris       Random forest          1.0        0.97 (0.97)     1      0.0         0.26
Iris       Neural network         0.93       0.97 (0.9)      1      0.0         0.23
Wine       Logistic regression    0.94       1.0 (0.94)      1      0.0         2.32
Wine       SVM                    0.94       1.0 (0.94)      1      0.0         2.60
Wine       Random forest          0.92       1.0 (0.91)      1      0.0         2.73
Wine       Neural network         0.86       0.97 (0.83)     1      0.0         3.86

5. Qualitative evaluation
The purpose of EVAX is the modeling of explanations as described in [4]. In this section we discuss how the explanations generated through EVAX are contrastive, selected and social, as described at the beginning of Section 3.
Contrastive explanations explain the fact (e.g., 𝑃) by highlighting its differences with the foil (e.g., 𝑄), i.e., by answering the question why 𝑃 rather than 𝑄? [4]. Argumentative explanations are contrastive when they include arguments pro and con the conclusion. For explanations computed by EVAX this is the case when there is at least one argument with a fact conclusion and a counterargument with a foil conclusion. Such counterarguments make it possible to explain the outcome relative to an alternative outcome, by showing what features give reason to believe the foil. As shown in Figures 2 and 3, when 𝜏explain > 1, explanations contain at least one counterargument and are therefore contrastive.
While an event might have infinitely many causes, humans are able to select one or two as the explanation. To this end a variety of cognitive biases is employed [4]. EVAX incorporates selectedness by implementing both minimality and biasedness. Minimality amounts to including just a few arguments in the explanation. This is enabled by guaranteeing that the number of arguments in π’œβ„± 𝑙 does not exceed the threshold 𝜏select. In addition, this restricted size has been shown not to affect the fidelity score. EVAX allows for biasedness in the form of abnormality, which is a common cognitive bias in everyday explanations [8].
Finally, explanations are social, since the explainer will adapt the explanation to the explainee, for example, by adjusting the size, the content or the form of the explanation. Explanations can be adjusted in two ways with EVAX. First, the number of arguments that are included in π’œβ„± 𝑙 can be adjusted with 𝜏select and the size of the explanation can be adjusted with 𝜏explain. In that way, an inexperienced end user who requires a single argument to explain the prediction can set 𝜏select = 𝜏explain = 1. A more experienced user who wants a more complete set of arguments and counterarguments can set higher values. Second, because a computed explanation 𝑒 ∈ β„° stems from an AF, the explanation can be presented in various ways. In the current paper we have illustrated one of these representations in Figure 2, in the form of a dialogue.

6. Related work
As the survey [3] shows, the field of argumentative XAI goes in several directions. In addition to explaining argumentation-based conclusions with argumentation (e.g., [9, 13, 14, 15]), argumentation can also be employed to explain conclusions derived with other AI approaches. Here a distinction can be made between intrinsic methods, which provide explanations for conclusions drawn by argumentation mechanisms (e.g., [16, 17, 18]), and post-hoc methods, which provide argumentative explanations for conclusions drawn from non-argumentative methods (e.g., [19, 20, 21]).
Closest to our work are [22, 23], where several agents engage in a dialogue by putting forward arguments in the form of classification association rules. This dialogue results in a tree of arguments and counterarguments. The result is an overview of the agents' point of view, with arguments that might contain several premises. Given the focus on the dialogue and the structure of arguments and counterarguments, [22, 23] provide social and contrastive explanations. Our approach also aims at minimal explanations: our arguments have only one premise, the size of the explanation can be adjusted and the selection of the arguments in the explanation can be based on argument strength or abnormality. The purpose of our work (i.e., providing small, everyday explanations) is therefore different.
Following our interpretation of the explanations in [4], none of the other available argumentative XAI approaches are contrastive, selected and social. As mentioned, explanation methods based on argumentation frameworks are social by definition, since AFs allow for a variety of explanation representations, which can be used to adjust the explanation to a receiver. Additionally, most argumentative explanation methods are contrastive, since they present not only arguments for the conclusion, but also counterarguments. The exception to this is [19], since there is no clear relation between arguments and counterarguments in this approach.
Selectedness, in the form of minimality and biasedness, seems most difficult to establish: none of the mentioned approaches is selected in our sense. Selection based on a cognitive bias is not integrated at all. Reducing the size of the explanation is discussed; however, the reduction might not be sufficient. Even when a small explanation is provided, it might still contain many causes or result in large dispute trees [16, 17, 19]. In contrast, EVAX provides a method which is contrastive (arguments and counterarguments are part of the explanation), selected (the maximum size can be reduced to one argument and the selection of this argument can be based on abnormality) and social (the size of the explanation can be adjusted and the explanation can be represented as a dialogue). Since the explanations provided by EVAX rely on an argumentation framework, in future work we can look at representing them as dispute trees (as in [9]), applying selectedness based on necessity and sufficiency [24], or deriving explanations in terms of labelings [14] or strongly rejecting subframeworks [15].

7. Conclusion
We have introduced EVAX, an argumentative explanation method for everyday explanations of classifier decisions. It takes as input a trained classifier, calculates a local argumentation framework π’œβ„± 𝑙 and can present explanations in a variety of ways (recall Figure 3). In particular, we have shown how explanations can be selected, based on abnormality, and how explanations can be represented as a dialogue between proponent and opponent. Although the method might seem somewhat naive, our results show that it is a fast explanation method which satisfies our requirements. Based on the quantitative results (recall Table 1) we have shown that our method is faithful. Moreover, in the qualitative evaluation (Section 5), we have discussed how EVAX produces everyday explanations that satisfy the findings from [4] (i.e., contrastive, selected and social explanations). The result is a model-agnostic explanation method with which local explanations can be provided in a faithful manner, based on findings from the social sciences.
The results in this paper show that EVAX is a promising explanation method, which can be explored further. First, the properties of everyday explanations can be further worked out by including counterfactual statements, incorporating more cognitive biases, and testing more explanation deployments. Second, experimental evaluations with human users would more closely assess the quality of argumentative explanations.

References
[1] R. Calegari, G. Ciatto, A. Omicini, On the integration of symbolic and sub-symbolic techniques for XAI: A survey, Intelligenza Artificiale 14 (2020) 7–32.
[2] P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games, Artificial Intelligence 77 (1995) 321–358.
[3] K. Čyras, A. Rago, E. Albini, P. Baroni, F. Toni, Argumentative XAI: A survey, in: Z. Zhou (Ed.), Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI, 2021, 2021, pp. 4392–4399.
[4] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, Artificial Intelligence 267 (2019) 1–38.
[5] P. Besnard, A. J. GarcΓ­a, A. Hunter, S. Modgil, H. Prakken, G. R. Simari, F. Toni, Introduction to structured argumentation, Argument & Computation 5 (2014) 1–4.
[6] S. J. Russell, P. Norvig, Artificial Intelligence - A Modern Approach, Third International Edition, Pearson Education, 2010.
[7] K. Čyras, R. Badrinath, S. K. Mohalik, A. Mujumdar, A. Nikou, A. Previti, V. Sundararajan, A. V. Feljan, Machine reasoning explainability, arXiv 2009.00418 (2020). arXiv:2009.00418.
[8] P. Thagard, Explanatory coherence, Behavioral and Brain Sciences 12 (1989) 435–467.
[9] X. Fan, F. Toni, On computing explanations in argumentation, in: B. Bonet, S. Koenig (Eds.), Proceedings of the 29th Conference on Artificial Intelligence, AAAI, 2015, AAAI Press, 2015, pp. 1496–1502.
[10] D. Dua, C. Graff, UCI Machine Learning Repository, 2017. URL: http://archive.ics.uci.edu/ml.
[11] W. McKinney, Data Structures for Statistical Computing in Python, in: S. van der Walt, J. Millman (Eds.), Proceedings of the 9th Python in Science Conference, 2010, pp. 56–61.
[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[13] A. Borg, F. Bex, A basic framework for explanations in argumentation, IEEE Intelligent Systems 36 (2021) 25–35.
[14] B. Liao, L. van der Torre, Explanation semantics for abstract argumentation, in: H. Prakken, S. Bistarelli, F. Santini, C. Taticchi (Eds.), Proceedings of the 8th International Conference on Computational Models of Argument, COMMA, 2020, volume 326 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2020, pp. 271–282.
[15] Z. Saribatur, J. Wallner, S. Woltran, Explaining non-acceptability in abstract argumentation, in: G. D. Giacomo, A. CatalΓ‘, B. Dilkina, M. Milano, S. Barro, A. BugarΓ­n, J. Lang (Eds.), Proceedings of the 24th European Conference on Artificial Intelligence, ECAI, 2020, volume 325 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2020, pp. 881–888.
[16] O. Cocarascu, A. Stylianou, K. Čyras, F. Toni, Data-empowered argumentation for dialectically explainable predictions, in: G. D. Giacomo, A. CatalΓ‘, B. Dilkina, M. Milano, S. Barro, A. BugarΓ­n, J. Lang (Eds.), Proceedings of the 24th European Conference on Artificial Intelligence, ECAI, 2020, volume 325 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2020, pp. 2449–2456.
[17] K. Čyras, D. Birch, Y. Guo, F. Toni, R. Dulay, S. Turvey, D. Greenberg, T. Hapuarachchi, Explanations by arbitrated argumentative dispute, Expert Systems with Applications 127 (2019) 141–156.
[18] H. Prakken, R. Ratsma, A top-level model of case-based argumentation for explanation: Formalisation and experiments, Argument & Computation (2021) 1–36.
[19] E. Albini, P. Lertvittayakumjorn, A. Rago, F. Toni, DAX: Deep argumentative explanation for neural networks, arXiv 2012.05766 (2020). arXiv:2012.05766.
[20] L. Amgoud, Non-monotonic explanation functions, in: J. VejnarovΓ‘, N. Wilson (Eds.), Proceedings of the 16th European Conference of Symbolic and Quantitative Approaches to Reasoning with Uncertainty, ECSQARU, 2021, volume 12897 of Lecture Notes in Computer Science, Springer, 2021, pp. 19–31.
[21] N. Sendi, N. Abchiche-Mimouni, F. Zehraoui, A new transparent ensemble method based on deep learning, in: I. J. Rudas, J. Csirik, C. Toro, J. Botzheim, R. J. Howlett, L. C. Jain (Eds.), Proceedings of the 23rd International Conference of Knowledge-Based and Intelligent Information & Engineering Systems, KES, 2019, volume 159 of Procedia Computer Science, Elsevier, 2019, pp. 271–280.
[22] M. Wardeh, T. Bench-Capon, F. Coenen, PADUA: A protocol for argumentation dialogue using association rules, Artificial Intelligence and Law 17 (2009) 183–215.
[23] M. Wardeh, F. Coenen, T. Bench-Capon, PISA: A framework for multiagent classification using argumentation, Data & Knowledge Engineering 75 (2012) 34–57.
[24] A. Borg, F. Bex, Necessary and sufficient explanations for argumentation-based conclusions, in: J. VejnarovΓ‘, N. Wilson (Eds.), Proceedings of the 16th European Conference of Symbolic and Quantitative Approaches to Reasoning with Uncertainty, ECSQARU, 2021, volume 12897 of Lecture Notes in Computer Science, Springer, 2021, pp. 19–31.