<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Everyday Argumentative Explanations for Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jowan van Lente</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>AnneMarie Borg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Floris Bex</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Computing Sciences, Utrecht University</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tilburg Institute for Law, Technology, and Society, Tilburg University</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper we study everyday explanations for classification tasks with formal argumentation. Everyday explanations describe how humans explain in day-to-day life, which is important when explaining decisions of AI systems to lay users. We introduce EVAX, a model-agnostic explanation method for classifiers with which contrastive, selected and social explanations can be generated. The resulting explanations can be adjusted in size and retain high fidelity scores (an average of 0.95).</p>
      </abstract>
      <kwd-group>
<kwd>Explainable artificial intelligence</kwd>
        <kwd>Formal Argumentation</kwd>
        <kwd>Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Everyday explanations, as characterized in [
<xref ref-type="bibr" rid="ref4">4</xref>
], have the following properties:
• contrastive: they answer the question “why P rather than Q?”;
• selected: not all possible explanations are returned, but rather just
one or two are selected based on a cognitive bias, such as abnormality or responsibility;
• social: explanations (e.g., their size or content) are adjusted to the receiver.
Additionally, we require:
• faithfulness: when explaining the outcome of a black box, learning-based system, it should
be faithful to the system and its mechanisms.</p>
      <p>In order to model the above properties in an argumentative explanation method, we introduce
EVAX, an argumentative explanation method for everyday explanations of decisions derived
with a classifier. EVAX is a model-agnostic method, which only requires the input and output
of a classifier and can then compute, faithfully, explanations which are contrastive, selected and
social.</p>
      <p>The paper is structured as follows. Section 2 contains the preliminaries after which EVAX
is introduced (Section 3). We present a quantitative (Section 4) and qualitative (Section 5)
evaluation. Related work is discussed in Section 6 and we conclude in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>In this section we recall the necessary preliminaries on formal argumentation and classification
tasks and present our definition of arguments and defeats.</p>
      <sec id="sec-2-1">
        <title>2.1. Formal argumentation</title>
        <p>
          An abstract argumentation framework (AF) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is a pair ℱ = ⟨Args, Def⟩, where Args is a
set of arguments and Def ⊆ Args × Args is a defeat relation on these arguments. Given an
argumentation framework ℱ, Dung-style semantics [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] can be applied to it, to determine
what combinations of arguments (called extensions) can collectively be accepted.
        </p>
        <p>Definition 1. Let ℱ = ⟨Args, Def⟩ be an AF, S ⊆ Args be a set of arguments and A ∈ Args
an argument. Then S defeats A if there is an A′ ∈ S such that (A′, A) ∈ Def; S defends A if S
defeats every defeater of A; S is conflict-free if there are no A₁, A₂ ∈ S such that (A₁, A₂) ∈ Def.
S is admissible if it is conflict-free and it defends all of its elements, S is complete if it is admissible
and it contains all the arguments it defends. The grounded extension of ℱ is the minimal (w.r.t.
⊆) complete extension, denoted by Grd(ℱ).
        </p>
        <p>
          In abstract argumentation, arguments are abstract entities and the attack relation is
predetermined. In contrast, in structured argumentation [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], arguments are constructed from a
knowledge base and a set of rules and the attacks are based on the structure of the resulting
arguments. In both cases, the strength of the arguments determines whether an attack is
successful (e.g., an attack by a stronger argument is successful and therefore also a defeat, but
an attack by a weaker argument is not successful). While there is a variety of approaches to
structured argumentation, we will use a simple notion of an argument: a triple of a premise, a
conclusion and the strength of the argument.
        </p>
        <p>Definition 2. An argument A is a triple (φ, ψ, σ), where φ is the premise (e.g., a feature), denoted
by prem(A) = φ, ψ is the conclusion inferred from φ (e.g., a class), denoted by conc(A) = ψ,
and σ is the strength value of the argument, denoted by str(A) = σ, where 0 ≤ σ ≤ 1.</p>
        <p>Based on the structure of the arguments, the defeat relation is determined:</p>
        <p>Definition 3. Let ℱ = ⟨Args, Def⟩ be an AF and A, B ∈ Args. Then (A, B) ∈ Def if
conc(A) ≠ conc(B) and str(A) ≥ str(B).</p>
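        <p>To make Definitions 1-3 concrete, the following minimal Python sketch (our own illustration, not part of the EVAX implementation) represents an argument as a (premise, conclusion, strength) triple, derives the defeat relation of Definition 3 and computes the grounded extension of Definition 1 by iterating the defense operator from the empty set.</p>
        <preformat>
# Illustrative sketch of Definitions 1-3; not taken from the EVAX code base.

def defeats(a, b):
    """Def. 3: a defeats b iff their conclusions differ and a is at least as strong."""
    (_, conc_a, str_a), (_, conc_b, str_b) = a, b
    return conc_a != conc_b and str_a >= str_b

def grounded_extension(args, defeat_pairs):
    """Def. 1: the minimal complete extension.

    args is a collection of hashable arguments, defeat_pairs the set of
    (attacker, target) pairs. Starting from the empty set, repeatedly add
    every argument that the current set defends, until a fixed point is reached.
    """
    args = list(args)
    grd = set()
    changed = True
    while changed:
        changed = False
        for a in args:
            if a in grd:
                continue
            attackers = [b for b in args if (b, a) in defeat_pairs]
            # a is defended if every attacker is itself defeated by grd
            if all(any((g, b) in defeat_pairs for g in grd) for b in attackers):
                grd.add(a)
                changed = True
    return grd
        </preformat>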
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Classification</title>
        <p>
          As mentioned in the introduction, in this paper we are interested in explaining the outcome of
a classification task with argumentation. Intuitively, classification is an inference task in which
it is checked whether an object (e.g., an image, sound or text file) belongs to a category [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>Definition 4. A feature is an attribute-value pair (f, v) ∈ ℱ, where f is the label of the feature
and v is its corresponding value. Let ℱ be a set of features, X = {x₁, . . . , xₙ} be the input space,
consisting of n input points such that xᵢ ⊆ ℱ for all i ∈ {1, . . . , n}, and C = {c₁, . . . , cₘ}. A
classification task is a function which assigns to an input point xᵢ a class c ∈ C based on the
input space X.
        </p>
        <p>Example 1. Let x₁, . . . , x₆ ∈ X, where every xᵢ ∈ X is a student, and let C = {0, 1}, where 1
represents a student being accepted to university, and 0 a rejection. The set of features consists of
{g, t, m} ⊆ ℱ, where g corresponds to the (rounded) average grade of the student, t to whether or
not the student passed the entry test and m to whether or not they are motivated. Suppose that we
are given the following input space:</p>
        <p>[Table: the feature values of students x₁, . . . , x₆; the values are not recoverable from this version.]</p>
        <p>A classification task is then to determine whether student x₆ is accepted or not.</p>
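        <p>As a small illustration of Definition 4 (with hypothetical feature values, since the table with the actual input space is not reproduced above), such datapoints can be encoded directly as sets of attribute-value pairs:</p>
        <preformat>
# Hypothetical encoding of Example 1; the concrete values are made up.
x1 = {("g", 7), ("t", "yes"), ("m", "no")}    # student x1
x6 = {("g", 6), ("t", "yes"), ("m", "yes")}   # student x6
C = {0, 1}                                    # 1 = accepted, 0 = rejected

# A classification task is then any function from datapoints to C, e.g.:
def classify(x):
    return 1 if ("t", "yes") in x else 0

print(classify(x6))  # -> 1
        </preformat>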
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. EVAX : everyday argumentative explanations</title>
      <p>
        In this paper, we are interested in everyday explanations as described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], i.e., explanations
of why a specific event/property/decision occurred for end users in a day-to-day setting. To
ensure that our explanations fulfill these requirements, we follow the major findings in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
• contrastive explanations provide reasons pro and con the outcome [
        <xref ref-type="bibr" rid="ref3 ref7">3, 7</xref>
        ]. In an
argumentative setting, explanations are contrastive when arguments and counterarguments for
the outcome are present in the explanation.
• selected explanations have a fixed maximum size, the elements of which are selected
based on at least one cognitive bias. In an argumentative setting, explanations contain a
maximum number of arguments.
• social explanations can be adjusted to the receiver, by varying the complexity or size of an
explanation. Since explanations based on argumentation frameworks can be represented
in a variety of ways [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], argumentative explanations are social by definition.
      </p>
      <p>Additionally, the explanations should remain faithful to the model (i.e., they explain the behavior
of the model accurately).</p>
      <p>Our method EVAX takes as input a trained black box model and constructs a global set
of arguments: for each feature (f, v) and each class c it determines the probability that input
containing (f, v) will be assigned c. For a specific input point, consisting of a set of features, a
local argumentation framework is created (i.e., only containing the arguments corresponding to
features from that input point and the defeats between them), from which the conclusion is
predicted. This local argumentation framework can then be used to derive explanations, the
size of which can be set by the user. We have implemented two ways to present explanations:
based on abnormality and in a dialogue form.</p>
      <p>We start by describing the method of EVAX (Section 3.1) and then illustrate it with a toy
example in Section 3.2.</p>
      <sec id="sec-3-1">
        <title>3.1. Method outline</title>
        <p>EVAX takes as input a labeled dataset, a trained black box model BB and a threshold value
τ_select that controls the size of the output. EVAX returns a set of predictions pred and a set of
local explanations ℰ. The explanations e ∈ ℰ answer the question: “Why did black box BB
assign class c to input instance x?” These explanations are deployments of an argumentation
framework that represents the behavior of BB around a single datapoint in argumentative terms.
This AF thus forms the basis for the explanations and will, for every classified instance x, be
referred to as ℱ_x. The size of ℱ_x can be manually altered via τ_select.</p>
        <p>Algorithm 1 EVAX
1: procedure EVAX(BB, labeled_dataset, τ_select = 20)
2:   X_train, X_test, y_train, y_test ← split_dataset(labeled_dataset, test_size = 0.2)
3:   global_arguments ← get_global_arguments(BB, X_train)   ◁ step 1
4:   for x in X_test do
5:     ℱ_x ← create_local_AF(x, global_arguments, τ_select)   ◁ step 2
6:     predict(ℱ_x)   ◁ step 3
7:     explain(ℱ_x)   ◁ step 4
8:   results()
9:   get_results(BB, predictions, y_test)</p>
        <p>The procedure of EVAX is shown in Algorithm 1 (see https://github.com/jowanvanlente/EVAX
for the implementation). First, EVAX divides the labeled dataset into
a set of unlabeled datapoints X (the input space) and a set of labels y (the target space), which
are then split up into a train set and a test set, respectively X_train, X_test and y_train, y_test. The
default size of the test set is 0.2, and the default τ_select value is 20. Afterward, the method can
be divided into four main steps, which are described below. The first step handles all datapoints
and is executed just once, whereas the other three steps handle a single datapoint and may be
repeated multiple times, up to a maximum of the size of the test set.</p>
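        <p>A minimal sketch of line 2 of Algorithm 1, assuming scikit-learn's train_test_split is used for the split (the repository may implement this differently):</p>
        <preformat>
from sklearn.model_selection import train_test_split

# X: the unlabeled datapoints, y: the labels; toy stand-ins for illustration.
X = [{("g", 7)}, {("g", 6)}, {("g", 9)}, {("g", 5)}, {("g", 8)}]
y = [0, 0, 1, 0, 1]

# Line 2 of Algorithm 1: default test size of 0.2.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
        </preformat>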
        <p>• Step 1: Extract a global list of arguments, to represent the global behavior of BB (a
Python sketch of all four steps follows this list).
• Step 2: Create a local AF, ℱ_x.</p>
        <p>– EVAX first iterates over all features (f, v) ∈ ℱ of all (unlabeled) datapoints
x ∈ X_train and all output classes c ∈ C, and computes for every feature-class pair a
decision rule. These rules are accompanied by a precision score p(f,v) that articulates
the probability that BB will assign a datapoint with that particular feature to that
particular class. It then saves all p(f,v) scores in a triple ((f, v), c, p(f,v)), which is
added to a list of triples.
– Arguments are constructed based on the list of triples. For every triple an argument
is constructed in which the feature (f, v) is the premise prem, the output class c
is the conclusion conc and the precision score p(f,v) is set as the argument strength
str. Together these arguments form the global list of arguments.
– EVAX creates a local AF ℱ_x in every iteration of this step. This is an argumentation
framework ℱ_x = ⟨Args_x, Def_x⟩ that represents the classifier's behavior around
one particular datapoint. Based on the values of that datapoint, it selects a set of
relevant arguments (Args_x ⊆ Args) from the global list of arguments (Args) and
determines the defeats (Def_x).
– Argument selection is done by matching the features of the datapoint from X_test
with the premises of the arguments: given a datapoint x ∈ X_test and an argument
A ∈ Args, if prem(A) is one of the features in x then A is added to the local AF
(meaning A ∈ Args_x). As a result, all arguments with a premise corresponding to
one of the features of the datapoint are selected. To gain computational efficiency
and maintain selectedness, a threshold τ_select can be defined, which ensures only the
top τ_select strongest arguments are included in the list.</p>
        <p>– The defeats are determined as in Definition 3.
• Step 3: Predict the output class based on ℱ_x.</p>
        <p>
          – First, the grounded extension of the local AF (Grd(ℱ_x)) is computed, after which
the conclusion of the arguments in Grd(ℱ_x) is picked as the prediction. Formally,
this means that prediction c ∈ C is equal to conc(A) such that A ∈ Grd(ℱ_x).
Since arguments in the grounded extension are non-conflicting, they always have
the same conclusion. Therefore it does not matter which argument in Grd(ℱ_x) is
picked. When Grd(ℱ_x) is empty, EVAX will predict the majority class.
• Step 4: Explain
– The current implementation allows for two variations: adding selectedness based
on abnormality and presenting the explanation in a conversational form.
– Abnormality is one of the methods humans use when selecting an explanation
and describes how people tend to choose a cause that is unusual [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. We have
defined the abnormality of an argument as 1 − coverage. The coverage value refers
to the fraction of datapoints that the decision rule, out of which the argument is
constructed, ‘rules over’. In other words, the coverage of argument A refers to the
fraction of input instances that have a feature equal to prem(A). Since the coverage
describes how often a feature is present in a dataset, it essentially describes how
‘normal’ a feature is. Therefore, a lower coverage means that a feature becomes
less normal, thus becomes increasingly abnormal. The deployment of ℱ_x then
amounts to selecting the argument with the highest abnormality score that argues
for the predicted class. An example of the output is given in Figure 1.
a130: odor = 6 → 1 (precision = 1.0, abnormality = 0.989)
        </p>
        <p>
          ‘x₁ is poisonous because of its unusual pungent odor’
        </p>
        <p>
          – EVAX can also provide a dialectical representation of ℱ_x, similar to a dispute
tree [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. This representation has the form of a discussion between a proponent (P)
and an opponent (O) about which class to assign to the datapoint in question. A
threshold τ_explain allows the user to choose the number of arguments to include in the
explanation. Arguments are divided into pro and con arguments and are put forward
by P and O, who take turns. If the value of threshold τ_explain is even, O starts the
dispute, and if it is odd, P starts. After the first argument is put forward, the strongest
counterargument is given in reply (the only requirement on the counterargument is a
conflicting conclusion, and not necessarily a higher strength value, i.e., it does not have
to defeat the argument; this ensures that counterarguments are included in this
explanation form). Note that this threshold is different from τ_select, because
it does not affect the size of ℱ_x, but merely the size of the dialectical representation
of ℱ_x. See Figure 2 for an example with τ_explain = 4.
        </p>
        <p>O: a93: gill-color = 4 → 0 (0.84)
(“since gill color is black, it is likely x₁ has class edible”)
P: a16: gill-size = 1 → 1 (0.91)
(“since gill size is narrow, it is likely x₁ has class poisonous”)
O: a143: spore-print-color = 2 → 0 (0.87)
(“since spore print color is black, it is likely x₁ has class edible”)
P: a130: odor = 6 → 1 (1.0)
(“since odor is pungent, it is certain x₁ has class poisonous”)</p>
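        <p>The following minimal Python sketch puts the four steps together, reusing the defeats and grounded_extension functions from the Section 2.1 sketch. It is our own illustration of the procedure described above, not the EVAX reference implementation; bb is assumed to be a function from a datapoint (a set of features) to a class.</p>
        <preformat>
from collections import Counter

def get_global_arguments(bb, X_train):
    """Step 1: one argument ((f, v), c, precision) per feature-class pair.

    precision = fraction of training points containing (f, v) that BB assigns
    to c; coverage = fraction of all training points containing (f, v).
    """
    preds = [bb(x) for x in X_train]
    feature_count = Counter()
    feature_class_count = Counter()
    for x, c in zip(X_train, preds):
        for fv in x:
            feature_count[fv] += 1
            feature_class_count[(fv, c)] += 1
    arguments, coverage = [], {}
    for (fv, c), n in feature_class_count.items():
        arguments.append((fv, c, n / feature_count[fv]))
        coverage[fv] = feature_count[fv] / len(X_train)
    return arguments, coverage

def create_local_af(x, global_arguments, tau_select=20):
    """Step 2: local AF with the top tau_select strongest matching arguments."""
    relevant = [a for a in global_arguments if a[0] in x]
    args = sorted(relevant, key=lambda a: a[2], reverse=True)[:tau_select]
    defeat_pairs = {(a, b) for a in args for b in args if defeats(a, b)}  # Def. 3
    return args, defeat_pairs

def predict(args, defeat_pairs, majority_class):
    """Step 3: the conclusion of the grounded extension, or the majority class."""
    grd = grounded_extension(args, defeat_pairs)
    return next(iter(grd))[1] if grd else majority_class

def explain_by_abnormality(args, coverage, prediction):
    """Step 4: the most abnormal argument (abnormality = 1 - coverage) pro the prediction."""
    pro = [a for a in args if a[1] == prediction]
    return max(pro, key=lambda a: 1 - coverage[a[0]])
        </preformat>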
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Toy example</title>
        <p>Recall the classification task described in Example 1, on students being accepted into university.
In Figure 3 we present a similar case to illustrate EVAX. It represents one iteration of Steps 2, 3,
and 4. It thus assumes that the global list of arguments has already been computed.</p>
        <p>In this example, a black box predicts that an input instance ‘John’ will be accepted into
university. The same input instance is used as input for EVAX. Based on that input, EVAX
creates a local argumentation framework ℱ_John by selecting three relevant arguments, based
on the three different features, and defines defeats over them. It then calculates the grounded
extension and predicts that John will be accepted into university. In addition, it computes an
AF-based explanation, which in this case is a dialectical representation of ℱ_John, as described in
Step 4. The threshold τ_explain has a value of 3. The arrows in the representation are the defeats.
Note that the arrows between the different components of EVAX do not represent defeats, but
indicate the information flow.</p>
        <p>Figure 3 (schematic): the input instance John (grade: 6, passed test: yes, motivated: yes) is fed
both to the black box and to EVAX. EVAX builds the local framework ℱ_John with arguments
A1: (grade &lt; 8) → declined (0.7),
A2: (passed test = yes) → accepted (1.0),
A3: (motivated = yes) → accepted (0.6),
and defeats (A1, A3) and (A2, A1). Both the black box and EVAX output ‘accepted’, which a
fidelity calculator compares. The dialectical representation with τ_explain = 3 reads:
P: John is motivated, therefore he usually gets accepted (A3)
O: John has grade 6, therefore he usually gets declined (A1)
P: John passed the test, therefore it is certain he gets accepted (A2)</p>
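        <p>Using the sketch from Section 2.1, the arguments of ℱ_John can be checked directly; this reproduces the defeats in the figure, the grounded extension {A2, A3} and hence the prediction ‘accepted’:</p>
        <preformat>
# The three arguments of F_John as (premise, conclusion, strength) triples.
a1 = ("grade &lt; 8", "declined", 0.7)
a2 = ("passed test = yes", "accepted", 1.0)
a3 = ("motivated = yes", "accepted", 0.6)
args = [a1, a2, a3]

# Def. 3 yields exactly the defeats from the figure: (a2, a1) and (a1, a3).
defeat_pairs = {(a, b) for a in args for b in args if defeats(a, b)}

# Grd(F_John) = {a2, a3}; every argument in it concludes 'accepted'.
grd = grounded_extension(args, defeat_pairs)
print({conc for (_, conc, _) in grd})  # -> {'accepted'}
        </preformat>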
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Quantitative evaluation</title>
      <p>
        We have tested EVAX on four datasets and used five quantitative metrics for the evaluation.
The (labeled) datasets are from the UCI Machine Learning Repository [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]:
• With the Adult dataset one tries to predict whether or not a person makes more than
50,000 dollars a year. We removed all datapoints with unknown values and discretized
the continuous features.
• The Mushroom dataset includes instances of 23 different species of mushrooms. The
task is to identify whether a mushroom is poisonous or edible. We did not perform any
alterations on this dataset.
• The task of the Iris dataset is to predict the type of iris plant. We discretized the continuous
values.
• With the Wine dataset one wants to predict the type of wine of an input instance. Again
we discretized the continuous values.
      </p>
      <p>
        The discretization of continuous variables is necessary to constrain the number of arguments
that are added to the global list of arguments. We have used the cut method from pandas [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
with a bin value of 10. Since higher bin values tend to give better performance but reduce the
computational efficiency, we have tuned this value by incrementally increasing it from 3
up to 20. We found that from a bin value of 10 and upwards, the fidelity did not significantly
increase (sometimes it even decreased), while the computational efficiency consistently decreased
with higher bin values.
      </p>
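      <p>A minimal sketch of this preprocessing step, assuming pandas' cut is applied column-wise with 10 equal-width bins (integer bin labels):</p>
      <preformat>
import pandas as pd

def discretize(df, continuous_cols, bins=10):
    """Replace each continuous column by its bin index (0 .. bins-1)."""
    out = df.copy()
    for col in continuous_cols:
        out[col] = pd.cut(df[col], bins=bins, labels=False)
    return out

# Hypothetical continuous feature, binned into 10 intervals:
df = pd.DataFrame({"alcohol": [11.2, 12.8, 13.4, 10.9, 14.1]})
print(discretize(df, ["alcohol"])["alcohol"].tolist())
      </preformat>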
      <p>
        For each of these datasets we have chosen four machine learning models of
different complexity to test the performance and range of EVAX: logistic regression, support
vector machines (SVM), random forest, and neural networks. All four models are initialized
from the scikit-learn library [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>Finally, we applied the following five metrics for the evaluation:
• Fidelity indicates how well the explanation approximates the prediction of the black box
model. It represents the fraction of datapoints that are assigned to the same output class
by EVAX and BB.
• Accuracy (acc_BB) indicates how well our model performs on unseen data. It represents the
fraction of correctly classified datapoints. The value between brackets (acc_BB) refers to the
original accuracy of BB.
• Size measures the average minimum number of arguments necessary to retain the same
prediction. In other words, it is the lowest possible τ_select score without affecting the
accuracy or fidelity. A consistently low size value indicates the method can guarantee to
compute small explanations that are consistently faithful.
• Empty Grd specifies the fraction of datapoints for which the grounded extension
Grd(ℱ_x) is an empty set. When Grd(ℱ_x) is an empty set, EVAX relies on a default
prediction. A higher ‘Empty Grd’ value thus means that accuracy and fidelity scores are
increasingly determined by the default prediction, and therefore become less reliable.
• Time indicates the number of seconds needed to run the program.</p>
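      <p>Fidelity, as defined above, reduces to prediction agreement between EVAX and BB on the test set; a minimal sketch of this computation (our formulation):</p>
      <preformat>
def fidelity(evax_predictions, bb_predictions):
    """Fraction of test datapoints assigned the same class by EVAX and BB."""
    agree = sum(e == b for e, b in zip(evax_predictions, bb_predictions))
    return agree / len(bb_predictions)

print(fidelity([1, 0, 1, 1], [1, 0, 0, 1]))  # -> 0.75
      </preformat>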
      <p>The results in Table 1 show high fidelity (an average of 0.95) for all four ML models, which
indicates a sufficient degree of faithfulness. Only the Adult dataset and the neural network on
the Wine dataset have relatively low scores. This might be due to the relatively low accuracy
of the BB in those cases. Since the argument with the highest argument strength is always in
Grd(ℱ_x), the minimum size is always equal to 1. This indicates that the model is capable of
computing small explanations without losing faithfulness. Moreover, we see that the method
never computes an empty grounded extension Grd(ℱ_x), and hence requires no reliance on a
default prediction. These results are obtained on a Windows 64-bit operating system with 16GB
RAM and an Intel(R) Core(TM) i5-1145G7 @ 2.60GHz processor.</p>
      <p>[Table 1: fidelity, accuracy (acc_BB), size, empty Grd and time per dataset (Adult, Mushroom,
Iris, Wine) and model (logistic regression, SVM, random forest, neural network); the numeric
values are not recoverable from this version.]</p>
    </sec>
    <sec id="sec-5">
      <title>5. Qualitative evaluation</title>
      <p>
        The purpose of EVAX is the modeling of explanations as described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this section we
discuss how the explanations generated through EVAX are contrastive, selected and social, as
described at the beginning of Section 3.
      </p>
      <p>
        Contrastive explanations explain the fact (e.g., P) by highlighting its differences with the foil
(e.g., Q), by answering the question “why P rather than Q?” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Argumentative explanations
are contrastive when they include arguments pro and con the conclusion. For explanations
computed by EVAX this is the case when there is at least one argument with a fact conclusion
and a counterargument with a foil conclusion. Such counterarguments make it possible to
explain the outcome relative to an alternative outcome, by showing what features give reason
to believe that foil. As shown in Figures 2 and 3, when τ_explain &gt; 1, explanations contain at least
one counterargument and are therefore contrastive.
      </p>
      <p>
        While an event might have infinitely many causes, humans are able to select one or two as
the explanation. To this end a variety of cognitive biases are employed [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. EVAX incorporates
selectedness by implementing both minimality and biasedness. Minimality amounts to including
just a few arguments in the explanation. This is enabled by guaranteeing that the number of
arguments in ℱ_x does not exceed the threshold τ_select. In addition, this restricted size has been shown
not to affect the fidelity score. EVAX allows for biasedness in the form of abnormality, which is
a common cognitive bias in everyday explanations [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Finally, explanations are social, since the explainer will adapt the explanation to the explainee,
for example, by adjusting the size, the content or the form of the explanation. Explanations can
be adjusted in two ways with EVAX. First, the number of arguments that are included in ℱ_x
can be adjusted with τ_select and the size of the explanation can be adjusted with τ_explain. In that
way, an inexperienced end user who requires a single argument to explain the prediction can
set τ_select = τ_explain = 1. A more experienced user who wants a more complete set of arguments and
counterarguments can set higher values. Second, because a computed explanation e ∈ ℰ stems
from an AF, the explanation can be presented in various ways. In the current paper we have
illustrated one of these representations in Figure 2, in the form of a dialogue.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Related work</title>
      <p>
        As the survey [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] shows, the field of argumentative XAI goes in several directions. In addition
to explaining argumentation-based conclusions with argumentation (e.g., [
        <xref ref-type="bibr" rid="ref13 ref14 ref15 ref9">9, 13, 14, 15</xref>
        ]),
argumentation can also be employed to explain conclusions derived with other AI approaches.
Here a distinction can be made between intrinsic methods, which provide explanations for
conclusions drawn by argumentation mechanisms (e.g., [
        <xref ref-type="bibr" rid="ref16 ref17 ref18">16, 17, 18</xref>
        ]) and post-hoc methods,
which provide argumentative explanations for conclusions drawn from non-argumentative
methods (e.g., [
        <xref ref-type="bibr" rid="ref19 ref20 ref21">19, 20, 21</xref>
        ]).
      </p>
      <p>Closest to our work are [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ,
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], where several agents engage in a dialogue by putting
forward arguments in the form of classification association rules. This dialogue results in a
tree of arguments and counterarguments. The result is an overview of the agents' points of
view, with arguments that might contain several premises. Given the focus on the dialogue
and the structure of arguments and counterarguments, [22, 23] provide social and contrastive
explanations. Our approach additionally aims at minimal explanations: our arguments have only one
premise, the size of the explanation can be adjusted and the selection of the arguments in the
explanation can be based on argument strength or abnormality. The purpose of our work (i.e.,
providing small, everyday explanations) is therefore different.</p>
      <p>
        Following our interpretation of the explanations in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], none of the other available
argumentative XAI approaches are contrastive, selected and social. As mentioned, explanation
methods based on argumentation frameworks are social by definition, since AFs allow for a
variety of explanation representations, which can be used to adjust the explanation to a receiver.
Additionally, most argumentative explanation methods are contrastive, since they present not
only arguments for the conclusion, but also counterarguments. The exception to this is [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
since there is no clear relation between arguments and counterarguments in this approach.
Selectedness, in the form of minimality and biasedness, seems to be the most difficult to establish,
since none of the mentioned approaches is selected in our sense. Selection based on a cognitive
bias is not integrated at all. Reducing the size of the explanation is discussed; however, the
reduction might not be sufficient. Even when a small explanation is provided, it might still
contain many causes or result in large dispute trees [
        <xref ref-type="bibr" rid="ref16 ref17 ref19">16, 17, 19</xref>
        ].
      </p>
      <p>
        In contrast, EVAX provides a method which is contrastive (arguments and counterarguments
are part of the explanation), selected (the maximum size can be reduced to one argument
and the selection of this argument can be based on abnormality) and social (the size of the
explanation can be adjusted and the explanation can be represented as a dialogue). Since
the explanations provided by EVAX rely on an argumentation framework, in future work
we can look at representing them as dispute trees (as in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]), apply selectedness based on
necessity and sufficiency [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], derive explanations in terms of labeling [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or strongly rejecting
subframeworks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>
We have introduced EVAX, an argumentative explanation method for everyday explanations
for classifiers. It takes as input a trained classifier, calculates a local argumentation framework
ℱ  and can present explanations in a variety of ways (recall Figure 3). In particular, we have
shown how explanations can be selected, based on abnormality and how explanations can
be represented as a dialogue between proponent and opponent. Although the method might
seem somewhat naive, our results show that it is a fast explanation method, which satisfies our
requirements. Based on the quantitative results (recall Table 1) we have shown that our method
is faithful. Moreover, in the qualitative evaluation (Section 5), we have discussed how EVAX
produces everyday explanations that satisfy the findings from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (i.e., contrastive, selected
and social explanations). The result is a model-agnostic explanation method with which local
explanations can be provided in a faithful manner, based on findings from the social sciences.
      </p>
      <p>The results in this paper show that EVAX is a promising explanation method, which can be
explored further. First, the properties of everyday explanations can be further worked out by
including counterfactual statements, incorporating more cognitive biases, and testing more
explanation deployments. Second, experimental evaluations with human users would more
closely assess the quality of argumentative explanations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Calegari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          ,
          <article-title>On the integration of symbolic and sub-symbolic techniques for XAI: A survey</article-title>
          ,
          <source>Intelligenza Artificiale</source>
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>7</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Dung</surname>
          </string-name>
          ,
          <article-title>On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>77</volume>
          (
          <year>1995</year>
          )
          <fpage>321</fpage>
          -
          <lpage>358</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Čyras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rago</surname>
          </string-name>
          , E. Albini,
          <string-name>
            <given-names>P.</given-names>
            <surname>Baroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <article-title>Argumentative XAI: A survey</article-title>
          , in: Z.
          <string-name>
            <surname>Zhou</surname>
          </string-name>
          (Ed.),
          <source>Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI</source>
          ,
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>4392</fpage>
          -
          <lpage>4399</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Explanation in artificial intelligence: Insights from the social sciences</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>267</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Besnard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modgil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Prakken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Simari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          , Introduction to structured argumentation,
          <source>Argument &amp; Computation</source>
          <volume>5</volume>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Norvig</surname>
          </string-name>
          ,
          <source>Artificial Intelligence - A Modern Approach</source>
          , Third International Edition, Pearson Education,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Čyras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Badrinath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Mohalik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mujumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Previti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Feljan</surname>
          </string-name>
          ,
          <article-title>Machine reasoning explainability</article-title>
          , arXiv:2009.00418 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Thagard</surname>
          </string-name>
          ,
          <article-title>Explanatory coherence</article-title>
          ,
          <source>Behavioral and Brain Sciences</source>
          <volume>12</volume>
          (
          <year>1989</year>
          )
          <fpage>435</fpage>
          -
          <lpage>467</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <article-title>On computing explanations in argumentation</article-title>
          , in: B.
          <string-name>
            <surname>Bonet</surname>
          </string-name>
          , S. Koenig (Eds.),
          <source>Proceedings of the 29th Conference on Artificial Intelligence, AAAI</source>
          ,
          <year>2015</year>
          , AAAI Press,
          <year>2015</year>
          , pp.
          <fpage>1496</fpage>
          -
          <lpage>1502</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Graf</surname>
          </string-name>
          ,
          <source>UCI Machine Learning Repository</source>
          ,
          <year>2017</year>
          . URL: http://archive.ics.uci.edu/ml.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
          <article-title>Data Structures for Statistical Computing in Python</article-title>
          , in: S. van der Walt, J. Millman (Eds.),
          <source>Proceedings of the 9th Python in Science Conference</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , E. Duchesnay,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Borg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bex</surname>
          </string-name>
          ,
          <article-title>A basic framework for explanations in argumentation</article-title>
          ,
          <source>IEEE Intelligent Systems</source>
          <volume>36</volume>
          (
          <year>2021</year>
          )
          <fpage>25</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <surname>L. van der Torre</surname>
          </string-name>
          ,
          <article-title>Explanation semantics for abstract argumentation</article-title>
          , in: H.
          <string-name>
            <surname>Prakken</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Bistarelli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Santini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          Taticchi (Eds.),
          <source>Proceedings of the 8th International Conference on Computational Models of Argument, COMMA</source>
          ,
          <year>2020</year>
          , volume
          <volume>326</volume>
          <source>of Frontiers in Artificial Intelligence and Applications</source>
          , IOS Press,
          <year>2020</year>
          , pp.
          <fpage>271</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Saribatur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wallner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Woltran</surname>
          </string-name>
          ,
          <article-title>Explaining non-acceptability in abstract argumentation</article-title>
          , in: G. D.
          <string-name>
            <surname>Giacomo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Catalá</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Dilkina</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Milano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Barro</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Bugarín</surname>
          </string-name>
          , J. Lang (Eds.),
          <source>Proceedings of the 24th European Conference on Artificial Intelligence, ECAI</source>
          ,
          <year>2020</year>
          , volume
          <volume>325</volume>
          <source>of Frontiers in Artificial Intelligence and Applications</source>
          , IOS Press,
          <year>2020</year>
          , pp.
          <fpage>881</fpage>
          -
          <lpage>888</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>O.</given-names>
            <surname>Cocarascu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stylianou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Čyras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <article-title>Data-empowered argumentation for dialectically explainable predictions</article-title>
          , in: G. D.
          <string-name>
            <surname>Giacomo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Catalá</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Dilkina</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Milano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Barro</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Bugarín</surname>
          </string-name>
          , J. Lang (Eds.),
          <source>Proceedings of the 24th European Conference on Artificial Intelligence, ECAI</source>
          ,
          <year>2020</year>
          , volume
          <volume>325</volume>
          <source>of Frontiers in Artificial Intelligence and Applications</source>
          , IOS Press,
          <year>2020</year>
          , pp.
          <fpage>2449</fpage>
          -
          <lpage>2456</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Čyras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Birch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dulay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Turvey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greenberg</surname>
          </string-name>
          , T. Hapuarachchi,
          <article-title>Explanations by arbitrated argumentative dispute</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>127</volume>
          (
          <year>2019</year>
          )
          <fpage>141</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Prakken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ratsma</surname>
          </string-name>
          ,
          <article-title>A top-level model of case-based argumentation for explanation: Formalisation and experiments</article-title>
          ,
          <source>Argument &amp; Computation</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>E.</given-names>
            <surname>Albini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lertvittayakumjorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rago</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <article-title>DAX: deep argumentative explanation for neural networks</article-title>
          ,
          arXiv:2012.05766 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>Amgoud</surname>
          </string-name>
          ,
          <article-title>Non-monotonic explanation functions</article-title>
          , in: J.
          <string-name>
            <surname>Vejnarová</surname>
          </string-name>
          , N. Wilson (Eds.),
          <source>Proceedings of the 16th European Conference of Symbolic and Quantitative Approaches to Reasoning with Uncertainty</source>
          ,
          <source>ECSQARU</source>
          ,
          <year>2021</year>
          , volume
          <volume>12897</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sendi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abchiche-Mimouni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zehraoui</surname>
          </string-name>
          ,
          <article-title>A new transparent ensemble method based on deep learning</article-title>
          , in: I. J.
          <string-name>
            <surname>Rudas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Csirik</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Toro</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Botzheim</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          <string-name>
            <surname>Howlett</surname>
          </string-name>
          , L. C. Jain (Eds.),
          <source>Proceedings of the 23rd International Conference of Knowledge-Based and Intelligent Information &amp; Engineering Systems</source>
          , KES,
          <year>2019</year>
          , volume
          <volume>159</volume>
          of Procedia Computer Science, Elsevier, 2019, pp. 271–280.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] M. Wardeh, T. Bench-Capon, F. Coenen, PADUA: a protocol for argumentation dialogue using association rules, Artificial Intelligence and Law 17 (2009) 183–215.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] M. Wardeh, F. Coenen, T. Bench-Capon, PISA: A framework for multiagent classification using argumentation, Data &amp; Knowledge Engineering 75 (2012) 34–57.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] A. Borg, F. Bex, Necessary and sufficient explanations for argumentation-based conclusions, in: J. Vejnarová, N. Wilson (Eds.), Proceedings of the 16th European Conference of Symbolic and Quantitative Approaches to Reasoning with Uncertainty, ECSQARU 2021, volume 12897 of Lecture Notes in Computer Science, Springer, 2021, pp. 19–31.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>