<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Impact of Sparsification on Quantitative Argumentative Explanations in Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Peacock</string-name>
          <email>daniel.peacock20@imperial.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mansi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nico Potyka</string-name>
          <email>potykan@cardiff.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Toni</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiang Yin</string-name>
          <email>x.yin20@imperial.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cardiff University</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Imperial College London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Neural Networks (NNs) are powerful decision-making tools, but their lack of explainability limits their use in high-stakes domains such as healthcare and criminal justice. The recent SpArX framework sparsifies NNs and maps them to (weighted) Quantitative Bipolar Argumentation Frameworks (QBAFs) to provide an argumentative understanding of their mechanics. QBAFs can be explained by various quantitative argumentative explanation methods such as Argument Attribution Explanations (AAEs), Relation Attribution Explanations (RAEs), and Contestability Explanations (CEs), which assign numerical scores to arguments or relations to quantify their influence on the dialectical strength of an argument to be explained. However, it remains unexplored how sparsification of NNs impacts the explanations derived from the corresponding (weighted) QBAFs. In this paper we explore two directions for impact. First, we empirically investigate how varying the sparsification levels of NNs affects the preservation of these explanations: using four datasets (Iris, Diabetes, Cancer, and COMPAS), we find that AAEs are generally well preserved, whereas RAEs are not. Then, for CEs, we find that sparsification can improve computational efficiency in several cases. Overall, this study offers a preliminary investigation into the potential synergy between sparsification and explanation methods, opening up new avenues for future research.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainability</kwd>
        <kwd>Neural Networks</kwd>
        <kwd>Argumentative Explanations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[Figure 1: (a) Original MLP with neuron activations; (b) Sparse QBAF with argument strengths, for an
Iris example with inputs Sepal Length, Sepal Width, Petal Length, and Petal Width.]</p>
      <p>and attack relations, Contestability Explanations (CEs) [18] determine how the edge weights can be
modified to reach a desired dialectical strength for the topic argument. Together, these methods offer a
fine-grained and quantitative understanding of the reasoning process within (weighted) QBAFs.</p>
      <p>Both sparsification and quantitative argumentative explanations (i.e. AAEs, RAEs and CEs) advance
the interpretability of NNs, but there has been little investigation into how the former impacts the latter.
This gap is particularly concerning because sparsification simplifies the structure of MLPs, which may
alter or distort the quantitative explanations derived from the resulting (weighted) QBAF, potentially
misleading users, and even resulting in ethical or legal risks. For example, in a healthcare setting,
misleading explanations may lead to incorrect treatments being given to patients.</p>
      <p>To address this gap, we focus on two core research questions (as illustrated in Figure 2): (1) To what
extent does sparsification preserve AAEs and RAEs? (2) Can sparsification improve CEs’ computational
efficiency? We distinguish these two questions because the nature of the explanations differs: AAEs and
RAEs quantify how arguments and relations contribute to a fixed outcome (as opposed to identifying
changes leading to a different outcome, as in CEs), so preservation under sparsification is crucial
to assess whether interpretability can be maintained. In contrast, generating CEs typically involves
heuristic or optimization-based search procedures, rather than direct computation as in AAEs and
RAEs. Therefore, the primary concern for CEs is whether sparsification can accelerate this search. We
empirically investigate these questions and make the following contributions:
1. We analyse the impact of sparsification on AAEs, and find that AAEs are generally well-preserved
across varying sparsity levels.
2. We analyse the impact of sparsification on RAEs, and find that RAEs are not as well-preserved
under sparsification as AAEs.
      </p>
      <p>3. We propose a method that leverages sparsification to improve the runtime of computing CEs.
The code is available at https://github.com/DanielPeacock/ArguingWithNeuralNetworksPublic.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>In this paper we focus on NNs in the form of MLPs. These are directed, acyclic graphs as illustrated in
Figure 1a, processing inputs in an input layer (on the left in Figure 1a) through hidden layers (layers
1 and 2 in Figure 1a) to obtain a prediction in the last layer (on the right in Figure 1a). Nodes in all
layers amount to neurons, whose activation is determined by an activation function applied to the
(edge-)weighted sum of the neuron’s incoming connections plus a bias value assigned to each neuron.
Throughout this paper we use the logistic activation function.
(Note that AAEs and RAEs use the unweighted QBAFs, while CEs use the weighted QBAFs. With a slight
abuse of notation, we use “QBAF” to refer to both throughout the paper.)</p>
      <p>[Figure 2: Overview: the MLP is sparsified via SpArX, and both the original and sparsified MLPs are
translated to QBAFs; the explanation methods (AAEs, RAEs, CEs) are applied to both, and we study
preservation (AAEs/RAEs) and efficiency (CEs).]</p>
      <p>[Figure 3: (a) AAE scores for arguments (input features/neurons); (b) RAE scores for relations. Red/blue
scores indicate a negative/positive influence on the topic argument (output).]</p>
      <p>Each MLP can be represented by an equivalent Quantitative Bipolar Argumentation Framework
(QBAF). Here, we view each neuron as an argument, and each edge as a relation. Edges with negative
weights are attacks, and edges with positive weights are supports. Each argument has a base score
corresponding to its initial strength and relations are weighted. For more details on the translation
process see [9]. This provides a new argumentative interpretation of MLPs with dialectical strength
values for each argument (mathematically equivalent to the activations of neurons in the MLP).</p>
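      <p>As an illustration of this translation, the dialectical strength of an argument can be computed from its base score (bias) and its weighted incoming relations, mirroring the MLP's forward pass with the logistic activation. The following is a minimal sketch; the function names are ours, not from [9].</p>

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def strength(bias, incoming):
    # incoming: list of (edge_weight, parent_strength) pairs; negative
    # weights act as attacks, positive weights as supports, so the
    # strength coincides with the neuron's activation in the MLP.
    return logistic(bias + sum(w * s for w, s in incoming))

# An argument with bias 0.0, one support (weight 2.0, strength 1.0)
# and one attack (weight -1.0, strength 0.5):
s = strength(0.0, [(2.0, 1.0), (-1.0, 0.5)])
```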
      <p>We use several existing argumentation-based explanation methods, overviewed here (see original
papers for more details).</p>
      <p>SpArX [7] The QBAF interpretation of MLPs does not necessarily improve explainability since QBAFs
are of the same size and density as the MLPs, which can be very large. SpArX provides explanations
by reducing the size of the given MLPs first. The neurons in the hidden layers are clustered based on
their activations, and then merged by averaging their biases and edge weights. The sparse MLPs are
then converted to equivalent QBAFs (Figure 1b) from which qualitative explanations can be found, for
example by creating word clouds of the most important input features or examining the dialectical
relationships between the arguments. In this paper we consider instead quantitative argumentative
explanations drawn from the sparsified QBAFs.</p>
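      <p>The merging step above can be sketched as follows, assuming clusters of hidden neurons are merged by averaging their biases and incoming edge weights. This is a simplification of SpArX's procedure, and the names and data layout are hypothetical.</p>

```python
def merge_clusters(biases, in_weights, clusters):
    # biases: one bias per hidden neuron; in_weights[i][j]: weight of the
    # edge from input i to hidden neuron j; clusters: lists of neuron
    # indices to merge. Each cluster becomes one neuron whose bias and
    # incoming weights are the averages over its members.
    merged_biases = [sum(biases[j] for j in c) / len(c) for c in clusters]
    merged_weights = [
        [sum(row[j] for j in c) / len(c) for c in clusters]
        for row in in_weights
    ]
    return merged_biases, merged_weights

biases = [0.1, 0.3, -0.2]
in_w = [[1.0, 3.0, 0.5],
        [2.0, 4.0, 0.5]]          # 2 inputs, 3 hidden neurons
b, W = merge_clusters(biases, in_w, [[0, 1], [2]])
# cluster {0, 1} gets bias 0.2 and incoming weights 2.0 and 3.0
```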
      <p>AAEs [16] AAEs attempt to explain QBAFs by examining the contribution of other arguments to a
topic argument. Throughout this paper, we use topic argument to refer to the argument we are trying to
explain (usually this is an argument corresponding to one of the output neurons in the equivalent MLP).
We focus on Gradient-based AAEs (although other types exist such as Shapley-based [19] and
Removal-based AAEs [20]). Gradient methods work by computing a score which represents the sensitivity of the
topic argument to changes in the base score of other arguments. An example is shown in Figure 3a.
RAEs [17] RAEs attempt to understand the role of the relations in contributing to the strength
of a topic argument. In this paper, we focus on Shapley-based RAEs (although other types such as
Gradient-based RAEs [18] also exist). These are based on Shapley values [21], and look at every subset
of the attacks and supports to understand the influence of each one on the topic argument. Due to the
complexity in computing these scores, an approximation is used. Figure 3b shows an example.</p>
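      <p>A gradient-based AAE can be approximated numerically as the finite-difference sensitivity of the topic argument's strength to an argument's base score. The following is a toy sketch on a two-argument QBAF, not the authors' implementation.</p>

```python
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

def topic_strength(base_scores, weights):
    # Toy QBAF: two input arguments feed one topic argument; input
    # strengths equal their base scores, and the topic's strength is the
    # logistic of the weighted sum (negative weight = attack).
    return sigma(sum(w * b for w, b in zip(weights, base_scores)))

def aae_score(base_scores, weights, i, eps=1e-5):
    # Gradient-based AAE: sensitivity of the topic argument's strength to
    # a small change in argument i's base score (finite differences).
    bumped = list(base_scores)
    bumped[i] += eps
    return (topic_strength(bumped, weights)
            - topic_strength(base_scores, weights)) / eps

scores = [aae_score([0.6, 0.8], [2.0, -1.0], i) for i in range(2)]
# the supporter (weight 2.0) gets a positive score, the attacker a negative one
```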
      <sec id="sec-2-1">
        <title>CEs [18]</title>
        <p>CEs calculate how the weights of each relation in the QBAF must be modified in order to reach a
certain dialectical strength in the topic argument (called the desired strength). This is similar to the
counterfactual problem in AI, where methods are used to try and explain how a model’s outputs
would change with modifications to the inputs [22, pp. 847–848]. CEs are computed by iteratively
updating the weights using the gradient-based RAE (G-RAE) to guide the search until the desired
strength is reached. Table 1 shows an example.</p>
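        <p>The iterative search can be sketched as a gradient-style weight update toward the desired strength. In this simplified sketch, a plain gradient step stands in for the G-RAE guidance of [18], and the one-layer QBAF is hypothetical.</p>

```python
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

def contest(weights, base_scores, desired, lr=0.5, steps=2000, tol=1e-3):
    # Iteratively adjust the edge weights until the topic argument's
    # strength reaches the desired value; a plain gradient step on the
    # error stands in for the G-RAE-guided update.
    w = list(weights)
    s = sigma(sum(wi * b for wi, b in zip(w, base_scores)))
    for _ in range(steps):
        err = s - desired
        if abs(err) < tol:
            break
        grad = s * (1.0 - s)  # derivative of the logistic function
        w = [wi - lr * err * grad * b for wi, b in zip(w, base_scores)]
        s = sigma(sum(wi * b for wi, b in zip(w, base_scores)))
    return w, s

w, s = contest([1.0, -0.5], [0.7, 0.9], desired=0.8)
```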
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In order to ascertain whether AAEs and RAEs are preserved after sparsification, we train MLPs of various
sizes and compare an aggregation of the scores for the original MLPs to the scores for the MLP after
sparsification. The aggregation is needed to allow a comparison of scores since there are significantly
more scores for the original MLPs due to their larger size. Here, we define these aggregations.</p>
      <p>Let C = {c_1, . . . , c_n} be a cluster of interest after sparsification in hidden layer l, containing neurons
c_i for i = 1, . . . , n. Similarly, let C′ = {c′_1, . . . , c′_m} be another cluster of interest in the next layer l + 1,
containing neurons c′_j for j = 1, . . . , m.</p>
      <p>Aggregation of AAEs We aggregate the AAE scores by averaging the score for each neuron in the
cluster of interest. Formally, the aggregated score for cluster C is</p>
      <p>Agg_aae_score(C) = (1/n) Σ_{c ∈ C} aae_score(c).</p>
      <p>A simple example of this process is shown in Figure 4.</p>
      <p>Aggregation of RAEs We aggregate the RAE scores by averaging the RAE scores of edges between
all pairs of neurons contained in two clusters of interest. Formally, the aggregated score for the edge
between clusters C and C′ is</p>
      <p>Agg_rae_score(C, C′) = (1/n) Σ_{c ∈ C} (1/m) Σ_{c′ ∈ C′} rae_score(c, c′).</p>
      <p>A simple example of this process is shown in Figure 5.</p>
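      <p>Both aggregations are plain averages and can be sketched directly; the names and score dictionaries below are hypothetical, with the AAE example matching Figure 4 (neuron scores 2 and 1 averaging to a cluster score of 1.5).</p>

```python
def agg_aae_score(aae_scores, cluster):
    # Average the AAE scores of the neurons in cluster C.
    return sum(aae_scores[c] for c in cluster) / len(cluster)

def agg_rae_score(rae_scores, cluster, cluster2):
    # Average the RAE scores of all n*m edges between clusters C and C'.
    total = sum(rae_scores[(c, c2)] for c in cluster for c2 in cluster2)
    return total / (len(cluster) * len(cluster2))

aae = {0: 2.0, 1: 1.0}
rae = {(0, 2): 0.4, (0, 3): 0.2, (1, 2): 0.0, (1, 3): 0.2}
a = agg_aae_score(aae, [0, 1])          # (2.0 + 1.0) / 2 = 1.5
r = agg_rae_score(rae, [0, 1], [2, 3])  # mean of the four edge scores = 0.2
```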
      <p>[Figure 4: two neurons with AAE scores 2 and 1 merged into a cluster with aggregated score 1.5.]</p>
      <p>De-aggregation of CEs We do not attempt to directly understand if CEs are preserved with sparsification,
since this question is ill-defined. Indeed, CEs do not give a fixed score to each component in the
same way as AAEs and RAEs, and so it is challenging to define what preservation means in this setting.
Instead, we look at CEs in the opposite direction: that is, we attempt to de-aggregate the CEs for the
sparse MLPs to approximate/recover the CEs for the original MLPs and improve computational efficiency.
The sparse CE assigns weights to each edge in the sparse QBAF. We de-aggregate by assuming the
weights are equally distributed amongst edges merged together by SpArX. Every edge merged together
is assigned the same weight in our approximate CE. Formally, consider the edge between two clusters
C and C′, assigned weight w in the sparse CE. There are a total of n · m edges between every pair of
neurons in these clusters. So every edge between these pairs is assigned weight w in the approximate
CE. A simple example of this process is shown in Figure 6.</p>
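      <p>De-aggregation then just copies each cluster-edge weight to all underlying original edges. A minimal sketch follows; the data layout is assumed and not the authors' code.</p>

```python
def deaggregate_ce(sparse_ce, members):
    # sparse_ce maps a cluster edge (C, C') to its CE weight in the sparse
    # QBAF; members maps each cluster to the original neurons merged into
    # it. Every original edge between the clusters inherits the weight.
    approx = {}
    for (c, c2), w in sparse_ce.items():
        for u in members[c]:
            for v in members[c2]:
                approx[(u, v)] = w
    return approx

approx = deaggregate_ce({("A", "B"): 0.3}, {"A": [0, 1], "B": [2]})
# original edges (0, 2) and (1, 2) both receive weight 0.3
```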
    </sec>
    <sec id="sec-4">
      <title>4. Results and Analysis</title>
      <p>To compare the aggregated AAE and RAE scores with the sparse MLP scores, we look at two approaches:
the overall pattern in the scores and the highest scoring arguments/edges. All of our analysis is for
the Iris [23], Diabetes [24], Cancer [25] and COMPAS [26] datasets, and we average our results over
the test set for each dataset. These datasets are commonly used and of varying levels of complexity
(number of input features and dataset size). For AAEs we train MLPs with 1–2 hidden layers, each with
10–100 neurons. For RAEs we train MLPs with 1–2 hidden layers, each with 2–10 neurons. These are
significantly smaller than the MLPs used for AAEs, since the time complexity of computing
Shapley-based RAEs makes it impractical to use large MLPs.</p>
      <p>[Figure 6: de-aggregation of a CE from the sparse QBAF (with CE weights) to the original QBAF (with
approximated CE weights).]</p>
      <p>Overall Pattern To check if the overall pattern was preserved, we check the Spearman rank coefficient [27]
and the Kendall τ coefficient [28]. These provide a measure of the strength of the relationship between two
variables. We use these measures to examine the strength of the correlation between the aggregated
scores and the sparse scores. A rank/coefficient close to 1 means a strong correlation, indicating the
pattern in both sets of scores is similar and hence the pattern in the scores is preserved with sparsification.
We also rank the arguments and edges based on their scores. We then look at the percentage difference
in ranking between each aggregated argument/edge and the corresponding argument/edge in the sparse
QBAF. A small difference in rankings would indicate a similar pattern after sparsifying.
Highest Scores We also look specifically at the highest scoring arguments/edges. These are important
since they are the most influential components of the MLPs, so it is important that they are preserved. Firstly,
we look at the top-ten scoring arguments/edges and check what percentage of arguments/edges in the
aggregated scores are also in the top ten of the sparse MLP scores, i.e. how many of the highest scoring
arguments/edges stay the same after sparsification. In addition, for RAEs we also look at the top-scoring
aggregated edge and check whether this edge is in the top ten of the sparse scores, i.e. checking that the
most important edge remains important after sparsification. High percentages would indicate that the
highest scoring arguments/edges are preserved after sparsifying.</p>
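      <p>For instance, the Spearman rank coefficient is the Pearson correlation of the two rank vectors; a self-contained sketch follows (in practice a library routine such as scipy.stats.spearmanr would be used, and the score vectors here are made up).</p>

```python
def rankdata(xs):
    # 1-based average ranks; tied values share the mean of their positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    # Spearman's rho: Pearson correlation of the two rank vectors.
    rx, ry = rankdata(xs), rankdata(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# aggregated scores vs. sparse scores for four arguments
rho = spearman([1.0, 0.5, 0.2, 0.9], [0.9, 0.4, 0.1, 1.0])
```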
      <sec id="sec-4-1">
        <title>4.1. Preservation of AAEs</title>
        <p>Overall our results are positive, showing that AAEs are preserved well by sparsification.
Overall Pattern The results can be found in Table 2. We find that the pattern/distribution of scores
matches closely before and after sparsification. We can see that the correlation between the scores
is very strong. The coefficients are always higher than 0.7 and in most cases at least 0.9. Since the
coefficients/ranks computed are all close to 1, we can conclude that the pattern in scores is preserved
well with sparsification. We should note that the coefficients do reduce slightly towards higher levels of
sparsification (around 90%), but only by a little, and the correlation still remains strong. Considering the
ranking differences (%Δ in Table 2), we can see that the rankings are similar. Towards the lower levels of
sparsification, there is only around a 10% difference in rankings, and at high levels of sparsification this
goes up to 30%. However, this is still relatively low, and only appears with high levels of sparsification
(90%). This is also to be expected, since high levels of sparsification should result in greater loss of
information.
[Table 2: results for the preservation of AAE scores (↑: higher is better; ↓: lower is better), including
Spearman ρ (↑), Kendall τ (↑), and the percentage ranking differences (%Δ, ↓) between the aggregated
and sparse AAE scores.]</p>
        <p>but again this is to be expected, as the amount of information lost should increase as we sparsify more.
There is a balance between sparsification and loss of information, but this depends on the type of dataset
and its complexity; e.g. the COMPAS dataset (the most complex) loses more information with high
sparsification levels, but this does not happen to the same extent with less complex datasets such as Iris.
However, in general, most of the top scoring arguments do remain within the top 10, and so we can
conclude that the highest scoring arguments with AAEs are preserved well.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Preservation of RAEs</title>
        <p>Overall, the results are mixed and we find that RAEs are not preserved as well as AAEs.
Overall Pattern</p>
        <p>The results can be found in Tables 4a and 4b. Considering first the Spearman rank
and Kendall τ coefficients (Table 4a), there is a relatively strong correlation between the aggregated
scores and sparse scores. The ranks/coefficients are around 0.8, although this reduces as the sparsification
level is increased. For example, the Spearman rank for the Diabetes dataset decreases from 0.888 at
20% sparsification to only 0.696 at 80%. This is a similar pattern to what we saw for AAEs, and largely
what is to be expected; the sparser an MLP, the more information is lost. This indicates that the overall
pattern in scores is relatively well preserved. However, compared to the equivalent AAE analysis
(Table 2), the correlation is significantly lower (around 0.9 for AAEs). Therefore, although the pattern
looks to be preserved, we do lose more information about RAEs with sparsification compared to AAEs.
For AAEs, large ranking differences only appeared at very high levels of sparsification (90%), but this
is always the case for RAEs, even at very low levels of sparsification. This indicates that significantly
more information is lost through sparsification for RAEs, and the overall pattern in scores is not
preserved well for RAEs. This may seem to contradict the correlation coefficients/ranks seen previously,
but those only measure the correlation in the scores and do not look at the individual scores themselves.</p>
        <p>Looking at both sets of results, we can conclude that although the scores before and after sparsification
are highly correlated, the individual scores and their rankings are affected by sparsification. The overall
pattern is to some extent preserved (strong correlation), but much information is lost, especially
in the individual scores.</p>
        <p>Highest Scoring Edges</p>
        <p>The results can be found in Tables 5a and 5b. We can see that the most
important edges are not well preserved by sparsification. First looking at Table 5a, we see that in
all cases, a very low percentage of edges remain in the top ten. The results indicate that generally
around 30% of the top-ten edges stay the same, and this is as low as 23% in some cases. This fits with our
previous analysis that the individual rankings of edges are not preserved well, and there is a large change
in rankings. This tells us that in general the preservation of the highest scoring edges is poor, and
information about RAEs is lost as a result of sparsification.</p>
        <p>Looking at Table 5b, we again see poor preservation. In most cases the top scoring edge does not
remain high scoring after sparsification. In general, only in around 40% of cases does the top scoring
edge remain high-scoring, and this is as low as 23% in some cases. We should note that for the COMPAS
dataset, the highest scoring edges do look to be better preserved. Due to the size and the complexity
of the dataset, significantly fewer MLPs were tested compared to the other datasets. This may have
resulted in the slightly different results for COMPAS compared to the other datasets. However, the
pattern across all datasets tested indicates that the highest scoring edges are not well preserved.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. De-aggregated CEs</title>
        <p>We analyse the de-aggregated CEs using a different methodology to that used for AAEs and RAEs. We
look at the validity and distance to check the quality of the de-aggregated (approximate) CEs, and use
this methodology to improve runtime.</p>
        <p>Validity A CE is valid if the topic argument attains the desired strength using the edge weights
provided by the CE. We create an approximate CE for the original QBAF from the sparse QBAF using
the de-aggregation method in Section 3. We check the validity of our approximate CE, and the results are
shown in Table 6 (results labelled 1). Clearly, we can see that the approximation does not successfully
produce a valid CE in the majority of cases. In many cases, the percentage of valid CEs is less than 10%,
so our approximation clearly does not work effectively. For the COMPAS dataset, the percentage of
valid CEs does increase, up to around 20% in some cases. This is positive since the COMPAS dataset is
the most complex of the datasets. However, this is still a very low percentage, and therefore we can
conclude that our approximation is not effective. It is likely that our assumption in the approximation
that the edge weights were equally distributed is incorrect, causing this poor performance.
Distance To further understand the quality of the approximate CEs, we also look at the distance. We
check if the topic argument’s strength gets closer to the desired strength than before the CE weights are
applied. The results can be found in Table 6 (results labelled 2). We can see that our approximate CE
does bring the strength of the topic argument closer to the desired strength in the majority of cases. For
the Iris and Cancer datasets, the approximate CE brings the strength closer in over 90% of cases. For
the Diabetes dataset, this percentage does reduce to around 80%, but this is still high, and
in most cases the approximation does succeed. Finally, for the COMPAS dataset, around 85%–90% of
cases are generally closer, except for the 90% sparsification case, where the percentage reduces to 73%.
Overall, however, this is still a positive result in bringing the strength of the topic argument closer to
the desired strength. Note also that the approximation does successfully get closer even at very high
levels of sparsification. This implies that CEs are preserved well with sparsification.</p>
        <p>Runtime The CE algorithm in [18] works by iteratively updating the weights of the QBAF until the
desired strength is reached. However, the initial weights are randomised, and so the algorithm can
take some time to converge to the desired strength (the algorithm may get stuck in a local minimum).
Therefore, to improve convergence, we can initialise the weights of the QBAF with our approximated
CE instead of randomised weights. We perform experiments to see if the runtime is improved using
our approximation. We run all experiments on a Linux PC running 64-bit Ubuntu 24.04, with an Intel
Core i7-8700 3.20GHz processor and 16GB memory. We compare the following:
(a) Apply the usual CE algorithm, i.e. translate the MLP to the equivalent QBAF and apply the CE
algorithm using randomised initial weights.
(b) Use our approximation method, i.e. sparsify the MLP, translate to a sparse QBAF, apply the CE
algorithm to the sparse QBAF with random initial weights, create an approximation for the original
MLP, and apply the CE algorithm to the original MLP using the approximation as the initial weights.
We plot graphs of the average runtime using the two methods for each dataset and MLP size. For the
second method, this involves checking both how much time is spent on (1) computing the CE on the
sparsified MLP and (2) computing the CE on the original MLP starting from the previously computed CE.
For succinctness, we give two of these graphs in Figure 7 (one small MLP, and one larger MLP) for the
COMPAS dataset only (the most complex analysed); the full set of graphs for each dataset can be
found in Appendix A. From the figure, we can see that when the MLP is small (with a small number of
edges), the runtime is longer using our method, as more steps must be done to compute the CE, and due
to the small number of edges, the CE algorithm converges quickly anyway. However, when the MLP is
larger, our method does improve the runtime. For this reason, we calculate the percentage reduction in
the average runtime for MLPs of more than 500 edges only. The results are given in Table 7. We see that
for low levels of sparsification our method can still be slower (e.g. for Diabetes). However, in all cases,
for high levels of sparsification there is a reduction in runtime, often by more than 50%. The much
larger number of edges (over 1300 in the graph shown in Figure 7b) means that the CE algorithm takes
longer to converge, so initialising the weights with the approximate CE does improve the runtime. We
can also note that, using our initial guess from a very high level of sparsification (e.g. 90%), our method
performs well. This is positive, as even at a high level of sparsification, we can still recover a large
amount of information. Although we cannot easily recover a valid CE, we can still make a good guess.</p>
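        <p>The effect of warm-starting can be illustrated on a toy CE search: initialising from a de-aggregated guess that already lies near the desired strength terminates in far fewer iterations than a neutral start. This is a sketch under our simplified gradient-step model; the warm-start weights here are chosen by hand rather than produced by actual sparsification.</p>

```python
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

def ce_search(w0, base_scores, desired, lr=0.5, tol=1e-3, max_steps=10000):
    # Toy gradient-style CE search; returns the final weights and the
    # number of iterations needed to reach the desired strength.
    w = list(w0)
    for step in range(max_steps):
        s = sigma(sum(wi * b for wi, b in zip(w, base_scores)))
        err = s - desired
        if abs(err) < tol:
            return w, step
        grad = s * (1.0 - s)
        w = [wi - lr * err * grad * b for wi, b in zip(w, base_scores)]
    return w, max_steps

base, desired = [0.7, 0.9], 0.9
_, cold_steps = ce_search([0.0, 0.0], base, desired)  # neutral start
_, warm_steps = ce_search([1.6, 1.2], base, desired)  # near-valid warm start
# the warm start converges in far fewer iterations than the cold start
```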
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we explored the impact of sparsification on quantitative argumentative explanations.
Our investigations allow us to understand whether sparsification alters or distorts the argumentative
explanations (AAEs, RAEs, CEs) produced from the resulting QBAF. Without this, users could be misled
by explanations, leading to ethical or legal risks. Our findings showed that AAEs are well-preserved
under sparsification, suggesting that AAEs can be reliably used alongside sparsification to enhance
the interpretability of NNs. In contrast, RAEs appeared less robust, making them challenging to use
alongside sparsification. Finally, for CEs, we saw that sparsification can improve the computational
efficiency of explanation generation, which is particularly useful for large and dense MLPs.</p>
      <p>There are a few avenues for future work. While we found promising empirical results for the
preservation of AAEs, further work is needed to establish theoretical guarantees for such preservation.
Additionally, while our findings do not support the preservation of RAEs, future work could explore
whether gradient-based RAEs (G-RAEs) [18] exhibit better consistency, potentially enabling RAEs to
contribute more effectively to explanations in sparsified settings. Future work could also explore using
a weighted aggregation of the RAE scores, similar to the weighted averaging used by SpArX when
merging the edge weights ([7, Def. 6]) to see if this results in better preservation. Further, for both
AAEs and RAEs, other aggregation methods could be explored. Using other techniques instead of mean
aggregation (e.g. min./max. aggregation) may result in sparsification having a lower impact.</p>
      <p>Finally, although we observed that sparsification can improve the speed of computing CEs in large
MLPs, our method is a heuristic. Further work is necessary to find guarantees as to when our method is
faster, perhaps by finding a lower bound on MLP size for which our method is guaranteed to improve
the runtime. Our approximation method also did not directly produce valid CEs; further work should be
done to find a method of de-aggregating the CEs to produce valid CEs for the original QBAFs. Perhaps
weighting the edges differently rather than assuming an equal distribution would help.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was partially funded by the European Research Council (ERC) under the European Union’s
Horizon 2020 research and innovation programme (grant agreement No. 101020934, ADIX) and by J.P.
Morgan and by the Royal Academy of Engineering under the Research Chairs and Senior Research
Fellowships scheme. Any views or opinions expressed herein are solely those of the authors.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <sec id="sec-7-1">
        <title>References</title>
        <p>[4] P. Angelov, E. Soares, Towards explainable deep neural networks (xDNN), Neural Networks 130 (2020) 185–194.</p>
        <p>[5] A. Rago, K. Čyras, J. Mumford, O. Cocarascu, Argumentation and Machine Learning, in: D. Gabbay, G. Kern-Isberner, G. R. Simari, M. Thimm (Eds.), Handbook of Formal Argumentation, Volume 3, 2024. arXiv:2410.23724.</p>
        <p>[6] K. Čyras, A. Rago, E. Albini, P. Baroni, F. Toni, Argumentative XAI: A survey, in: Z.-H. Zhou (Ed.), IJCAI-21, 2021, pp. 4392–4399.</p>
        <p>[7] H. Ayoobi, N. Potyka, F. Toni, SpArX: Sparse Argumentative Explanations for Neural Networks, in: ECAI, volume 372 of Frontiers in Artificial Intelligence and Applications, 2023, pp. 149–156.</p>
        <p>[8] T. Mossakowski, F. Neuhaus, Modular semantics and characteristics for bipolar weighted argumentation graphs, CoRR (2018). arXiv:1807.06685.</p>
        <p>[9] N. Potyka, Interpreting Neural Networks as Quantitative Argumentation Frameworks, Proceedings of the AAAI Conference on Artificial Intelligence 35 (2021) 6463–6470.</p>
        <p>[10] P. Baroni, M. Romano, F. Toni, M. Aurisicchio, G. Bertanza, Automatic evaluation of design alternatives with quantitative argumentation, Argument &amp; Computation 6 (2015) 24–49.</p>
        <p>[11] A. Rago, F. Toni, M. Aurisicchio, P. Baroni, Discontinuity-free decision support with quantitative argumentation debates, in: 15th International Conference on the Principles of Knowledge Representation and Reasoning (KR), 2016.</p>
        <p>[12] N. Potyka, Continuous dynamical systems for weighted bipolar argumentation, in: International Conference on Principles of Knowledge Representation and Reasoning (KR), 2018, pp. 148–157.</p>
        <p>[13] L. Amgoud, J. Ben-Naim, Evaluation of arguments in weighted bipolar graphs, International Journal of Approximate Reasoning 99 (2018) 39–55.</p>
        <p>[14] C. Fan, J. Liu, Y. Zhang, E. Wong, D. Wei, S. Liu, SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation, in: The Twelfth International Conference on Learning Representations, 2024.</p>
        <p>[15] S. Han, J. Pool, J. Tran, W. J. Dally, Learning both weights and connections for efficient neural networks, in: Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 1, NIPS'15, MIT Press, Cambridge, MA, USA, 2015, pp. 1135–1143.</p>
        <p>[16] X. Yin, N. Potyka, F. Toni, Argument attribution explanations in quantitative bipolar argumentation frameworks, in: ECAI, volume 372, 2023, pp. 2898–2905.</p>
        <p>[17] X. Yin, N. Potyka, F. Toni, Explaining arguments’ strength: Unveiling the role of attacks and supports, in: K. Larson (Ed.), IJCAI-24, 2024, pp. 3622–3630.</p>
        <p>[18] X. Yin, N. Potyka, A. Rago, T. Kampik, F. Toni, Contestability in quantitative argumentation, arXiv preprint arXiv:2507.11323 (2025).</p>
        <p>[19] T. Kampik, N. Potyka, X. Yin, K. Čyras, F. Toni, Contribution functions for quantitative bipolar argumentation graphs: A principle-based analysis, International Journal of Approximate Reasoning 173 (2024) 109255.</p>
        <p>[20] J. Delobelle, S. Villata, Interpretability of gradual semantics in abstract argumentation, in: G. Kern-Isberner, Z. Ognjanović (Eds.), Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Cham, 2019, pp. 27–38.</p>
        <p>[21] L. S. Shapley, Notes on the N-Person Game II: The Value of an n-Person Game, Santa Monica, CA, 1951.</p>
        <p>[22] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the black box: Automated decisions and the GDPR, Harvard Journal of Law and Technology 31 (2018) 841–887.</p>
        <p>[23] R. A. Fisher, Iris, UCI Machine Learning Repository, 1936.</p>
        <p>[24] National Institute of Diabetes and Digestive and Kidney Diseases, Diabetes Dataset, Kaggle, 1990.</p>
        <p>[25] W. N. Street, W. H. Wolberg, O. L. Mangasarian, Breast Cancer Wisconsin (Diagnostic), UCI Machine Learning Repository, 1993.</p>
        <p>[26] ProPublica, COMPAS recidivism risk score data and analysis, GitHub, 2016.</p>
        <p>[27] C. Spearman, The proof and measurement of association between two things (1961).</p>
        <p>[28] M. G. Kendall, A new measure of rank correlation, Biometrika 30 (1938) 81–93.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>A. Runtime graphs</title>
      <p>Here we give the full results for the runtime of our modified CE method compared to the original
CE method. In the plots, the x-axis is the sparsification percentage, and the y-axis is the runtime (in
seconds). The red line represents the average runtime (over the test dataset) of the original CE algorithm,
and the black line is the runtime using our approximation method with various levels of sparsification.
• In Figure 8, we see the results for the Iris dataset.
• In Figure 9, we see the results for the Diabetes dataset.
• In Figure 10 we see the results for the Cancer dataset.</p>
      <p>• In Figure 11, we see the results for the COMPAS dataset.</p>
      <p>We can see from these plots that in general when the MLP is small (a low number of edges), the
original CE method is faster than our approximation method. However, as the MLP size increases, our
method can outperform the original CE method.</p>
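      <p>As a side note on measurement: comparisons of explanation rankings before and after sparsification rest on rank correlation measures such as Spearman's rho [27] and Kendall's tau [28]. The following is a hypothetical, self-contained sketch (not the paper's actual code; the attribution scores are made-up illustrative numbers) of computing Kendall's tau between an argument ranking before and after sparsification:</p>

```python
# Hypothetical illustration of Kendall's rank correlation [28]:
# tau = (concordant pairs - discordant pairs) / total pairs,
# assuming no ties in either score list.
from itertools import combinations

def kendall_tau(xs, ys):
    """Return Kendall's tau between two equal-length score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant if both lists order items i and j the same way.
        agreement = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Made-up attribution scores for five arguments, before and after
# sparsification; the sparsified ranking swaps one adjacent pair.
aae_original   = [0.90, 0.55, 0.30, 0.10, -0.20]
aae_sparsified = [0.85, 0.60, 0.05, 0.25, -0.15]

print(kendall_tau(aae_original, aae_sparsified))  # 1 discordant pair of 10 -> 0.8
```

<p>A tau of 1 indicates an identically ordered explanation; values near 0 indicate that the ordering is essentially uncorrelated after sparsification.</p>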
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>"Why should I trust you?": Explaining the predictions of any classifier</article-title>
          ,
          <source>in: ACM SIGKDD</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>4765</fpage>
          -
          <lpage>4774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Binder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Montavon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Klauschen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-R.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Samek</surname>
          </string-name>
          ,
          <article-title>On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation</article-title>
          ,
          <source>PLOS ONE 10</source>
          (
          <year>2015</year>
          )
          <elocation-id>e0130140</elocation-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>