LRP-Based Argumentative Explanations for Neural
Networks
Purin Sukpanichnant1 , Antonio Rago1 , Piyawat Lertvittayakumjorn1 and
Francesca Toni1
1
    Imperial College London, Exhibition Rd, South Kensington, London SW7 2AZ, United Kingdom


                                         Abstract
                                         In recent years, there have been many attempts to combine XAI with the field of symbolic AI in
                                         order to generate explanations for neural networks that are more interpretable and better align with
                                         human reasoning, with one prominent candidate for this synergy being the sub-field of computational
                                         argumentation. One method is to represent neural networks with quantitative bipolar argumentation
                                         frameworks (QBAFs) equipped with a particular semantics. The resulting QBAF can then be viewed as
                                         an explanation for the associated neural network. In this paper, we explore a novel LRP-based semantics
                                         under a new QBAF variant, namely neural QBAFs (nQBAFs). Since an nQBAF of a neural network
                                         is typically large, the nQBAF must be simplified before being used as an explanation. Our empirical
                                         evaluation indicates that the manner of this simplification is all-important for the quality of the resulting
                                         explanation.

                                         Keywords
                                         Neural networks, Computational argumentation, Image classification




1. Introduction
Several attempts have been made to improve explainability of AI systems. One prominent
research area of XAI is devoted to explaining black-box methods such as deep learning. A
popular method from this area is Layer-wise Relevance Propagation (LRP) [1]. This method
determines how relevant nodes in a neural network are towards the neural network output.
However, LRP does not explicitly indicate the relationship between each node. To address this
issue, we combine this method with computational argumentation. This is a field of study about
how knowledge can be represented as relationships between arguments. Each complete set of
relationship(s) is referred to as an Argumentation Framework (AF) [2]. There are several types
of AF, depending on the types of relationships involved. In this paper, we consider a type of AF known
as Quantitative Bipolar Argumentation Frameworks (QBAFs) [3], a form of knowledge
representation displaying relationships between arguments in the form of supports and attacks.
These attacks and supports lend themselves well to representing the negative and positive
influences of input features as obtained using LRP.
   QBAFs are interpreted by semantics which, in a nutshell, determine the arguments’ dialectical
strengths, taking into account (the dialectical strength of) their attackers and supporters.
XAI.it 2021 - Italian Workshop on Explainable Artificial Intelligence
ps1620@imperial.ac.uk (P. Sukpanichnant); a.rago@imperial.ac.uk (A. Rago); pl1515@imperial.ac.uk
(P. Lertvittayakumjorn); ft@imperial.ac.uk (F. Toni)
                                       Β© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
   As QBAFs illustrate how arguments relate to one another, they can be applied to reflect
the relationship between nodes of a neural network, which can be viewed as an explanation.
However, to do this, one needs to match the neural network functioning and the QBAF semantics.
In this paper, we focus on LRP as a semantics for suitable forms of the QBAFs that we introduce.
QBAFs derived under an LRP-based semantics may be very large and too complicated for human
cognition in the context of explanation. To address this issue, we introduce a new variant of
QBAFs, namely neural QBAFs (nQBAFs), under an LRP-based semantics for generating
argumentative explanations from neural networks, and we prove their dialectical properties.
Finally, we conduct some preliminary experiments by applying our
LRP-based semantics to the Deep Argumentative Explanation (DAX) method from [4] and the
method from [5] in order to show practical issues with nQBAFs as explanations. This is work in
progress on exploring the use of LRP, in combination with other techniques, for visualisation in
image classification; we leave a comparison with visualisations drawn from nQBAFs as future
work.


2. Background
We start by defining relevant concepts for our setting. These amount to multi-layer perceptrons
(MLPs), Layer-wise Relevance Propagation (LRP) and Quantitative Bipolar Argumentation
Frameworks (QBAFs).

2.1. MLP Basics
An MLP is a form of feed-forward neural network where all neurons in one layer are connected to
all neurons in the next layer. We follow [6] for background on MLPs, captured by Definitions 1
and 2 below.
Definition 1. A Multi-layer Perceptron (MLP) is a tuple βŸ¨π‘‰ , 𝐸, 𝐡, πœƒβŸ© where
     β€’ βŸ¨π‘‰ , 𝐸⟩ is an acyclic directed graph.
     β€’ 𝑉 = βŠŽπ‘‘+1
              0 𝑉𝑖 is the disjoint union of sets of nodes 𝑉𝑖 ;
     β€’ We call 𝑉0 the input layer, 𝑉𝑑+1 the output layer and 𝑉𝑖 the i-th hidden layer for 1 ≀ 𝑖 ≀ 𝑑;
     β€’ 𝐸 βŠ† ⋃𝑖=0,…,𝑑 (𝑉𝑖 Γ— 𝑉𝑖+1 ) is a set of edges between subsequent layers;
     β€’ 𝐡 ∢ (𝑉 ⧡ 𝑉0 ) β†’ ℝ assigns a bias to every non-input node;
     β€’ πœƒ ∢ 𝐸 β†’ ℝ assigns a weight to every edge.
   Figure 1 (left) visualises a fragment of an MLP with at least two hidden layers. Note that any
MLP referred to afterwards only has one output node. Such an MLP may be obtained by extracting
from another MLP the fragment consisting of all nodes that have paths1 to the chosen output node,
including the output node itself.
   MLPs typically result from training with sample data. Since this training is not a focus of
this paper, we will simply assume that a trained MLP is available. For example, in Section 5, we
will conduct experiments with a pre-trained MLP for image classification.
    1
     The definition of path is adopted from [4], where there exists a path via 𝐸 (the set of edges) from π‘›π‘Ž to 𝑛𝑏 (from one
node to another) iff βˆƒπ‘›1 , ..., 𝑛𝑑 with 𝑛1 = π‘›π‘Ž and 𝑛𝑑 = 𝑛𝑏 such that (𝑛1 , 𝑛2 ), ..., (π‘›π‘‘βˆ’1 , 𝑛𝑑 ) ∈ 𝐸.
   The next definition explains how we obtain an activation value for each node.

Definition 2. For any 𝑗 ∈ 𝑉0 , the activation π‘₯𝑗 ∈ ℝ of node 𝑗 is an input value for 𝑗. For any π‘˜
such that 1 ≀ π‘˜ ≀ 𝑑 + 1, the activation of node 𝑖 ∈ π‘‰π‘˜ is π‘₯𝑖 = π‘Žπ‘π‘‘(𝐡(𝑖) + Ξ£π‘›βˆˆπ‘‰π‘˜βˆ’1 π‘₯𝑛 πœƒ(𝑛, 𝑖)) where act:
ℝ β†’ ℝ is an activation function.2

  Activations are a fundamental component of a neural network. They are involved in the
calculation process of a neural network from a given input towards the output layer. An
activation of each node can also be used to explain what the neural network is emphasising, as
we discuss in the next section.
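
   To illustrate Definition 2, the following is a minimal NumPy sketch of the layer-by-layer forward
pass of an MLP (the names forward_activations, weights and biases are ours and purely illustrative;
they are not taken from any implementation used in this paper).

```python
import numpy as np

def relu(z):
    """An example activation function act: R -> R (applied element-wise)."""
    return np.maximum(0.0, z)

def forward_activations(x0, weights, biases, act=relu):
    """Compute activations layer by layer as in Definition 2.

    x0      : input-layer activations, one value per node of V_0.
    weights : list of matrices; weights[k][n, i] = theta(n, i) for an edge from
              node n in layer k to node i in layer k+1.
    biases  : list of vectors; biases[k][i] = B(i) for node i in layer k+1.
    Returns the list of activation vectors, one per layer.
    """
    activations = [np.asarray(x0, dtype=float)]
    for W, b in zip(weights, biases):
        z = activations[-1] @ W + b   # B(i) + sum_n x_n * theta(n, i)
        activations.append(act(z))    # x_i = act(...)
    return activations
```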

2.2. LRP Basics
Layer-Wise Relevance Propagation (LRP) [1] is a method for obtaining explanations, for outputs
of MLPs in particular. Intuitively, with LRP, each node of the MLP is given a relevance score,
showing how this node contributes to the node of interest in the output layer. Starting from the
output layer, the node we want to explain has its relevance score equal to its activation while
other nodes of the output layer (if any) have zero relevance score. Then we can calculate the
relevance score for each non-output node using Definition 3, adapted from the presentation of
LRP in [7].

Definition 3. Let βŸ¨π‘‰ , 𝐸, 𝐡, πœƒβŸ© be an MLP, 𝑖 ∈ π‘‰π‘˜ , and 𝑗 ∈ π‘‰π‘˜+1 where 0 ≀ π‘˜ ≀ 𝑑, and let the layer π‘˜
have 𝑛 nodes. Then the relevance score that the node 𝑖 receives from the node 𝑗 is
𝑅𝑖←𝑗 = (𝑧𝑖𝑗 / Σ𝑙=1,…,𝑛 𝑧𝑙𝑗 ) 𝑅𝑗 , where 𝑧𝑖𝑗 is the contribution from 𝑖 to 𝑗 during the forward pass, i.e.,
𝑧𝑖𝑗 = π‘₯𝑖 πœƒ(𝑖, 𝑗) + 𝐡(𝑗)/𝑛 + πœ–/𝑛, where πœ– ∈ ℝ is a small positive stabiliser.

   Note that this definition assumes that πœ– is distributed equally to the 𝑛 nodes: we adopt this
assumption from [7]. To calculate the relevance score node 𝑖 has towards the output node of
interest, i.e. 𝑅𝑖 , we simply sum all the relevance scores it receives from all the nodes of the layer
π‘˜ + 1. In other words, 𝑅𝑖 = Σ𝑗 𝑅𝑖←𝑗 .
   From Definition 3, we also obtain that LRP satisfies conservation properties (for 𝑖 ∈ π‘‰π‘˜ and 𝑗 ∈ π‘‰π‘˜+1 ),
i.e., 𝑅𝑗 = Σ𝑖 𝑅𝑖←𝑗 and Σ𝑖 𝑅𝑖 = Σ𝑗 𝑅𝑗 .
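
   As an illustration, the following is a minimal NumPy sketch of the propagation rule of Definition 3
for one pair of adjacent layers, with the bias 𝐡(𝑗) and the stabiliser πœ– distributed equally over the 𝑛
contributions as in [7] (the function lrp_epsilon_layer and its interface are our own illustrative
assumptions, not the original LRP code).

```python
import numpy as np

def lrp_epsilon_layer(x_k, W, b, R_next, eps=1e-6):
    """Propagate relevance from layer k+1 back to layer k (Definition 3).

    x_k    : activations of layer k (shape [n]).
    W      : weights, W[i, j] = theta(i, j) for i in layer k and j in layer k+1.
    b      : biases of layer k+1, b[j] = B(j).
    R_next : relevance scores of layer k+1 (shape [m]).
    Returns (R_k, R_msg) with R_msg[i, j] = R_{i<-j} and R_k[i] = sum_j R_msg[i, j].
    """
    n = x_k.shape[0]
    # z_ij = x_i * theta(i, j) + B(j)/n + eps/n (bias and stabiliser shared equally)
    Z = x_k[:, None] * W + (b + eps)[None, :] / n
    # R_{i<-j} = z_ij / (sum_l z_lj) * R_j; each column sums back to R_j (conservation)
    R_msg = Z / Z.sum(axis=0, keepdims=True) * R_next[None, :]
    R_k = R_msg.sum(axis=1)   # R_i = sum_j R_{i<-j}
    return R_k, R_msg
```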

2.3. QBAF Basics
QBAFs [3] are abstractions of debates between arguments, where arguments may attack or
support one another and are equipped with a base score, which reflects the arguments’ intrinsic,
initial dialectical strength. We adopt the formal definition of QBAFs from [3].

Definition 4. A QBAF is a tuple ⟨𝐴, 𝐴𝑑𝑑, 𝑆𝑒𝑝𝑝, 𝛾 ⟩ where

     β€’ 𝐴 is a set (whose elements are referred to as arguments);
     β€’ 𝐴𝑑𝑑 βŠ† 𝐴 Γ— 𝐴 is the attack relation;

    2
     Note that, with an abuse of notation, πœƒ(𝑛, 𝑖) stands for πœƒ((𝑛, 𝑖)), for simplicity. Unless explicitly stated, this
notation is used throughout the rest of the paper.
    β€’ 𝑆𝑒𝑝𝑝 βŠ† 𝐴 Γ— 𝐴 is the support relation;
    β€’ 𝛾 ∢ 𝐴 β†’ 𝐷 is a function that maps every argument to its base score (from some set 𝐷 of a
      given set of values).3

   A QBAF may be equipped with a notion of dialectical strength, given by a strength function
𝜎 ∢ 𝐴 β†’ 𝐷, indicating a dialectical strength value (again from 𝐷) for each argument, taking into
account the strength of the attacking and supporting arguments within the debate represented
by the QBAF, as well as the argument’s intrinsic strength given by 𝛾. Several notions of 𝜎 (called
semantics in the literature on computational argumentation) have been proposed (e.g. see [8]),
but their formal definitions are outside the scope of this paper. Various dialectical
properties for semantics 𝜎 have been studied in the literature (e.g. see [8]) as a way to validate
their use in concrete settings and to compare across different semantics. We will follow this
approach in this paper.
   Variants of QBAFs can be extracted from neural networks, e.g. as in [6, 4]. An example of the
structure underpinning these QBAFs is given in Figure 1 (centre, for the MLP on the left): here,
the nodes represent the arguments and the edges represent the union of the attack and support
relations. In these works, the extracted QBAF can be seen as indicating how some nodes in
the neural network relate to others, and hence can be viewed as an explanation of that neural
network. We follow this approach in this paper, but using a variant of QBAFs, defined next.


3. nQBAFs and LRP-based Argumentation Semantics
We study LRP as a semantics 𝜎 for novel forms of QBAFs extracted from MLPs. We aim to prove
that this LRP-based semantics satisfies multiple dialectical properties, which we believe are
intuitive when QBAFs are used as the basis for explanations of MLPs.
   The novel QBAFs take into account the structure of MLPs. By Definition 3, a non-output
node in an MLP may contribute to several nodes of the next layer, as in Figure 1 (centre). For any
non-output node 𝑖, if we consider each edge from 𝑖 to a node of the next layer and represent the
node 𝑖 with a unique argument for every edge (as in [6, 9]), there would be several arguments
representing that node 𝑖. This method would also not scale, since the relations between
arguments in the resulting QBAF would become too complex to analyse as more layers are
considered. To avoid this, we define a new, leaner form of QBAFs, where arguments referring
to the same node are grouped together.

Definition 5. A neural quantitative bipolar argumentation framework (nQBAF) is a tuple
⟨𝐴, 𝐴𝑑𝑑, 𝑆𝑒𝑝𝑝, 𝛾 ⟩ where

    β€’ 𝐴 is a set (of arguments);
    β€’ 𝐴𝑑𝑑 βŠ† 𝐴 Γ— 𝒫 (𝐴)4 is the attack relation;
    β€’ 𝑆𝑒𝑝𝑝 βŠ† 𝐴 Γ— 𝒫 (𝐴) is the support relation;
    β€’ 𝛾 ∢ 𝐴 βˆͺ 𝒫 (𝐴) β†’ {0} is a function that maps every argument and set of arguments to a fixed
      base score of zero.
    3
        In this paper, we will choose 𝐷 = ℝ.
    4
        Note that 𝒫 (𝐴) is the power set of a set 𝐴.
Figure 1: Example of an MLP (left), a standard QBAF (centre) and the associated nQBAF (right). Each
box refers to a group of arguments. In the QBAF and the nQBAF, dashed lines represent attacks and
solid lines represent supports.


   Thus, attack and support relations may exist not just between arguments, as in standard
QBAFs, but also between arguments and sets thereof. Given that we choose 𝐷 = ℝ as the set of
values that could be used as base score and strength of arguments, the choice of 𝛾 indicates that
each argument and set of arguments starts with a β€œneutral” base score of zero.
   We first need to relate arguments of an nQBAF and nodes of a given MLP βŸ¨π‘‰ , 𝐸, 𝐡, πœƒβŸ©. Each
argument represents only one node but a node can be represented by several arguments.
Accordingly, we assume a function 𝜌 ∢ 𝐴 βˆͺ 𝒫 (𝐴) β†’ 𝑉 βˆͺ {βŠ₯} mapping each argument/set
of arguments to a node of the MLP, if one exists (or mapping to βŠ₯ otherwise). We omit the
formal definition of 𝜌 for lack of space. As an illustration, for the MLP in Figure 1 (left), in
the derived nQBAF (right), 𝑛1 = 𝜌(𝛼12 ) = 𝜌(𝛼13 ) = 𝜌(𝛼14 ) = 𝜌({𝛼12 , 𝛼13 , 𝛼14 }), 𝑛2 = 𝜌(𝛼25 ) =
𝜌(𝛼26 ) = 𝜌({𝛼25 , 𝛼26 }), 𝑛3 = 𝜌(𝛼35 ) = 𝜌(𝛼36 ) = 𝜌({𝛼35 , 𝛼36 }), 𝑛4 = 𝜌(𝛼45 ) = 𝜌(𝛼46 ) = 𝜌({𝛼45 , 𝛼46 }),
𝑛5 = 𝜌(𝛼5 ) = 𝜌({𝛼5 }), 𝑛6 = 𝜌(𝛼6 ) = 𝜌({𝛼6 }), 𝑛0 = 𝜌(𝛼0 ) = 𝜌({𝛼0 }) and, for any other set 𝑆 of
arguments, 𝜌(𝑆) = βŠ₯.
   We then have to determine which pairs (i.e. edges as shown in Figure 1 (right)) belong to the
attack or support relations. This is done using two relation characterisations, inspired by those
in [4]: 𝑐+ , π‘βˆ’ ∢ 𝐴 Γ— 𝒫 (𝐴) β†’ {π‘‘π‘Ÿπ‘’π‘’, 𝑓 π‘Žπ‘™π‘ π‘’} where, for any argument 𝑖 and group of arguments 𝑗
such that 𝜌(𝑖) β‰  βŠ₯ and 𝜌(𝑗) β‰  βŠ₯ are in adjacent layers (i.e. (𝜌(𝑖), 𝜌(𝑗)) ∈ 𝐸):

    β€’ 𝑐+ (𝑖, 𝑗) is true iff π‘…πœŒ(𝑖)β†πœŒ(𝑗) > 0, and
    β€’ π‘βˆ’ (𝑖, 𝑗) is true iff π‘…πœŒ(𝑖)β†πœŒ(𝑗) < 0.

   With 𝑐+ and π‘βˆ’ , we can formally define our 𝐴𝑑𝑑 and 𝑆𝑒𝑝𝑝 relations and the nQBAF derived
from an MLP, as follows:

Definition 6. The nQBAF derived from βŸ¨π‘‰ , 𝐸, 𝐡, πœƒβŸ© is ⟨𝐴,𝐴𝑑𝑑,𝑆𝑒𝑝𝑝,𝛾 ⟩ where

    β€’ 𝐴 is defined according to Algorithm 1;
    β€’ 𝐴𝑑𝑑 = {(𝑖, 𝑗) ∈ 𝐴 Γ— 𝒫 (𝐴) ∣ π‘βˆ’ (𝑖, 𝑗) is true};
    β€’ 𝑆𝑒𝑝𝑝 = {(𝑖, 𝑗) ∈ 𝐴 Γ— 𝒫 (𝐴) ∣ 𝑐+ (𝑖, 𝑗) is true};
 Algorithm 1: Extracting A from a given MLP
  𝐴 ← {};
  π‘π‘’π‘Ÿπ‘Ÿπ‘’π‘›π‘‘πΏπ‘Žπ‘¦π‘’π‘Ÿ ← 𝑑;
  while π‘π‘’π‘Ÿπ‘Ÿπ‘’π‘›π‘‘πΏπ‘Žπ‘¦π‘’π‘Ÿ >= 0 do
      for 𝑛𝑖 in π‘‰π‘π‘’π‘Ÿπ‘Ÿπ‘’π‘›π‘‘πΏπ‘Žπ‘¦π‘’π‘Ÿ do
          for 𝑛𝑗 in π‘‰π‘π‘’π‘Ÿπ‘Ÿπ‘’π‘›π‘‘πΏπ‘Žπ‘¦π‘’π‘Ÿ+1 do
              if (𝑛𝑖 , 𝑛𝑗 ) in 𝐸 then
                   𝐴 ← 𝐴 βˆͺ {𝛼𝑖𝑗 }
      π‘π‘’π‘Ÿπ‘Ÿπ‘’π‘›π‘‘πΏπ‘Žπ‘¦π‘’π‘Ÿ ← π‘π‘’π‘Ÿπ‘Ÿπ‘’π‘›π‘‘πΏπ‘Žπ‘¦π‘’π‘Ÿ βˆ’ 1
  for π›Όπ‘šπ‘› in 𝐴 do
      if 𝜌(π›Όπ‘šπ‘› ) in 𝑉0 then
          𝐴 ← 𝐴 βˆͺ {𝛼(π‘šπ‘›)β€² π‘šπ‘› }


    β€’ 𝛾 ∢ 𝐴 βˆͺ 𝒫 (𝐴) β†’ {0}.

   Algorithm 1 extracts the set of arguments by iterating backwards from the last hidden layer
to the input layer. It also adds imaginary arguments to the set of arguments for input nodes, for
the reason discussed in the next section.
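
   For concreteness, below is a small Python sketch of the extraction in Algorithm 1, together with
the sign-based characterisations 𝑐+ and π‘βˆ’ used in Definition 6 (the tuple encoding of arguments and
the function names are our own illustrative assumptions, not the paper’s implementation).

```python
def extract_arguments(V, E, d):
    """Sketch of Algorithm 1: create one argument alpha_ij per edge (n_i, n_j),
    iterating backwards from the last hidden layer (index d) to the input layer,
    then add an imaginary argument for every argument representing an input node.

    V : list of layers V[0], ..., V[d+1], each a list of node identifiers.
    E : set of edges (n_i, n_j) between subsequent layers.
    """
    A = set()
    for layer in range(d, -1, -1):                 # currentLayer = d, d-1, ..., 0
        for n_i in V[layer]:
            for n_j in V[layer + 1]:
                if (n_i, n_j) in E:
                    A.add(("arg", n_i, n_j))       # alpha_ij represents node n_i
    for kind, n_i, n_j in list(A):
        if n_i in V[0]:                            # rho(alpha_ij) is an input node
            A.add(("imaginary", n_i, n_j))         # alpha_(ij)'ij, with rho mapped to bottom
    return A

def relation_sign(R_i_from_j):
    """c+/c- from Section 3: the relevance message R_{i<-j} decides whether
    alpha_ij supports (positive) or attacks (negative) the group for node j."""
    if R_i_from_j > 0:
        return "support"
    if R_i_from_j < 0:
        return "attack"
    return None    # zero relevance: neither attack nor support
```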
   Before we define our strength function, let us introduce some notation:

    β€’ 𝐴𝑑𝑑(π‘₯) = {π‘Ž ∈ 𝐴 ∣ (π‘Ž, π‘₯) ∈ 𝐴𝑑𝑑 } for all π‘₯ ∈ 𝒫 (𝐴);
    β€’ 𝑆𝑒𝑝𝑝(π‘₯) = {𝑠 ∈ 𝐴 ∣ (𝑠, π‘₯) ∈ 𝑆𝑒𝑝𝑝 } for all π‘₯ ∈ 𝒫 (𝐴);
    β€’ πΊπ‘Ÿπ‘œπ‘’π‘π‘  = {𝑔 ∈ 𝒫 (𝐴) ∣ βˆƒπ‘Ž ∈ 𝐴[(π‘Ž, 𝑔) ∈ 𝐴𝑑𝑑 ∨ (π‘Ž, 𝑔) ∈ 𝑆𝑒𝑝𝑝]}.

  Now we define the LRP-based semantics for our nQBAF as follows:

Definition 7. The LRP-based semantics of the nQBAF derived from an MLP βŸ¨π‘‰ , 𝐸, 𝐡, πœƒβŸ© is 𝜎 ∢
𝐴 βˆͺ πΊπ‘Ÿπ‘œπ‘’π‘π‘  β†’ ℝ such that:

     β€’ 𝜎(π‘₯) = π‘₯𝑖 if 𝜌(π‘₯) ∈ 𝑉𝑑+1 with final activation π‘₯𝑖 ;
     β€’ 𝜎(π‘₯) = π‘…π‘šβ†πœŒ(𝑦) if π‘₯ = 𝛼(π‘šπ‘›)β€² π‘šπ‘› , 𝑧 = π›Όπ‘šπ‘› and βˆƒ!(𝑧, 𝑦) ∈ 𝐴𝑑𝑑 βˆͺ 𝑆𝑒𝑝𝑝;
     β€’ 𝜎(π‘₯) = π‘…πœŒ(π‘₯)β†πœŒ(𝑦) if βˆƒ!(π‘₯, 𝑦) ∈ 𝐴𝑑𝑑 βˆͺ 𝑆𝑒𝑝𝑝;
     β€’ 𝜎(π‘₯) = Ξ£π‘Žβˆˆπ‘₯ 𝜎 (π‘Ž) if π‘₯ ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘  ;
     β€’ 𝜎(π‘₯) = 0 otherwise.
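
   A minimal sketch of how 𝜎 can be evaluated once the relevance messages of Definition 3 are
available: a non-output argument takes the relevance message that its node receives from its unique
target, and a group takes the sum of its members’ strengths (the names and data structures below
are our own illustrative assumptions).

```python
def sigma_argument(R_msg, node, target_node):
    """Strength of a non-output argument x with a unique target y:
    the relevance message R_{rho(x) <- rho(y)} (third case of Definition 7)."""
    return R_msg[(node, target_node)]

def sigma_group(group, sigma_arg):
    """Strength of a group of arguments: the sum of its members' strengths
    (fourth case of Definition 7); by LRP conservation this equals the relevance
    score of the node that the group represents."""
    return sum(sigma_arg[a] for a in group)
```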

  Now we are able to determine the relations between arguments, and to what extent each
argument supports or attacks a group of arguments, but how natural is this? Does it follow the
way humans naturally debate? To answer these questions, we have to consider whether our
nQBAFs satisfy dialectical properties.


4. Properties for nQBAFs under LRP semantics
We now consider dialectical properties that determine how natural the argumentation is for
any argumentation framework, i.e. how similar it is to human reasoning and debate. Our dialectical
Table 1
Dialectical properties for nQBAFs adapted from [4] and [3].
  #    Property                                                              Name
  1    βˆ€π‘” ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝜎(𝑔) = Ξ£π‘₯βˆˆπ΄π‘‘π‘‘(𝑔) 𝜌(π‘₯) + Ξ£π‘₯βˆˆπ‘†π‘’π‘π‘(𝑔) 𝜌(π‘₯)                  Additive Monotonicity
  2    βˆ€π‘” ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔) = βˆ… ∧ 𝑆𝑒𝑝𝑝(𝑔) = βˆ… β†’ 𝜎(𝑔) = Ξ£π‘₯βˆˆπ‘” 𝛾 (π‘₯).            Balance
  3    βˆ€π‘” ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔) β‰  βˆ… ∧ 𝑆𝑒𝑝𝑝(𝑔) = βˆ… β†’ 𝜎(𝑔) < Ξ£π‘₯βˆˆπ‘” 𝛾 (π‘₯).            Weakening
  4    βˆ€π‘” ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔) = βˆ… ∧ 𝑆𝑒𝑝𝑝(𝑔) β‰  βˆ… β†’ 𝜎(𝑔) > Ξ£π‘₯βˆˆπ‘” 𝛾 (π‘₯).            Strengthening
  5    βˆ€π‘” ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝜎(𝑔) < Ξ£π‘₯βˆˆπ‘” 𝛾 (π‘₯) β†’ 𝐴𝑑𝑑(𝑔) β‰  βˆ….                          Weakening Soundness
  6    βˆ€π‘” ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝜎(𝑔) > Ξ£π‘₯βˆˆπ‘” 𝛾 (π‘₯) β†’ 𝑆𝑒𝑝𝑝(𝑔) β‰  βˆ….                         Strengthening Soundness
  7    βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔1 ) = 𝐴𝑑𝑑(𝑔2 ) ∧ 𝑆𝑒𝑝𝑝(𝑔1 ) = 𝑆𝑒𝑝𝑝(𝑔2 ) ∧      Equivalence
       Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) = Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔1 ) = 𝜎(𝑔2 ).
  8    βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔1 ) βŠ‚ 𝐴𝑑𝑑(𝑔2 ) ∧ 𝑆𝑒𝑝𝑝(𝑔1 ) = 𝑆𝑒𝑝𝑝(𝑔2 ) ∧      Attack Counting
       Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) = Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔2 ) < 𝜎(𝑔1 ).
  9    βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝑆𝑒𝑝𝑝(𝑔1 ) βŠ‚ 𝑆𝑒𝑝𝑝(𝑔2 ) ∧ 𝐴𝑑𝑑(𝑔1 ) = 𝐴𝑑𝑑(𝑔2 ) ∧      Support Counting
       Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) = Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔1 ) < 𝜎(𝑔2 ).
  10   βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔1 ) = 𝐴𝑑𝑑(𝑔2 ) ∧ 𝑆𝑒𝑝𝑝(𝑔1 ) = 𝑆𝑒𝑝𝑝(𝑔2 ) ∧      Base Score Reinforcement
       Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) > Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔1 ) > 𝜎(𝑔2 ).
  11   βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝑔1 <π‘Ž 𝑔2 ∧ 𝑆𝑒𝑝𝑝(𝑔1 ) = 𝑆𝑒𝑝𝑝(𝑔2 ) ∧ Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) =   Attack Reinforcement
       Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔1 ) > 𝜎(𝑔2 ).
  12   βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔1 ) = 𝐴𝑑𝑑(𝑔2 ) ∧ 𝑔1 >𝑠 𝑔2 ∧ Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) =     Support Reinforcement
       Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔1 ) > 𝜎(𝑔2 ).


properties, as shown in Table 1, are based on those in [4] and [3] but are adapted specifically
for nQBAFs. In the table, we associate these properties with names, mostly borrowing from the
literature, where, however, they have been used for other types of argumentation frameworks.
   Before defining the properties, we first make an addition regarding the input layer. Every
dialectical property which follows considers the strength of a group of arguments based on
its attackers and supporters. As of now, there are no attackers or supporters for groups of
arguments representing nodes of the input layer, so it is likely most properties will not be
satisfied here. To resolve this issue, we add imaginary arguments to target the input nodes.
These added arguments are not considered as part of πΊπ‘Ÿπ‘œπ‘’π‘π‘ . Formally, for any 𝑔 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ 
such that 𝜌(𝑔) ∈ 𝑉0 , 𝐴𝑑𝑑(𝑔) = {π‘₯ ∈ 𝐴 ∣ 𝜌(π‘₯) = βŠ₯ ∧ βˆƒπ‘Ž ∈ 𝑔[𝜎 (π‘₯) = 𝜎 (π‘Ž) ∧ 𝜎 (π‘₯) < 0]} and
𝑆𝑒𝑝𝑝(𝑔) = {π‘₯ ∈ 𝐴 ∣ 𝜌(π‘₯) = βŠ₯ ∧ βˆƒπ‘Ž ∈ 𝑔[𝜎 (π‘₯) = 𝜎 (π‘Ž) ∧ 𝜎 (π‘₯) > 0]} and |𝐴𝑑𝑑(𝑔) βˆͺ 𝑆𝑒𝑝𝑝(𝑔)| = |𝑔|. For
example, a given input node may be represented by a group of arguments {𝛼𝑖 , … , 𝛼𝑛 } and a set
of supporting/attacking arguments {𝛼𝑐𝑖 , … , 𝛼𝑐𝑛 } corresponding to each argument of the group.
   To explain the properties in Table 1: Additive Monotonicity requires that the strength of a group
of arguments is the sum of the strengths of its supporters and attackers. Balance requires that the strength
of a group of arguments differs from the sum of base scores of that group only if such a group is
a target of other arguments. Weakening requires that when there are no supporters but at least
one attacker, the strength of a group of arguments is lower than the total sum of base scores
of that group. Conversely, Strengthening considers the situation when there are no attackers
but at least one supporter instead. Weakening Soundness is loosely the opposite direction of
Weakening, requiring that if the strength of a group of arguments is lower than the sum of base
scores of that group, then the group must have at least one attacker. Similarly, Strengthening
Soundness is loosely the opposite direction of Strengthening. Equivalence states that groups of
arguments with equal conditions in terms of attackers, supporters and the sum of base scores
within a group have the same strength. Attack Counting (Support Counting) requires that a
strictly larger set of attackers (supporters, respectively) determines a lower (higher, respectively)
strength. Base Score Reinforcement requires that a higher sum of base scores gives a higher
strength. For the last two properties, we have to define the notion of weaker and stronger
attack/support relations between sets.

Definition 8. For any sets 𝐴, 𝐡 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ :
  𝐴 <π‘Ž 𝐡 iff Ξ£π‘₯βˆˆπ΄π‘‘π‘‘(𝐴) 𝜎 (π‘₯) > Ξ£π‘₯βˆˆπ΄π‘‘π‘‘(𝐡) 𝜎 (π‘₯);
  𝐴 <𝑠 𝐡 iff Ξ£π‘₯βˆˆπ‘†π‘’π‘π‘(𝐴) 𝜎 (π‘₯) < Ξ£π‘₯βˆˆπ‘†π‘’π‘π‘(𝐡) 𝜎 (π‘₯);
  𝐴 >π‘Ž 𝐡 iff 𝐡 <π‘Ž 𝐴; 𝐴 >𝑠 𝐡 iff 𝐡 <𝑠 𝐴.

   Then, Attack Reinforcement states that a weaker set of attackers determines a higher strength
whereas Support Reinforcement states that a stronger set of supporters determines a higher
strength.
   Any nQBAF under the LRP-based semantics satisfies all of the given properties (proofs omitted
for lack of space). This indicates that our LRP-based nQBAFs may align with human reasoning.

Proposition 1. nQBAFs under LRP-based semantics satisfy Properties 1-12.


5. Empirical Study
We apply the LRP-based semantics empirically to two different approaches, namely deep
argumentative explanation (DAX) [4] and the approach in [5] by Google. We then analyse the
obtained explanations qualitatively.

5.1. DAX Basics
DAX [4] is a general methodology for building local explanations (i.e. input-based explanations)
for neural network outputs. Unlike other explanation methods which are only based on inputs
(and thus can be deemed to be flat), DAX takes account of the hidden layers as well. DAX is
based on extracting an argumentation framework from a neural network; explanations are
then drawn from the framework and represented in a format comprehensible to humans. The
extraction of the argumentation framework requires the choice of a semantics (for determining
the strength of arguments) directly matching the behaviour of the neural network.
   Here we apply DAX using our LRP semantics at its core. Also, we choose nQBAFs as the
argumentation framework underpinning DAXs. We may theoretically achieve a full (local)
explanation by viewing the entire nQBAF extracted from a neural network. However, the
explanation would be too large for complex networks, and therefore too complicated for humans to
comprehend. To make the explanation human-scale, we only consider a fragment of the nQBAF, in the
spirit of [4], as well as grouping groups of arguments representing a single node (i.e. grouping
nodes) together, and visualise the grouping as an explanation.
5.2. The Basics of Google’s Method
Google’s method [5] combines feature visualisation (i.e. what is a neuron looking for?, see [10])
with attribution (i.e. how does a specific node contribute to the output?) to generate a local
explanation for a neural network output. We use the implementation of this method available
at [11], changing the attribution method from a linear correlation to LRP. We leverage the
existing implementation’s choices for visualisation.

5.3. Settings
For both methods, we aim to explain a Keras VGG16 model [12] (with linear activation function
for the output layer) pretrained on the ImageNet dataset [13]. Since the whole model is too
large, we only consider the last convolutional layer, explaining what the layer prioritises in a
given image. We test our method in combination with DAX, comparing it to Google’s method,
on three images: a police van from [14], a barbell from [15], and a diaper from [16]. In all cases,
we use the output node with maximum activation as the output class, with such an activation
referred to as the output prediction.
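
   A minimal sketch of this setup with the stock Keras VGG16 follows (the paper uses a linear
activation for the output layer, whereas the sketch keeps the pretrained model as-is for brevity;
'police_van.jpg' is a placeholder file name for one of the three test images).

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet")      # VGG16 pretrained on ImageNet

# Load and preprocess one of the three test images (placeholder path).
img = image.load_img("police_van.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

scores = model.predict(x)[0]
top = int(np.argmax(scores))                       # output node with maximum activation
print(decode_predictions(scores[None, :], top=1))  # human-readable predicted class
print("output prediction:", scores[top])           # the 'output prediction' above
```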
   To generate explanations using DAX, we modify the code from the ArgFlow library [9] and
apply it to each of the three images. For each explanation, the size of each image illustrates its
attribution towards the output class, with red and green arrows depicting attacks on and
supports for the output class prediction, respectively.
   For Google’s approach, we modify the code from [11], which is one of the Colaboratory
notebooks in [5]. We then apply the code to the three images, each resulting in a set of images
indicating parts of the original image. The number below each factor refers to how much
attribution that component has towards the output prediction. The arrow sizes also reflect
these attributions.

5.4. DAX vs Google Comparisons
Image 1: Police Van. Explanations from both methods (as shown in Figure 2) indicate that the
model focuses mostly on the background and the red stripe of the van. There are some subtle
differences between them, mainly in the strength of each factor, but their factors are quite
similar. However, an interesting point is that DAX considers the siren light of the van as one of
the top six factors contributing to the output class prediction (according to the rightmost image
of Figure 2a) while Google’s approach does not present this (arguably important) factor.
   Image 2: Barbell. According to Figure 3, both methods explain that the model focuses on the
plates and the background. However, DAX considers the plates to contribute to the prediction
more than the background, while it is the opposite for Google’s explanation. Somewhat
counter-intuitively though, DAX considers the plates to both attack (the fourth image from
the right of Figure 3a) and support (the rightmost image of Figure 3a) the class
prediction, even though the attacking argument (the fourth image from the right) is much
less strong. If the DAX is faithful to the model, then this incongruence may result from an
incongruence in the model.
   Image 3: Diaper. From Figure 4, both methods indicate that the model focuses on other
things instead of the diaper. The DAX in Figure 4a shows that the model focuses on the baby
                                        (a) The DAX approach




                                        (b) Google’s approach
Figure 2: Explanations given using (a) the DAX approach (with attacks in red and supports in green,
either indicated in the filters or as arrows, and the size of arguments for the filters indicating their
dialectical strength, see [4] for details) and (b) Google’s approach for the police van image with the
predicted class police_van (with arrows indicating support, and the size of arrows representing the LRP
values). The police van image source is (https://bit.ly/3Fi1oqx).


instead of the diaper. It even indicates that the diaper attacks the prediction of the class itself. In
contrast, Google’s explanation (Figure 4b) indicates that the model focuses on the background
and the diaper, giving the baby lower attributions.
                                     (a) The DAX approach




                                      (b) Google’s approach
Figure 3: Explanations given using (a) the DAX approach and (b) Google’s approach for the barbell
image with the predicted class barbell. The barbell image source is (https://amzn.to/3Db2xOQ).


5.5. Discussion
The comparisons above clearly indicate that even with similar semantics (LRP), for the same
model, explanations vary depending on how the grouping (of argument groups) is done. Google’s
approach seems to take account of the fact that concepts are usually recognised around particular
positions of an image, whereas DAX only focuses on the concepts. DAX seems to unearth
conflicts, with the same feature both attacking and supporting a prediction. Overall, more
experimentation is needed to understand which explanation method is more β€œfaithful” to the
                                       (a) The DAX approach




                                       (b) Google’s approach
Figure 4: Explanations given using (a) the DAX approach and (b) Google’s approach for the baby image
with the predicted class diaper. The diaper image source is (https://bit.ly/3D8FZya).


underlying model.


6. Conclusions
We presented a variant of Quantitative Bipolar Argumentation Frameworks (QBAFs) called
neural QBAFs (nQBAFs) and considered how the LRP-based semantics satisfies the modified
dialectical properties for nQBAFs. We also conducted preliminary experiments explaining an
image classifier, by applying the LRP-based semantics to two approaches: Deep Argumentative
Explanation (DAX) and Google’s approach, and comparing both explanations. DAX groups
argument groups (i.e. nodes) in the same filter together, while Google’s approach groups them
by means of matrix factorisation optimising for activations. The comparison shows that how
argument groups (each representing a node) are grouped can affect the resulting explanations.
As future work, we plan to conduct experiments using nQBAFs for visualisation in text
classification, in comparison with DAX and Google’s approaches with LRP as well as other
methods, such as smoothgrad [17], deeplift [18], gradcam [19] and TCAV [20]. Finally, it would
be interesting to conduct experiments to assess demands on the cognitive load for end-users
using different (instantiations of) visualisations.


Acknowledgments
The first author was funded in part by Imperial College London under UROP (Undergraduate
Research Opportunities Programme). The last author was partially funded by the European
Research Council (ERC) under the European Union’s Horizon 2020 research and innovation
programme (grant agreement No. 101020934). Finally, Rago and Toni were partially funded by
J.P. Morgan and by the Royal Academy of Engineering under the Research Chairs and Senior
Research Fellowships scheme. Any views or opinions expressed herein are solely those of
the authors listed, and may differ, in particular, from the views and opinions expressed by
Imperial College London or its affiliates and by J.P. Morgan or its affiliates. This material is not
a product of the Research Department of J.P. Morgan Securities LLC. This material should not
be construed as an individual recommendation for any particular client and is not intended as
a recommendation of particular securities, financial instruments or strategies for a particular
client. This material does not constitute a solicitation or offer in any jurisdiction.


References
 [1] G. Montavon, A. Binder, S. Lapuschkin, W. Samek, K.-R. MΓΌller, Layer-Wise Relevance
     Propagation: An Overview, Springer International Publishing, Cham, 2019, pp. 193–209.
      URL: https://doi.org/10.1007/978-3-030-28954-6_10. doi:10.1007/978-3-030-28954-6_10.
 [2] P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic
     reasoning, logic programming and n-person games, Artificial Intelligence 77 (1995) 321–357.
      URL: https://www.sciencedirect.com/science/article/pii/000437029400041X.
      doi:10.1016/0004-3702(94)00041-X.
 [3] P. Baroni, A. Rago, F. Toni, How many properties do we need for gradual argumentation?,
     in: AAAI, 2018.
 [4] E. Albini, P. Lertvittayakumjorn, A. Rago, F. Toni, Deep argumentative explanations, 2021.
      arXiv:2012.05766.
 [5] C. Olah, A. Satyanarayan, I. Johnson, S. Carter, L. Schubert, K. Ye, A. Mordvintsev, The
      building blocks of interpretability, Distill 3 (2018). doi:10.23915/distill.00010.
 [6] N. Potyka, Interpreting neural networks as quantitative argumentation frameworks,
     Proceedings of the AAAI Conference on Artificial Intelligence 35 (2021) 6463–6470. URL:
     https://ojs.aaai.org/index.php/AAAI/article/view/16801.
 [7] P. Lertvittayakumjorn, L. Specia, F. Toni, FIND: Human-in-the-Loop Debugging Deep
     Text Classifiers, in: Proceedings of the 2020 Conference on Empirical Methods in Natural
     Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020,
      pp. 332–348. URL: https://aclanthology.org/2020.emnlp-main.24.
      doi:10.18653/v1/2020.emnlp-main.24.
 [8] P. Baroni, A. Rago, F. Toni, From fine-grained properties to broad principles for gradual
     argumentation: A principled spectrum, Int. J. Approx. Reason. 105 (2019) 252–286. URL:
      https://doi.org/10.1016/j.ijar.2018.11.019. doi:10.1016/j.ijar.2018.11.019.
 [9] A. Dejl, P. He, P. Mangal, H. Mohsin, B. Surdu, E. Voinea, E. Albini, P. Lertvittayakumjorn,
     A. Rago, F. Toni, Argflow: A Toolkit for Deep Argumentative Explanations for Neural
     Networks, International Foundation for Autonomous Agents and Multiagent Systems,
     Richland, SC, 2021, p. 1761–1763.
[10] C. Olah, A. Mordvintsev, L. Schubert, Feature visualization, Distill 2 (2017).
      doi:10.23915/distill.00007.
[11] L. Google, Neuron groups – building blocks of interpretability, 2018. URL: https://bit.ly/
     3a483Xc.
[12] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image
     recognition, in: International Conference on Learning Representations, 2015.
[13] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical
     image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition,
     2009, pp. 248–255. doi:10.1109/CVPR.2009.5206848 .
[14] Wataree, Police van thailand, 2019. URL: https://bit.ly/3Fi1oqx.
[15] I. Synergee Fitness Worldwide, 2019. URL: https://amzn.to/3Db2xOQ.
[16] websubstance, Baby tummy time, n.d. URL: https://bit.ly/3D8FZya.
[17] D. Smilkov, N. Thorat, B. Kim, F. ViΓ©gas, M. Wattenberg, Smoothgrad: removing noise by
     adding noise, 2017. arXiv:1706.03825 .
[18] J. Li, C. Zhang, J. T. Zhou, H. Fu, S. Xia, Q. Hu, Deep-lift: Deep label-specific feature
      learning for image annotation, IEEE Transactions on Cybernetics (2021) 1–10.
      doi:10.1109/TCYB.2021.3049630.
[19] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-cam: Visual
     explanations from deep networks via gradient-based localization, in: Proceedings of the
     IEEE International Conference on Computer Vision (ICCV), 2017.
[20] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, R. Sayres, Interpretability
     beyond feature attribution: Quantitative testing with concept activation vectors (TCAV), in:
     J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learn-
     ing, volume 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 2668–2677.
     URL: https://proceedings.mlr.press/v80/kim18d.html.