LRP-Based Argumentative Explanations for Neural Networks

Purin Sukpanichnant, Antonio Rago, Piyawat Lertvittayakumjorn and Francesca Toni
Imperial College London, Exhibition Rd, South Kensington, London SW7 2AZ, United Kingdom

ps1620@imperial.ac.uk (P. Sukpanichnant); a.rago@imperial.ac.uk (A. Rago); pl1515@imperial.ac.uk (P. Lertvittayakumjorn); ft@imperial.ac.uk (F. Toni)

XAI.it 2021 - Italian Workshop on Explainable Artificial Intelligence
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
In recent years, there have been many attempts to combine XAI with the field of symbolic AI in order to generate explanations for neural networks that are more interpretable and align better with human reasoning, with one prominent candidate for this synergy being the sub-field of computational argumentation. One method is to represent neural networks with quantitative bipolar argumentation frameworks (QBAFs) equipped with a particular semantics. The resulting QBAF can then be viewed as an explanation for the associated neural network. In this paper, we explore a novel LRP-based semantics under a new QBAF variant, namely neural QBAFs (nQBAFs). Since an nQBAF of a neural network is typically large, the nQBAF must be simplified before being used as an explanation. Our empirical evaluation indicates that the manner of this simplification is all-important for the quality of the resulting explanation.

Keywords
Neural networks, Computational argumentation, Image classification

1. Introduction

Several attempts have been made to improve the explainability of AI systems. One prominent research area of XAI is devoted to explaining black-box methods such as deep learning. A popular method from this area is Layer-wise Relevance Propagation (LRP) [1], which determines how relevant the nodes in a neural network are towards the neural network's output. However, LRP does not explicitly indicate the relationships between nodes. To address this issue, we combine LRP with computational argumentation, the field of study concerned with how knowledge can be represented as relationships between arguments. Each complete set of such relationships is referred to as an Argumentation Framework (AF) [2]. There are several types of AFs, depending on the types of relationships considered. In this paper, we consider a type of AF known as Quantitative Bipolar Argumentation Frameworks (QBAFs) [3], a form of knowledge representation displaying relationships between arguments in the form of supports and attacks. These attacks and supports lend themselves well to representing the negative and positive influences of input features as obtained using LRP. QBAFs are interpreted by semantics which, in a nutshell, determine the arguments' dialectical strengths, taking into account (the dialectical strength of) their attackers and supporters.

As QBAFs illustrate how arguments relate to one another, they can be applied to reflect the relationships between nodes of a neural network, which can in turn be viewed as an explanation. However, to do this, one needs to match the functioning of the neural network with the QBAF semantics. In this paper, we focus on LRP as a semantics for the suitable forms of QBAFs that we introduce. QBAFs derived by an LRP-based semantics may be very large and thus too complicated for human cognition in the context of explanation.
To address this issue, we introduce a new variant of QBAFs, namely neural QBAFs (nQBAFs), under an LRP-based semantics for generating argumentative explanations from neural networks, and we prove their dialectical properties. Finally, we conduct some preliminary experiments by applying our LRP-based semantics to the Deep Argumentative Explanation (DAX) method from [4] and the method from [5], in order to show practical issues with nQBAFs as explanations. This is work in progress on exploring the use of LRP, in combination with other techniques, for visualisation in image classification: we leave a comparison with visualisations drawn from nQBAFs as future work.

2. Background

We start by defining relevant concepts for our setting. These amount to multi-layer perceptrons (MLPs), Layer-wise Relevance Propagation (LRP) and Quantitative Bipolar Argumentation Frameworks (QBAFs).

2.1. MLP Basics

An MLP is a form of feed-forward neural network where all neurons in one layer are connected to all neurons in the next layer. We follow [6] for background on MLPs, captured by Definitions 1 and 2 below.

Definition 1. A Multi-layer Perceptron (MLP) is a tuple ⟨V, E, B, θ⟩ where
• ⟨V, E⟩ is an acyclic directed graph;
• V = ⊎_{i=0}^{d+1} V_i is the disjoint union of sets of nodes V_i;
• we call V_0 the input layer, V_{d+1} the output layer and V_i the i-th hidden layer for 1 ≤ i ≤ d;
• E ⊆ ⋃_{i=0}^{d} (V_i × V_{i+1}) is a set of edges between subsequent layers;
• B : (V ∖ V_0) → ℝ assigns a bias to every non-input node;
• θ : E → ℝ assigns a weight to every edge.

Figure 1 (left) visualises a fragment of an MLP with at least two hidden layers. Note that any MLP referred to hereafter has only one output node. Such an MLP may be obtained by extracting from another MLP the fragment consisting of all nodes that have paths¹ to the chosen output node, together with the output node itself. MLPs typically result from training with sample data. Since this training is not a focus of this paper, we simply assume that a trained MLP is available. For example, in Section 5, we conduct experiments with a pre-trained MLP for image classification.

¹ The definition of path is adopted from [4]: there exists a path via E (the set of edges) from a node n_a to a node n_b iff ∃ n_1, ..., n_d with n_1 = n_a and n_d = n_b such that (n_1, n_2), ..., (n_{d−1}, n_d) ∈ E.

The next definition explains how we obtain an activation value for each node.

Definition 2. For any j ∈ V_0, the activation x_j ∈ ℝ of node j is an input value for j. For any k such that 1 ≤ k ≤ d + 1, the activation of node i ∈ V_k is x_i = act(B(i) + Σ_{n∈V_{k−1}} x_n · θ(n, i)), where act : ℝ → ℝ is an activation function.²

Activations are a fundamental component of a neural network: they drive the computation of the network from a given input towards the output layer. The activation of each node can also be used to explain what the neural network is emphasising, as we discuss in the next section.

2.2. LRP Basics

Layer-Wise Relevance Propagation (LRP) [1] is a method for obtaining explanations, in particular for outputs of MLPs. Intuitively, with LRP, each node of the MLP is given a relevance score, showing how this node contributes to the node of interest in the output layer.
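To make Definitions 1 and 2 concrete before turning to the propagation rule, the following minimal Python sketch computes the activations of a small MLP with one hidden layer. The 2-3-1 architecture, the weights, the inputs and the choice of ReLU are hypothetical illustrations of ours, not taken from the paper's experiments.

```python
import numpy as np

# Hypothetical 2-3-1 MLP: V0 = {n1, n2} (inputs), V1 = {n3, n4, n5} (hidden), V2 = {n0} (output).
theta_01 = np.array([[0.5, -0.3,  0.8],      # theta(n, i) for edges from V0 to V1
                     [0.2,  0.9, -0.4]])
theta_12 = np.array([[1.0], [-0.7], [0.6]])  # theta(n, i) for edges from V1 to V2
bias_1 = np.array([0.1, -0.2, 0.05])         # B(i) for the hidden nodes
bias_2 = np.array([0.0])                     # B(i) for the output node

def act(z):
    return np.maximum(z, 0.0)                # ReLU as the activation function "act"

x0 = np.array([1.0, 2.0])                    # input activations x_j for j in V0
x1 = act(bias_1 + x0 @ theta_01)             # Definition 2 applied to the hidden layer
x2 = bias_2 + x1 @ theta_12                  # linear output activation, as in Section 5.3
print(x1, x2)                                # approx. [1. 1.3 0.05] and [0.12]
```

These forward activations x_i are exactly the quantities that LRP redistributes backwards, as described next.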
Starting from the output layer, the node we want to explain has its relevance score set equal to its activation, while the other nodes of the output layer (if any) have a relevance score of zero. We can then calculate the relevance score of each non-output node using Definition 3, adapted from the presentation of LRP in [7].

Definition 3. Let ⟨V, E, B, θ⟩ be an MLP, i ∈ V_k and j ∈ V_{k+1}, where 0 ≤ k ≤ d and layer k has n nodes. Then the relevance score that node i receives from node j is

R_{i←j} = (z_{ij} / Σ_{l=1}^{n} z_{lj}) · R_j

where z_{ij} is the contribution from i to j during the forward pass, i.e. z_{ij} = x_i · θ(i, j) + B(j)/n + ε/n, and ε ∈ ℝ is a small positive stabiliser. Note that this definition assumes that ε is distributed equally among the n nodes: we adopt this assumption from [7].

To calculate the relevance score that node i has towards the output node of interest, i.e. R_i, we simply sum all the relevance scores it receives from the nodes of layer k + 1; in other words, R_i = Σ_j R_{i←j}. From Definition 3, we also obtain that LRP has conservation properties (for i ∈ V_k and j ∈ V_{k+1}), i.e. R_j = Σ_i R_{i←j} and Σ_i R_i = Σ_j R_j.

2.3. QBAF Basics

QBAFs [3] are abstractions of debates between arguments, where arguments may attack or support one another and are equipped with a base score, which reflects the arguments' intrinsic, initial dialectical strength. We adopt the formal definition of QBAFs from [3].

Definition 4. A QBAF is a tuple ⟨A, Att, Supp, γ⟩ where
• A is a set (whose elements are referred to as arguments);
• Att ⊆ A × A is the attack relation;
• Supp ⊆ A × A is the support relation;
• γ : A → D is a function that maps every argument to its base score (from some given set D of values).³

² Note that, with an abuse of notation, θ(n, i) stands for θ((n, i)), for simplicity. Unless explicitly stated otherwise, this notation is used throughout the rest of the paper.

A QBAF may be equipped with a notion of dialectical strength, given by a strength function σ : A → D, indicating a dialectical strength value (again from D) for each argument, taking into account the strength of the attacking and supporting arguments within the debate represented by the QBAF, as well as the argument's intrinsic strength given by γ. Several notions of σ (called semantics in the literature on computational argumentation) have been given in the literature (e.g. see [8]), but their formal definitions are outside the scope of this paper. Various dialectical properties for semantics σ have been studied in the literature (e.g. see [8]) as a way to validate their use in concrete settings and to compare across different semantics. We will follow this approach in this paper.

Variants of QBAFs can be extracted from neural networks, e.g. as in [6, 4]. An example of the structure underpinning these QBAFs is given in Figure 1 (centre, for the MLP on the left): here, the nodes represent the arguments and the edges represent the union of the attack and support relations. In these works, the extracted QBAF can be seen as indicating how some nodes in the neural network relate to others, and hence can be viewed as an explanation of that neural network. We follow this approach in this paper, but using a variant of QBAFs, defined next.

3. nQBAFs and LRP-based Argumentation Semantics

We study LRP as a semantics σ for novel forms of QBAFs extracted from MLPs.
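Continuing the sketch from Section 2.1, the relevance scores of Definition 3 (which the LRP-based semantics below reuses) can be computed layer by layer. The vectorised form, the value of ε and the variable names are ours; this is an illustration of the ε-stabilised rule as stated in Definition 3, not the authors' implementation.

```python
import numpy as np

eps = 1e-6  # small positive stabiliser epsilon (hypothetical value)

def lrp_layer(x_prev, theta, bias, R_next):
    """Relevance R_{i<-j} that each node i of layer k receives from layer k+1 (Definition 3)."""
    n = len(x_prev)                                      # number of nodes in layer k
    # z_ij = x_i * theta(i, j) + B(j)/n + eps/n  (bias and stabiliser split equally over n nodes)
    z = x_prev[:, None] * theta + (bias + eps) / n
    R_recv = z / z.sum(axis=0, keepdims=True) * R_next   # R_{i<-j} = (z_ij / sum_l z_lj) * R_j
    return R_recv.sum(axis=1), R_recv                    # R_i = sum_j R_{i<-j}, plus the full matrix

# Reusing x0, x1, x2, theta_01, theta_12, bias_1, bias_2 from the forward-pass sketch above.
R_out = x2                                               # output node: relevance = its activation
R1, _ = lrp_layer(x1, theta_12, bias_2, R_out)           # relevance scores of the hidden nodes
R0, _ = lrp_layer(x0, theta_01, bias_1, R1)              # relevance scores of the input nodes
print(R1.sum(), R0.sum(), R_out)                         # conservation: both sums equal R_out
```

The printed sums illustrate the conservation properties R_j = Σ_i R_{i←j} and Σ_i R_i = Σ_j R_j noted after Definition 3.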
We aim to prove that this LRP-based semantics satisfies multiple dialectical properties, which we believe are intuitive when QBAFs are used as the basis for explanations of MLPs. The novel QBAFs take into account the structure of MLPs. As per Definition 3, a non-output node in an MLP may contribute to several nodes of the next layer, as in Figure 1 (centre). For any non-output node i, if we consider each edge from i to a node of the next layer and represent node i with a unique argument for every such edge (as in [6, 9]), there would be several arguments representing that node i. This method would also not scale, since the relations between arguments in the resulting QBAF would become too complex to analyse as more layers are considered. To avoid this, we define a new, leaner form of QBAFs, where arguments referring to the same node are grouped together.

Definition 5. A neural quantitative bipolar argumentation framework (nQBAF) is a tuple ⟨A, Att, Supp, γ⟩ where
• A is a set (of arguments);
• Att ⊆ A × 𝒫(A)⁴ is the attack relation;
• Supp ⊆ A × 𝒫(A) is the support relation;
• γ : A ∪ 𝒫(A) → {0} is a function that maps every argument and set of arguments to a fixed base score of zero.

³ In this paper, we will choose D = ℝ.
⁴ Note that 𝒫(A) is the power set of the set A.

Figure 1: Example of an MLP (left), a standard QBAF (centre) and the associated nQBAF (right). Each box refers to a group of arguments. In the QBAF and the nQBAF, dashed lines represent attacks and solid lines represent supports.

Thus, attack and support relations may exist not just between arguments, as in standard QBAFs, but also between arguments and sets thereof. Given that we choose D = ℝ as the set of values that may be used as base scores and strengths of arguments, the choice of γ indicates that each argument and set of arguments starts with a "neutral" base score of zero.

We first need to relate arguments of an nQBAF to nodes of a given MLP ⟨V, E, B, θ⟩. Each argument represents only one node, but a node can be represented by several arguments. Accordingly, we assume a function ρ : A ∪ 𝒫(A) → V ∪ {⊥} mapping each argument/set of arguments to a node of the MLP, if one exists (or to ⊥ otherwise). We omit the formal definition of ρ for lack of space. As an illustration, for the MLP in Figure 1 (left), in the derived nQBAF (right), n_1 = ρ(α_{12}) = ρ(α_{13}) = ρ(α_{14}) = ρ({α_{12}, α_{13}, α_{14}}), n_2 = ρ(α_{25}) = ρ(α_{26}) = ρ({α_{25}, α_{26}}), n_3 = ρ(α_{35}) = ρ(α_{36}) = ρ({α_{35}, α_{36}}), n_4 = ρ(α_{45}) = ρ(α_{46}) = ρ({α_{45}, α_{46}}), n_5 = ρ(α_5) = ρ({α_5}), n_6 = ρ(α_6) = ρ({α_6}), n_0 = ρ(α_0) = ρ({α_0}) and, for any other set S of arguments, ρ(S) = ⊥.

We then have to determine which pairs (i.e. edges, as shown in Figure 1 (right)) belong to the attack or support relations. This is done using two relation characterisations, inspired by those in [4]: c⁺, c⁻ : A × 𝒫(A) → {true, false} where, for any argument i and group of arguments j such that ρ(i) ≠ ⊥ and ρ(j) ≠ ⊥ are in adjacent layers (i.e. (ρ(i), ρ(j)) ∈ E):
• c⁺(i, j) is true iff R_{ρ(i)←ρ(j)} > 0, and
• c⁻(i, j) is true iff R_{ρ(i)←ρ(j)} < 0.

With c⁺ and c⁻, we can formally define our Att and Supp relations and the nQBAF derived from an MLP, as follows.

Definition 6.
The nQBAF derived from ⟨V, E, B, θ⟩ is ⟨A, Att, Supp, γ⟩ where
• A is defined according to Algorithm 1;
• Att = {(i, j) ∈ A × 𝒫(A) | c⁻(i, j) is true};
• Supp = {(i, j) ∈ A × 𝒫(A) | c⁺(i, j) is true};
• γ : A ∪ 𝒫(A) → {0}.

Algorithm 1: Extracting A from a given MLP
    A ← {}
    currentLayer ← d
    while currentLayer ≥ 0 do
        for n_i in V_{currentLayer} do
            for n_j in V_{currentLayer+1} do
                if (n_i, n_j) ∈ E then
                    A ← A ∪ {α_{ij}}
        currentLayer ← currentLayer − 1
    for α_{mn} in A do
        if ρ(α_{mn}) ∈ V_0 then
            A ← A ∪ {α_{(mn)′mn}}

Algorithm 1 extracts the set of arguments by iterating backwards from the last hidden layer to the input layer. It also adds imaginary arguments to the set of arguments for input nodes, for the reason discussed in the next section.

Before we define our strength function, let us introduce some notation:
• Att(x) = {a ∈ A | (a, x) ∈ Att} for all x ∈ 𝒫(A);
• Supp(x) = {s ∈ A | (s, x) ∈ Supp} for all x ∈ 𝒫(A);
• Groups = {g ∈ 𝒫(A) | ∃a ∈ A [(a, g) ∈ Att ∨ (a, g) ∈ Supp]}.

Now we define the LRP-based semantics for our nQBAFs as follows.

Definition 7. The LRP-based semantics of the nQBAF derived from an MLP ⟨V, E, B, θ⟩ is σ : A ∪ Groups → ℝ such that
• σ(x) = x_i, if ρ(x) ∈ V_{d+1} with final activation x_i;
• σ(x) = R_{m←ρ(y)}, if x = α_{(mn)′mn}, z = α_{mn} and ∃!(z, y) ∈ Att ∪ Supp;
• σ(x) = R_{ρ(x)←ρ(y)}, if ∃!(x, y) ∈ Att ∪ Supp;
• σ(x) = Σ_{a∈x} σ(a), if x ∈ Groups;
• σ(x) = 0, otherwise.

Now we are able to read off the relations between arguments, and to what degree each argument supports or attacks a group of arguments; but how natural is this? Does it follow the way humans naturally debate? To answer these questions, we have to consider whether our nQBAFs satisfy dialectical properties.

4. Properties for nQBAFs under LRP semantics

We now consider dialectical properties that determine how natural the argumentation is for any argumentation framework, i.e. how similar it is to human reasoning and debate.

Table 1: Dialectical properties for nQBAFs, adapted from [4] and [3].
1. Additive Monotonicity: ∀g ∈ Groups, σ(g) = Σ_{x∈Att(g)} σ(x) + Σ_{x∈Supp(g)} σ(x).
2. Balance: ∀g ∈ Groups, Att(g) = ∅ ∧ Supp(g) = ∅ → σ(g) = Σ_{x∈g} γ(x).
3. Weakening: ∀g ∈ Groups, Att(g) ≠ ∅ ∧ Supp(g) = ∅ → σ(g) < Σ_{x∈g} γ(x).
4. Strengthening: ∀g ∈ Groups, Att(g) = ∅ ∧ Supp(g) ≠ ∅ → σ(g) > Σ_{x∈g} γ(x).
5. Weakening Soundness: ∀g ∈ Groups, σ(g) < Σ_{x∈g} γ(x) → Att(g) ≠ ∅.
6. Strengthening Soundness: ∀g ∈ Groups, σ(g) > Σ_{x∈g} γ(x) → Supp(g) ≠ ∅.
7. Equivalence: ∀g_1, g_2 ∈ Groups, Att(g_1) = Att(g_2) ∧ Supp(g_1) = Supp(g_2) ∧ Σ_{x∈g_1} γ(x) = Σ_{x∈g_2} γ(x) → σ(g_1) = σ(g_2).
8 βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔1 ) βŠ‚ 𝐴𝑑𝑑(𝑔2 ) ∧ 𝑆𝑒𝑝𝑝(𝑔1 ) = 𝑆𝑒𝑝𝑝(𝑔2 ) ∧ Attack Counting Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) = Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔2 ) < 𝜎(𝑔1 ). 9 βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝑆𝑒𝑝𝑝(𝑔1 ) βŠ‚ 𝑆𝑒𝑝𝑝(𝑔2 ) ∧ 𝐴𝑑𝑑(𝑔1 ) = 𝐴𝑑𝑑(𝑔2 ) ∧ Support Counting Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) = Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔1 ) < 𝜎(𝑔2 ). 10 βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔1 ) = 𝐴𝑑𝑑(𝑔2 ) ∧ 𝑆𝑒𝑝𝑝(𝑔1 ) = 𝑆𝑒𝑝𝑝(𝑔2 ) ∧ Base Score Reinforcement Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) > Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔1 ) > 𝜎(𝑔2 ). 11 βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝑔1 <π‘Ž 𝑔2 ∧ 𝑆𝑒𝑝𝑝(𝑔1 ) = 𝑆𝑒𝑝𝑝(𝑔2 ) ∧ Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) = Attack Reinforcement Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔1 ) > 𝜎(𝑔2 ). 12 βˆ€π‘”1 , 𝑔2 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘ , 𝐴𝑑𝑑(𝑔1 ) = 𝐴𝑑𝑑(𝑔2 ) ∧ 𝑔1 >𝑠 𝑔2 ∧ Ξ£π‘₯βˆˆπ‘”1 𝛾 (π‘₯) = Support Reinforcement Ξ£π‘₯βˆˆπ‘”2 𝛾 (π‘₯) β†’ 𝜎(𝑔1 ) > 𝜎(𝑔2 ). properties, as shown in Table 1, are based on those in [4] and [3] but are adapted specifically for nQBAFs. In the table, we associate these properties with names, mostly borrowing from the literature, where, however, they have been used for other types of argumentation frameworks. Before defining the properties, we first make an addition regarding the input layer. Every dialectical property which follows considers the strength of a group of arguments based on its attackers and supporters. As of now, there are no attackers or supporters for groups of arguments representing nodes of the input layer, so it is likely most properties will not be satisfied here. To resolve this issue, we add imaginary arguments to target the input nodes. These added arguments are not considered as part of πΊπ‘Ÿπ‘œπ‘’π‘π‘ . Formally, for any 𝑔 ∈ πΊπ‘Ÿπ‘œπ‘’π‘π‘  such that 𝜌(𝑔) ∈ 𝑉0 , 𝐴𝑑𝑑(𝑔) = {π‘₯ ∈ 𝐴 ∣ 𝜌(π‘₯) = βŠ₯ ∧ βˆƒπ‘Ž ∈ 𝑔[𝜎 (π‘₯) = 𝜎 (π‘Ž) ∧ 𝜎 (π‘₯) < 0]} and 𝑆𝑒𝑝𝑝(𝑔) = {π‘₯ ∈ 𝐴 ∣ 𝜌(π‘₯) = βŠ₯ ∧ βˆƒπ‘Ž ∈ 𝑔[𝜎 (π‘₯) = 𝜎 (π‘Ž) ∧ 𝜎 (π‘₯) > 0]} and |𝐴𝑑𝑑(𝑔) βˆͺ 𝑆𝑒𝑝𝑝(𝑔)| = |𝑔|. For example, a given input node may be represented by a group of arguments {𝛼𝑖 , … , 𝛼𝑛 } and a set of supporting/attacking arguments {𝛼𝑐𝑖 , … , 𝛼𝑐𝑛 } corresponding to each argument of the group. According to Table 1, to explain, Additive Monotonicity requires that the strength of a group of arguments is the sum of that of its supporters and attackers. Balance requires that the strength of a group of arguments differs from the sum of base scores of that group only if such a group is a target of other arguments. Weakening requires that when there are no supporters but at least one attacker, the strength of a group of arguments is lower than the total sum of base scores of that group. Conversely, Strengthening considers the situation when there are no attackers but at least one supporter instead. Weakening Soundness is loosely the opposite direction of Weakening, requiring that if the strength of a group of arguments is lower than the sum of base scores of that group, then the group must have at least one attacker. Similarly, Strengthening Soundness is loosely the opposite direction of Strengthening. Equivalence states that groups of arguments with equal conditions in terms of attackers, supporters and the sum of base scores within a group have the same strength. Attack Counting (Support Counting) requires that a strictly larger set of attackers (supporters, respectively) determines a lower (higher, respectively) strength. Base Score Reinforcement requires that a higher sum of base scores gives a higher strength. 
For the last two properties, we have to define the notion of weaker and stronger attack/support relations between sets.

Definition 8. For any sets A, B ∈ Groups: A <_a B iff Σ_{x∈Att(A)} σ(x) > Σ_{x∈Att(B)} σ(x); A <_s B iff Σ_{x∈Supp(A)} σ(x) < Σ_{x∈Supp(B)} σ(x); A >_a B iff B <_a A; A >_s B iff B <_s A.

Then, Attack Reinforcement states that a weaker set of attackers determines a higher strength, whereas Support Reinforcement states that a stronger set of supporters determines a higher strength. Any nQBAF satisfies all of the given properties (proofs omitted for lack of space). This indicates that our LRP-based nQBAFs may align with human reasoning.

Proposition 1. nQBAFs under LRP-based semantics satisfy Properties 1–12.

5. Empirical Study

We apply the LRP-based semantics empirically with two different approaches, namely Deep Argumentative Explanation (DAX) [4] and the approach in [5] by Google. We then analyse the obtained explanations qualitatively.

5.1. DAX Basics

DAX [4] is a general methodology for building local explanations (i.e. input-based explanations) for the outputs of a neural network. Unlike other explanation methods which are based only on inputs (and thus can be deemed flat), DAX takes the hidden layers into account as well. DAX is based on extracting an argumentation framework from a neural network; explanations are then drawn from the framework and represented in a format comprehensible to humans. The extraction of the argumentation framework requires the choice of a semantics (for determining the strength of arguments) directly matching the behaviour of the neural network. Here we apply DAX using our LRP-based semantics at its core, and we choose nQBAFs as the argumentation frameworks underpinning DAXs. We could theoretically achieve a full (local) explanation by viewing the entire nQBAF extracted from a neural network. However, such an explanation would be too large for complex networks, and therefore too complicated for humans to comprehend. To make things human-scale, we only consider a fragment of the nQBAF, in the spirit of [4], as well as grouping together the groups of arguments representing single nodes (i.e. grouping nodes), and we visualise the result as an explanation.

5.2. The Basics of Google's Method

Google's method [5] combines feature visualisation (i.e. what is a neuron looking for?, see [10]) with attribution (i.e. how does a specific node contribute to the output?) to generate a local explanation for a neural network output. We use the implementation of this method available at [11], changing the attribution method from a linear correlation to LRP. We rely on the existing implementation's choices for visualisation.

5.3. Settings

For both methods, we aim to explain a Keras VGG16 model [12] (with a linear activation function for the output layer) pretrained on the ImageNet dataset [13]. Since the whole model is too large, we only consider the last convolutional layer, explaining what that layer prioritises in a given image. We test our method in combination with DAX, comparing it to Google's method, on three images: a police van from [14], a barbell from [15], and a diaper from [16]. In all cases, we use the output node with maximum activation as the output class, with this activation referred to as the output prediction. To generate explanations using DAX, we modify the code from the ArgFlow library [9] and apply it to each of the three images.
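As a rough illustration of this setting (and not the authors' actual DAX/ArgFlow or Lucid code), the sketch below loads a pretrained Keras VGG16 and reads off the activations of its last convolutional layer together with the maximally activated output node for one image. The file name police_van.jpg is a placeholder, and classifier_activation=None (giving the linear output layer mentioned above) assumes a reasonably recent TensorFlow/Keras version.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions

# Pretrained VGG16 with a linear (logit) output layer, as in Section 5.3.
model = VGG16(weights="imagenet", classifier_activation=None)
last_conv = model.get_layer("block5_conv3")        # last convolutional layer of VGG16

# "police_van.jpg" is a placeholder path for one of the three test images.
img = tf.keras.utils.load_img("police_van.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))

# One forward pass returning both the last-conv activations and the output prediction.
probe = tf.keras.Model(model.input, [last_conv.output, model.output])
conv_acts, preds = probe(x)
top_class = int(np.argmax(preds[0]))               # output node with maximum activation
print(decode_predictions(preds.numpy(), top=1)[0], conv_acts.shape, top_class)
```

From here, LRP as in Definition 3 would be run backwards from the selected output node to score the last-layer components, which DAX and Google's method then group and visualise in different ways.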
For each explanation, the size of each image illustrates its attribution towards the output class, with red and green arrows depicting attacks on and supports for the output class prediction, respectively. For Google's approach, we modify the code from [11], which is one of the Colaboratory notebooks in [5]. We then apply the code to the three images, each resulting in a set of images indicating parts of the original image. The number below each factor refers to how much attribution that component has towards the output prediction; the arrow sizes also reflect these attributions.

5.4. DAX vs Google Comparisons

Image 1: Police Van. Explanations from both methods (as shown in Figure 2) indicate that the model focuses mostly on the background and the red stripe of the van. There are some subtle differences between them, mainly in the strength assigned to each factor, but their factors are quite similar. However, an interesting point is that DAX considers the siren light of the van as one of the top six factors contributing to the output class prediction (according to the rightmost image of Figure 2a), while Google's approach does not present this (arguably important) factor.

Figure 2: Explanations given using (a) the DAX approach (with attacks in red and supports in green, either indicated in the filters or as arrows, and the size of the arguments for the filters indicating their dialectical strength, see [4] for details) and (b) Google's approach for the police van image with the predicted class police_van (with arrows indicating support, and the size of arrows representing the LRP values). The police van image source is (https://bit.ly/3D8FZya).

Image 2: Barbell. According to Figure 3, both methods explain that the model focuses on the plates and the background. However, DAX considers the plates to contribute to the prediction more than the background, while it is the opposite for Google's explanation. Somewhat counter-intuitively though, DAX considers the plates to both attack (the fourth image from the right of Figure 3a) and support (the rightmost image of Figure 3a) the class prediction, even though the attacking argument (the fourth image from the right) is much weaker. If the DAX is faithful to the model, then this incongruence may result from an incongruence in the model.

Figure 3: Explanations given using (a) the DAX approach and (b) Google's approach for the barbell image with the predicted class barbell. The barbell image source is (https://amzn.to/3Db2xOQ).

Image 3: Diaper. From Figure 4, both methods indicate that the model focuses on things other than the diaper. The DAX in Figure 4a shows that the model focuses on the baby instead of the diaper; it even indicates that the diaper attacks the prediction of the class itself. In contrast, Google's explanation (Figure 4b) indicates that the model focuses on the background and the diaper, giving the baby lower attributions.

5.5. Discussion

The comparisons above clearly indicate that, even with similar semantics (LRP) and for the same model, explanations vary depending on how the grouping (of argument groups) is done. Google's approach seems to take into account the fact that concepts are usually recognised around particular positions of an image, whereas DAX only focuses on the concepts. DAX seems to unearth conflicts, with the same feature both attacking and supporting a prediction.
Overall, more experimentation is needed to understand which explanation method is more "faithful" to the underlying model.

Figure 4: Explanations given using (a) the DAX approach and (b) Google's approach for the baby image with the predicted class diaper. The diaper image source is (https://bit.ly/3D8FZya).

6. Conclusions

We presented a variant of Quantitative Bipolar Argumentation Frameworks (QBAFs) called neural QBAFs (nQBAFs) and considered how the LRP-based semantics satisfies the dialectical properties modified for nQBAFs. We also conducted preliminary experiments explaining an image classifier, applying the LRP-based semantics within two approaches, Deep Argumentative Explanation (DAX) and Google's approach, and comparing the resulting explanations. DAX groups argument groups (i.e. nodes) in the same filter together, while Google's approach groups them by means of matrix factorisation optimising for activations. The comparison shows that how argument groups (each representing a node) are grouped can affect the resulting explanations. As future work, we plan to conduct experiments using nQBAFs for visualisation for text classification, in comparison with the DAX and Google approaches with LRP as well as other methods, such as SmoothGrad [17], DeepLIFT [18], Grad-CAM [19] and TCAV [20]. Finally, it would be interesting to conduct experiments assessing the cognitive load placed on end-users by different (instantiations of) visualisations.

Acknowledgments

The first author was funded in part by Imperial College London under UROP (Undergraduate Research Opportunities Programme). The last author was partially funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 101020934). Finally, Rago and Toni were partially funded by J.P. Morgan and by the Royal Academy of Engineering under the Research Chairs and Senior Research Fellowships scheme. Any views or opinions expressed herein are solely those of the authors listed, and may differ, in particular, from the views and opinions expressed by Imperial College London or its affiliates and by J.P. Morgan or its affiliates. This material is not a product of the Research Department of J.P. Morgan Securities LLC. This material should not be construed as an individual recommendation for any particular client and is not intended as a recommendation of particular securities, financial instruments or strategies for a particular client. This material does not constitute a solicitation or offer in any jurisdiction.

References

[1] G. Montavon, A. Binder, S. Lapuschkin, W. Samek, K.-R. Müller, Layer-Wise Relevance Propagation: An Overview, Springer International Publishing, Cham, 2019, pp. 193–209. URL: https://doi.org/10.1007/978-3-030-28954-6_10. doi:10.1007/978-3-030-28954-6_10.
[2] P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games, Artificial Intelligence 77 (1995) 321–357. URL: https://www.sciencedirect.com/science/article/pii/000437029400041X. doi:10.1016/0004-3702(94)00041-X.
[3] P. Baroni, A. Rago, F. Toni, How many properties do we need for gradual argumentation?, in: AAAI, 2018.
[4] E. Albini, P. Lertvittayakumjorn, A. Rago, F. Toni, Deep argumentative explanations, 2021. arXiv:2012.05766.
[5] C. Olah, A. Satyanarayan, I. Johnson, S. Carter, L. Schubert, K. Ye, A.
Mordvintsev, The building blocks of interpretability, Distill 3 (2018). doi:10.23915/distill.00010.
[6] N. Potyka, Interpreting neural networks as quantitative argumentation frameworks, Proceedings of the AAAI Conference on Artificial Intelligence 35 (2021) 6463–6470. URL: https://ojs.aaai.org/index.php/AAAI/article/view/16801.
[7] P. Lertvittayakumjorn, L. Specia, F. Toni, FIND: Human-in-the-Loop Debugging Deep Text Classifiers, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 332–348. URL: https://aclanthology.org/2020.emnlp-main.24. doi:10.18653/v1/2020.emnlp-main.24.
[8] P. Baroni, A. Rago, F. Toni, From fine-grained properties to broad principles for gradual argumentation: A principled spectrum, Int. J. Approx. Reason. 105 (2019) 252–286. URL: https://doi.org/10.1016/j.ijar.2018.11.019. doi:10.1016/j.ijar.2018.11.019.
[9] A. Dejl, P. He, P. Mangal, H. Mohsin, B. Surdu, E. Voinea, E. Albini, P. Lertvittayakumjorn, A. Rago, F. Toni, Argflow: A Toolkit for Deep Argumentative Explanations for Neural Networks, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 2021, pp. 1761–1763.
[10] C. Olah, A. Mordvintsev, L. Schubert, Feature visualization, Distill 2 (2017). doi:10.23915/distill.00007.
[11] L. Google, Neuron groups – building blocks of interpretability, 2018. URL: https://bit.ly/3a483Xc.
[12] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations, 2015.
[13] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255. doi:10.1109/CVPR.2009.5206848.
[14] Wataree, Police van thailand, 2019. URL: https://bit.ly/3Fi1oqx.
[15] I. Synergee Fitness Worldwide, 2019. URL: https://amzn.to/3Db2xOQ.
[16] websubstance, Baby tummy time, n.d. URL: https://bit.ly/3D8FZya.
[17] D. Smilkov, N. Thorat, B. Kim, F. Viégas, M. Wattenberg, SmoothGrad: removing noise by adding noise, 2017. arXiv:1706.03825.
[18] J. Li, C. Zhang, J. T. Zhou, H. Fu, S. Xia, Q. Hu, Deep-lift: Deep label-specific feature learning for image annotation, IEEE Transactions on Cybernetics (2021) 1–10. doi:10.1109/TCYB.2021.3049630.
[19] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
[20] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, R. Sayres, Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV), in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 2668–2677. URL: https://proceedings.mlr.press/v80/kim18d.html.