<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Foundations for Solving Classification Problems with Quantitative Abstract Argumentation</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Stuttgart</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Our understanding of an argument in this work follows Dung's notion of
abstract argumentation: "an argument is an abstract entity whose role is solely
determined by its relations to other arguments" [7]. In particular, we abstract
from the content of an argument and are only interested in its acceptability
dependent on the acceptability of its attackers and supporters. We consider bipolar
argumentation graphs (BAGs) of the form (A, Att, Sup), where A is a finite set of
arguments, Att ⊆ A × A is the attack relation and Sup ⊆ A × A is the support
relation. With a slight abuse of notation, we let Att(A) = {B ∈ A | (B, A) ∈ Att}
denote the attackers of A and let Sup(A) denote its supporters. Graphically, we
denote attack relations by solid and support relations by dashed edges. Figure
1 illustrates the definition. The BAG models part of a decision problem from
[17], where we want to decide whether to buy new or to sell existing stocks of a
company. A1 corresponds to the statement of an expert that recommends selling.
A2 and A3 correspond to statements by experts who contradict A1's premises
and recommend buying. Since we do not want to accept both the selling and the
buying decision, the corresponding decision arguments attack each other.</p>
      <p>We will not talk about the classical interpretation of bipolar argumentation
frameworks and move directly to quantitative approaches. Our focus is on two
bipolar approaches that seem interesting for classification tasks. Both share a
BAG as a common data structure. They enhance the BAG in different ways in
order to derive numerical degrees of acceptance.
Probabilistic epistemic argumentation [11] builds on basic probability theory.
Probabilities are assigned to arguments by means of probability functions P :
2^A → [0, 1] such that Σ_{w ∈ 2^A} P(w) = 1. Each w ∈ 2^A can be seen as one possible
world, in which the arguments in w are accepted and the remaining ones are
rejected. The probability of an argument A ∈ A under P is then defined by
adding the probabilities of all worlds in which A is accepted, that is, P(A) =
Σ_{w ∈ 2^A, A ∈ w} P(w). In order to restrict the set of all probability functions to those
that respect the BAG, different constraints have been introduced [11]. We give
a few examples here:</p>
      <p>COH: P is called coherent if for all A, B ∈ A with (A, B) ∈ Att, we have
P(B) ≤ 1 − P(A).</p>
      <p>SFOU: P is called semi-founded if P(A) ≥ 0.5 for all A ∈ A with Att(A) = ∅.</p>
      <p>SOPT: P is called semi-optimistic if P (A)
with Att(A) 6= ;.
1 PB2Att(A) P (B) for all A 2 A
Coherence encodes one possible meaning of attack relations: if A attacks B,
then the probability of A bounds the probability of B from above (and vice
versa). The intuition behind the de nition is that we do not want to accept both
an argument and its attacker. Semi-foundedness encodes the intuition that an
argument should not be rejected if there is no reason for doing so. Formally,
this means that the degree of belief in this argument should not be lower than
0:5. Semi-optimism gives another lower bound for the degree of belief. If there
are no attackers, the argument must be accepted (probability 1). Attackers will
decrease the lower bound based on their own degree of belief. These constraints
have been de ned for attack-only graphs, but they can be easily extended to
bipolar graphs. For example, dual to coherence, we could de ne a lower bound
for support edges as follows.</p>
      <p>S-COH: P is called s-coherent if for all A; B 2 A with (A; B) 2 Sup, we have</p>
      <p>P (B) P (A).</p>
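      <p>To make these constraints concrete, the following sketch checks COH, SFOU and S-COH for a given assignment of probabilities to arguments. The edge-list representation of BAGs and all function names are our own illustrative choices, not from the paper.</p>

```python
# Illustrative sketch: checking the COH, SFOU and S-COH constraints for a
# given probability assignment P over the arguments of a BAG.

def attackers(A, att):
    # attackers of A: all B with an attack edge (B, A)
    return [B for (B, C) in att if C == A]

def is_coherent(P, att):
    # COH: P(B) <= 1 - P(A) for every attack (A, B)
    return all(P[B] <= 1 - P[A] for (A, B) in att)

def is_semi_founded(P, args, att):
    # SFOU: P(A) >= 0.5 whenever A has no attackers
    return all(P[A] >= 0.5 for A in args if not attackers(A, att))

def is_s_coherent(P, sup):
    # S-COH: P(B) >= P(A) for every support (A, B)
    return all(P[B] >= P[A] for (A, B) in sup)

# Toy BAG: A1 attacks A2, A3 supports A2
args = ["A1", "A2", "A3"]
att = [("A1", "A2")]
sup = [("A3", "A2")]
P = {"A1": 0.5, "A2": 0.5, "A3": 0.5}

print(is_coherent(P, att), is_semi_founded(P, args, att), is_s_coherent(P, sup))
```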
      <p>The previous examples are all special cases of linear atomic constraints [19] that
are generally written in the normalized form
Σ_{i=1}^{n} c_i π(A_i) ≤ c_0,
where A_i ∈ A, c_i ∈ R and π is a syntactic symbol for the probability of an
argument. A probability function P satisfies such a linear atomic constraint iff
Σ_{i=1}^{n} c_i P(A_i) ≤ c_0. Note that all constraints above can be brought into this
form. For example, the Coherence constraint can be rewritten as 1 · π(A) + 1 ·
π(B) ≤ 1 and the S-Coherence constraint as 1 · π(A) + (−1) · π(B) ≤ 0. In
general, we can also consider more general epistemic constraints that allow
nonlinear combinations of probabilities and probabilities of complex formulas over
arguments [10]. However, in order to keep things simple, we do not discuss the
most general form. For future reference, we define a P-BAG as follows.
Definition 1. A P-BAG is a tuple (A, Att, Sup, C), where (A, Att, Sup) is a
BAG and C is a set of epistemic constraints over the BAG.</p>
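      <p>As a small illustration of the normal form, a linear atomic constraint can be represented as a coefficient map together with the bound c_0. The encoding below is a hypothetical sketch; the representation is our own choice.</p>

```python
# Sketch: a linear atomic constraint sum_i c_i * pi(A_i) <= c_0 encoded as
# (coeffs, c0), where coeffs maps arguments to their coefficients c_i.

def satisfies(P, constraint):
    # check sum_i c_i * P(A_i) <= c_0 (with a small numerical tolerance)
    coeffs, c0 = constraint
    return sum(c * P[A] for A, c in coeffs.items()) <= c0 + 1e-9

# COH for an attack (A, B):   1*pi(A) + 1*pi(B) <= 1
coh = ({"A": 1, "B": 1}, 1)
# S-COH for a support (A, B): 1*pi(A) + (-1)*pi(B) <= 0
s_coh = ({"A": 1, "B": -1}, 0)

P = {"A": 0.3, "B": 0.6}
print(satisfies(P, coh), satisfies(P, s_coh))  # True True
```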
      <p>Given a P-BAG (A, Att, Sup, C), we are interested in solving the following
entailment problem: compute tight upper and lower bounds on the probability of an
argument A ∈ A among all probability distributions that satisfy the constraints
in C. If we restrict to linear atomic constraints, this problem can be solved in
polynomial time [19]. To illustrate the entailment problem, consider again the
BAG in Figure 1. We consider a new constraint that we call Balance.</p>
      <p>BAL: P is called balanced if
P(A) = 1/2 + (Σ_{B ∈ Sup(A)} P(B) − Σ_{B ∈ Att(A)} P(B)) / (2 · max{|Sup(A)|, |Att(A)|})
for all A ∈ A.</p>
      <p>To improve readability, we do not present Balance in the normal form of
linear atomic constraints. Based on our previous discussion, the reader hopefully
recognizes that Balance can indeed be written in this form. Note, in particular,
that an equality can be represented by two inequalities (x = y iff x ≤ y and
y ≤ x). Intuitively, Balance enforces probability 0.5 if attackers and supporters
are equally strong and moves the probability towards 0 (1) if the attackers are
stronger (weaker) than the supporters. The entailment results are shown in
Figure 2 on the left. In general, we may get interval probabilities rather than point
probabilities. To illustrate this, we can replace BAL with the constraints SFOU,
COH and S-COH that we discussed before. The entailed probabilities are shown
in Figure 2 on the right.</p>
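      <p>Because linear atomic constraints only mention marginal probabilities, the entailment problem can be sketched as a small linear program over the marginals: any point of the constrained box [0, 1]^n is realizable by some probability function (e.g. a product distribution), so tight bounds are obtained by minimizing and maximizing one coordinate. The snippet below is an illustrative sketch, not the Probabble implementation, and assumes SciPy is available; it encodes COH for an attack (A0, A1) together with SFOU for A0.</p>

```python
from scipy.optimize import linprog

# Constraints in the normalized form sum_i c_i * pi(A_i) <= c_0, written as
# rows of A_ub with right-hand side b_ub.
# COH for attack (A0, A1): pi(A0) + pi(A1) <= 1
# SFOU for unattacked A0:  pi(A0) >= 0.5, i.e. -pi(A0) <= -0.5
A_ub = [[1, 1], [-1, 0]]
b_ub = [1, -0.5]
bounds = [(0, 1), (0, 1)]

def entailed_interval(i, n=2):
    # tight lower/upper bound on pi(A_i) over all feasible marginal vectors
    lo = linprog([1 if j == i else 0 for j in range(n)],
                 A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun
    hi = -linprog([-1 if j == i else 0 for j in range(n)],
                  A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun
    return lo, hi

print(entailed_interval(1))  # tight bounds on pi(A1)
```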
      <p>
        A polynomial-time implementation for solving the entailment problem for
linear atomic constraints can be found in the Java library Probabble1.
Gradual argumentation frameworks evaluate arguments by numerical strength
values. We will focus on bipolar frameworks that use the interval [0, 1] to give a
uniform presentation. A discussion of more general frameworks can be found in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The basic building block is again a BAG. Now, additionally, every argument
has an initial weight that can be seen as an a priori belief in the argument when
ignoring all the others. For future reference, we define a G-BAG as follows.
Definition 2. A G-BAG is a tuple (A, Att, Sup, w), where (A, Att, Sup) is a
BAG and w : A → [0, 1] is a weight function.
      </p>
      <p>The main computational problem is again to assign acceptance values to
arguments. In this context, they are often called strength values. Strength values are
often computed by applying an update function to the G-BAG repeatedly until
the values converge. The update function often has a simple modular structure
consisting of an aggregation and an influence function [14]. The aggregation
function takes the strength values of all parents and combines them into a
numerical value. Intuitively, an attacker should decrease the value based on its
strength, whereas a supporter should increase the value based on its strength.</p>
      <sec id="sec-1-1">
        <title>1 https://sourceforge.net/projects/probabble/</title>
        <p>
          The influence function then adapts the initial weight based on the aggregate.
For example, the aggregation function of the DF-QuAD algorithm uses
multiplication [23]. While it has some nice properties, its particular definition has the
disadvantage that the aggregate is necessarily (close to) 0 if both an attacker and
a supporter have strength (close to) 1. The Euler-based semantics introduced in
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], uses addition instead of multiplication to overcome this problem. However,
its influence function is not symmetric about 0, which causes a counterintuitive
imbalance between attacks and supports. The quadratic-energy model from [16]
uses a symmetric influence function in order to overcome this problem. Its
aggregation function is defined by
α(A) = Σ_{B ∈ Sup(A)} s(B) − Σ_{B ∈ Att(A)} s(B),
where s : A → [0, 1] assigns the current strength value to every argument. Its
influence function is defined by
ι(A) = w(A) − w(A) · h(−α(A)) + (1 − w(A)) · h(α(A)),
where h(x) = max{0, x}² / (1 + max{0, x}²) squashes its input to the interval [0, 1]. Intuitively, a
positive aggregate will move the weight towards 1, while a negative aggregate will
move the weight towards 0. The strength values are initialized with the initial
weights. Then the aggregation and influence function are applied repeatedly until
the values converge.</p>
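        <p>A minimal sketch of the iterative update under the quadratic-energy semantics as reconstructed above; the edge-list representation, function names and the fixed iteration count are our own illustrative choices.</p>

```python
# Sketch of the quadratic-energy update:
#   aggregation: alpha(A) = sum of supporter strengths - sum of attacker strengths
#   influence:   s(A) = w(A) - w(A)*h(-alpha) + (1 - w(A))*h(alpha)
# with h(x) = max(0, x)**2 / (1 + max(0, x)**2).

def h(x):
    m = max(0.0, x)
    return m * m / (1.0 + m * m)

def update(weights, att, sup, iterations=100):
    s = dict(weights)  # strengths are initialized with the initial weights
    for _ in range(iterations):
        alpha = {A: sum(s[B] for (B, C) in sup if C == A)
                    - sum(s[B] for (B, C) in att if C == A)
                 for A in weights}
        s = {A: weights[A] - weights[A] * h(-alpha[A])
                + (1 - weights[A]) * h(alpha[A])
             for A in weights}
    return s

# Toy G-BAG: B attacks A, C supports A, all initial weights 0.5
s = update({"A": 0.5, "B": 0.5, "C": 0.5},
           att=[("B", "A")], sup=[("C", "A")])
print(round(s["A"], 3))  # attacker and supporter balance out: 0.5
```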
        <p>Figure 3 shows, at the top, possible initial weights for the example BAG on
the left and the resulting final strength values on the right. At the bottom of
Figure 3, the left graph shows how the final strength values evolve over time.
For acyclic graphs, they can be computed in linear time by a single forward
pass [16]. For cyclic BAGs, the convergence theory is actually not complete.
[14] gave some examples for cyclic G-BAGs where the strength values under
different approaches start cycling and do not converge. In the known cases,
these convergence problems can be overcome by continuizing the discrete update
approach [16] as illustrated in the right graph at the bottom in Figure 3. In all
cases where convergence guarantees are known, it is actually guaranteed that
the discrete and continuized algorithm will converge to the same strength values
[18]. Empirically, the continuous algorithm still converges in subquadratic time
[16]. The plots of the evolution of strength values may also add transparency
and explainability to gradual argumentation frameworks.</p>
        <p>Implementations for computing strength values with different gradual
argumentation approaches and plotting the evolution of their strength values can be
found in the Java library Attractor2.</p>
        <p>We will now look at how classification problems can be solved by means of
argumentation frameworks. The goal of classification is to map inputs x to outputs y.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2 https://sourceforge.net/projects/attractorproject/</title>
        <p>We think of the inputs as feature tuples x = (x_1, …, x_k), where the i-th value is
taken from some domain D_i. The output y is taken from a finite set of class labels
L. A classification problem P = ((D_1, …, D_k), L, E) consists of the domains, the
class labels and a set of training examples E = {(x_i, y_i) | 1 ≤ i ≤ N}.</p>
        <p>By a numerical classifier, we mean a function c : ×_{i=1}^{k} D_i × L → R that
assigns to every pair (x, y) a numerical value. An important special case is a
probabilistic classifier p : ×_{i=1}^{k} D_i × L → [0, 1] where Σ_{j=1}^{|L|} p(x, y_j) = 1. Then
p(x, y) ∈ [0, 1] can be understood as the confidence of the classifier that an
example with features x belongs to the class y. Let us note that every numerical
classifier c can be turned into a probabilistic classifier p_c by normalizing the label
outputs by a softmax function. That is, p_c(x, y) = exp(c(x, y)) / Σ_{j=1}^{|L|} exp(c(x, y_j)).</p>
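        <p>The softmax normalization can be sketched in a few lines; the function name and the dictionary representation of the scores are our own illustrative choices.</p>

```python
import math

# Sketch: turning a numerical classifier into a probabilistic one via softmax,
# p_c(x, y) = exp(c(x, y)) / sum_j exp(c(x, y_j)).
def softmax_normalize(scores):
    # scores: dict label -> c(x, y); shift by the maximum for numerical stability
    m = max(scores.values())
    exps = {y: math.exp(v - m) for y, v in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

p = softmax_normalize({"buy": 2.0, "sell": 0.0})
print(round(p["buy"], 3))  # ~0.881, and the confidences sum to 1
```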
        <p>Figure 4 shows the high-level architecture of our classifiers. The input is first
encoded in a BAG that models the classification problem. If the resulting model
is not a probabilistic classifier, the acceptance degrees of arguments
corresponding to the labels are then normalized by a softmax function. We will discuss
these steps in more detail in the following sections.
In order to solve a classification problem P = ((D_1, …, D_k), L, E) with
argumentation technology, we first transform the domains and class labels into
arguments. A categorical feature with domain D = {d_1, …, d_l} can be
transformed into l arguments A_{D,1}, …, A_{D,l}. Intuitively, we can identify the value
d_i with accepting A_{D,i} and rejecting the remaining arguments A_{D,j}, j ≠ i, of
the feature. A continuous feature with domain D ⊆ R can be discretized by
partitioning D into l intervals that can then be treated like discrete features. We
denote the resulting input arguments by A_in.</p>
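        <p>The input transformation can be sketched as follows; the argument naming scheme and helper functions are hypothetical choices of ours, not from the paper.</p>

```python
import bisect

# Sketch of the input transformation: a categorical feature yields one input
# argument per value; a continuous feature is discretized into intervals.

def encode_categorical(value, domain):
    # one argument per domain value; accepted (1.0) iff it matches the input
    return {f"A_{d}": 1.0 if d == value else 0.0 for d in domain}

def encode_continuous(value, boundaries):
    # boundaries [b1, ..., b_{l-1}] split the domain into l intervals (bins)
    idx = bisect.bisect_right(boundaries, value)
    return {f"A_bin{i}": 1.0 if i == idx else 0.0
            for i in range(len(boundaries) + 1)}

print(encode_categorical("edu2", ["edu1", "edu2", "edu3"]))
print(encode_continuous(35, [20, 60]))  # age 35 falls into the middle bin
```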
        <p>We proceed analogously for the class labels. For multiclass classification
problems, we create one argument for every label analogous to discrete features.
However, if the classification problem is binary, L = {c_0, c_1}, we create a single
output argument. We denote the resulting output argument(s) by A_out.</p>
        <p>To illustrate the idea, we consider a small toy classification problem inspired
by the Census dataset from the UCI machine learning repository3. We consider
three features that correspond to age (continuous), education (3 categories) and
work class (3 categories) and two class labels that correspond to an 'average or
below' or 'above average' salary. Figure 5 shows one potential BAG built up from
the corresponding input (bottom) and output (top) arguments. For simplicity, we
chose a coarse discretization into three bins. Of course, determining the number
and boundaries of the bins can also be made part of the learning process with
the usual advantages (flexibility) and disadvantages (learning complexity).
The most straightforward way to build a classifier is to take the input arguments
and output arguments and to connect them via edges in such a way that a good
classification performance on the examples is obtained.</p>
        <p>Definition 3 (Naive Classification BAG). A Naive Classification BAG for
a classification problem P = ((D_1, …, D_k), L, E) is a BAG (A, Att, Sup) such
that A = A_in ∪ A_out is the set of input and output arguments for P, Att, Sup ⊆
A_in × A_out and Att ∩ Sup = ∅.</p>
        <p>Figure 5 shows one possible naive classification BAG for our census example.
While a naive classification BAG is easy to interpret, it may not have sufficient
degrees of freedom to capture more complicated relationships. In order</p>
      </sec>
      <sec id="sec-1-3">
        <title>3 https://archive.ics.uci.edu/ml/datasets/Census+Income</title>
        <p>to overcome the problem, we propose adding hidden layers of arguments
between the input and the output layer inspired by the architecture of feedforward
neural networks [8]. Similar to the intuition of deep feedforward neural
networks, the hope is that deeper layers will form more sophisticated patterns from
the patterns detected in earlier layers. Interpretability will probably suffer with
increasing depth. However, due to the simple mechanics of the introduced
argumentation approaches, a deep abstract argumentation classifier may still be
easier to interpret than 'black box approaches' like neural networks or support
vector machines. Figure 6 shows a possible deep classification BAG for our census
example. The meaning of hidden arguments in early layers can still be intuitively
explained. For example, H_{1,1} can be roughly interpreted as saying that an above
average salary is unlikely in the age group from 20 to 60 and if the education
category is 1, unless the person is in working class 3.</p>
        <p>Definition 4 (Deep Classification BAG). A Deep Classification BAG with
k layers for a classification problem P = ((D_1, …, D_k), L, E) is a BAG (A, Att, Sup)
such that A = ∪_{i=0}^{k+1} A⟨i⟩ consists of the input arguments A⟨0⟩ = A_in and
output arguments A⟨k+1⟩ = A_out for P and additional layers of arguments
A⟨i⟩ such that A⟨i⟩ ∩ A⟨j⟩ = ∅ for i ≠ j. Furthermore, Att ∩ Sup = ∅ and
Att, Sup ⊆ ∪_{i=0}^{k} ∪_{j=i+1}^{k+1} A⟨i⟩ × A⟨j⟩, that is, edges can only be directed towards
deeper layers.</p>
        <p>In order to classify an example with a classification BAG, we have to specify
how to generate an output label from the input features. We will discuss some
ideas for P-BAGs and G-BAGs in the next sections.</p>
        <sec id="sec-1-3-1">
          <title>Classification P-BAGs</title>
          <p>In order to solve the classification problem with a P-BAG, we have to add a set
of constraints to our classification BAG. We divide the constraints into
Classification Constraints: encode the meaning of support and attack edges.
Instance Constraints: encode the input features of an example.
The classification constraints are example independent, whereas the instance
constraints change for every example (they correspond to the input
transformation in Figure 4). We start by discussing the instance constraints. The idea is
simple: for every feature, we constrain the probability of the input argument
that corresponds to the input value to 1 and the probability of the remaining
input arguments for this feature to 0. For example, if D_i = {d_{i,1}, …, d_{i,l}} and
x_i = d_{i,2}, we add the constraints π(A_{i,2}) = 1 and π(A_{i,j}) = 0 for j ≠ 2.</p>
          <p>There are various ways to define the classification constraints. Defining them
could even be made part of the learning process. However, this would complicate
the learning problem further. We therefore discuss some concrete options. Our
initial proposal is a variation of the Balance constraint that we discussed before.
We slightly generalize it by adding weights to edges. We will consider these
weights as parameters that have to be learnt. For every argument A ∈ A \ A_in
that is not an input argument, we introduce one constraint of the form
π(A) = 1/2 + (Σ_{B ∈ Sup(A)} δ_{B,A} π(B) − Σ_{B ∈ Att(A)} δ_{B,A} π(B)) / 2,   (1)
where we demand 0 &lt; δ_{B,A}, Σ_{B ∈ Sup(A)} δ_{B,A} = 1 and Σ_{B ∈ Att(A)} δ_{B,A} = 1.
Let us note that, due to the simple structure of the BAG, the probabilities of
the arguments can be computed in a single forward pass in linear time. The
probabilities of the input arguments can be set immediately according to the
instance constraints. We then go to the next layer. Since edges can only be
directed towards deeper layers, all probabilities in this layer are determined by
the probabilities of the previous layer. We can continue in this way until we set
the probabilities in the output layer.</p>
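          <p>The forward pass can be sketched as follows, assuming the weighted Balance constraint takes the form π(A) = 1/2 + (Σ_sup δ·π(B) − Σ_att δ·π(B))/2 as reconstructed above; the layered-list representation and names are our own illustrative choices.</p>

```python
# Hedged sketch of the linear-time forward pass for a layered P-BAG with
# weighted Balance constraints. layers[0] holds the input arguments, whose
# probabilities are fixed by the instance constraints.

def forward_pass(layers, inputs, att, sup):
    # att/sup: dicts mapping (B, A) edges to their weights delta_{B,A}
    pi = dict(inputs)
    for layer in layers[1:]:
        for A in layer:
            s = sum(d * pi[B] for (B, C), d in sup.items() if C == A)
            a = sum(d * pi[B] for (B, C), d in att.items() if C == A)
            pi[A] = 0.5 + (s - a) / 2.0
    return pi

layers = [["X1", "X2"], ["Y"]]
inputs = {"X1": 1.0, "X2": 0.0}
sup = {("X1", "Y"): 1.0}   # X1 supports Y with weight 1
att = {("X2", "Y"): 1.0}   # X2 attacks Y with weight 1
print(forward_pass(layers, inputs, att, sup)["Y"])  # 0.5 + (1 - 0)/2 = 1.0
```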
          <p>However, we note that the resulting classifier is necessarily a linear classifier.
By induction, we can see that the probability at every argument is just a linear
combination of the inputs: for the first layer, this is obvious. For the subsequent
layers, it follows by induction, since a linear combination of linear combinations
is again a linear combination. Therefore, the output probability is a linear
function of the inputs. As a consequence, the classifier is only able to learn linearly
separable functions. For example, it is not possible to learn to classify the XOR
(exclusive or) function correctly when using the linear balance constraints in (1).</p>
          <p>Fortunately, if we keep the structure sufficiently simple, we can still deal with
non-linear constraints efficiently by performing a single forward pass through the
BAG as before. We therefore propose the following non-linear variant of Balance:
π(A) = β_A + (1 − β_A) · max{0, Σ_{B ∈ Sup(A)} δ_{B,A} π(B) − Σ_{B ∈ Att(A)} δ_{B,A} π(B)}
− β_A · max{0, Σ_{B ∈ Att(A)} δ_{B,A} π(B) − Σ_{B ∈ Sup(A)} δ_{B,A} π(B)},
where again 0 &lt; δ_{B,A}, Σ_{B ∈ Sup(A)} δ_{B,A} = 1, Σ_{B ∈ Att(A)} δ_{B,A} = 1 and
furthermore 0 ≤ β_A ≤ 1. The new parameter β_A replaces 0.5 and allows learning a
bias for the probability of this argument. The first term moves the probability
towards 1 if the supporters are stronger, the second term moves the probability
towards 0 if the attackers are stronger. To illustrate that this is sufficient to
capture non-linear relationships, Figure 7 shows, on the left, a P-BAG for the
XOR function. The graph on the right illustrates the classification process for a
positive example. The output is 0.75 (the positive label is accepted). By going
backward through the graph, the result can be explained. Y is accepted because
its supporter H_{1,1} is accepted. H_{1,1}, in turn, is accepted because X1 = 0 and
X2 ≠ 0 (the inputs differ). When using the BAG in Figure 8 with edge weights
1, the XOR function will actually be perfectly reproduced. That is, in the
table in Figure 7, we will have 1 instead of 0.75 and 0 instead of 0.25. However,
the P-BAG in Figure 7 also classifies every example correctly (when the decision
threshold for a positive example is 0.5) and is easier to interpret.</p>
          <p>For G-BAGs, we have to decide how to define the initial weights and which
update function we use. For the update function, we propose a variant of the
quadratic energy model that we explained before. Our variant again adds edge
weights that are supposed to be learnt during the training process. The
aggregation function is defined by
α(A) = Σ_{B ∈ Sup(A)} δ_{B,A} s(B) − Σ_{B ∈ Att(A)} δ_{B,A} s(B),
where s : A → [0, 1] assigns the current strength value to every argument as
before. The influence function is defined by
ι(A) = w(A) − w(A) · h(−α(A)) + (1 − w(A)) · h(α(A))
with h(x) = max{0, x}² / (1 + max{0, x}²). We demand 0 &lt; δ_{B,A}.</p>
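          <p>To illustrate the non-linear Balance constraint concretely, the following sketch evaluates it by a forward pass on a small hand-built BAG that separates XOR. This is our own illustrative construction (with bias values chosen by hand), not the exact BAG from Figure 7 or 8.</p>

```python
# Sketch of the non-linear Balance constraint
#   pi(A) = beta + (1 - beta)*max(0, S - T) - beta*max(0, T - S),
# where S and T are the weighted supporter and attacker sums.

def nl_balance(beta, sup_sum, att_sum):
    return (beta + (1 - beta) * max(0.0, sup_sum - att_sum)
                 - beta * max(0.0, att_sum - sup_sum))

def xor_bag(x1, x2):
    # hidden detectors with bias 0: H10 fires on (1,0), H01 fires on (0,1)
    h10 = nl_balance(0.0, x1, x2)   # supported by X1, attacked by X2
    h01 = nl_balance(0.0, x2, x1)   # supported by X2, attacked by X1
    # output supported by both detectors with weights 1/2, bias 1/3:
    # above 0.5 iff the inputs differ
    return nl_balance(1.0 / 3, 0.5 * h10 + 0.5 * h01, 0.0)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_bag(x1, x2), 3))
```

With decision threshold 0.5, the sketch classifies all four XOR examples correctly (outputs 2/3 for differing inputs and 1/3 for equal ones).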
          <p>The weights of the input arguments can again be set based on the input. Since
they do not have any ingoing edges, this weight will also be their final strength.
For setting the weights, we proceed similarly as before. For every feature, we set
the weight of the argument corresponding to the input value to 1 and the weight
of the remaining input arguments for this feature to 0.</p>
          <p>Due to the acyclicity of the classification BAG, the final strength values can
again be computed by a single forward pass through the graph in linear time.
The result is guaranteed to be equal to the result of the iterative and continuous
update approach [16]. Figure 8 shows a G-BAG for the XOR function. The more
compact BAG in Figure 7 could also be used for a G-BAG for XOR. In particular,
as we increase the parameter values at the edges from 1 towards infinity, the outputs for
positive (negative) examples will move to 1 (0). We will come back to the BAG in
Figure 8 later because it illustrates an idea to classify arbitrary discrete functions
with classification BAGs.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Learning</title>
      <p>We finally discuss some ideas for learning classification BAGs from data. We
divide the learning problem into parameter and structure learning. For parameter
learning, we suppose that the classification BAG is already given and we only
have to learn the weights. For structure learning, both the classification BAG
(hidden arguments, edges) and the parameters have to be learnt.</p>
      <sec id="sec-2-1">
        <title>Parameter Learning</title>
        <p>One common way to learn the parameters of a model is to minimize a loss
function that measures the discrepancy between the desired label and the output of
the classifier. Recall from Figure 4 that our classifiers are probabilistic
classifiers (if not by default, then due to the softmax normalization). A standard loss
function for such classifiers is the logistic loss (a.k.a. cross-entropy loss). Given a
classifier c with parameter vector θ and examples E = {(x_i, y_i) | 1 ≤ i ≤ N},
the logistic loss of the current parameters is L(θ) = −(1/N) Σ_{i=1}^{N} log c_θ(x_i, y_i).
Intuitively, it simply takes the negative logarithm of the classifier's confidence for
the right label for every example. If the confidence for the right label is 1 (and,
thus, the confidence for the wrong labels is 0), the loss is 0. As the confidence
goes to 0, the loss goes to infinity. If we do not use the softmax normalization,
we have to be careful that the output for the desired label is non-zero, but this
can be guaranteed by some modifications of the actual implementation.</p>
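        <p>A minimal sketch of the logistic loss, applied directly to the classifier's confidences for the correct label of each example (the representation is our own illustrative choice):</p>

```python
import math

# Sketch of the logistic loss L(theta) = -(1/N) * sum_i log c_theta(x_i, y_i).
def logistic_loss(confidences):
    # confidences: c_theta(x_i, y_i) for the right label of every example
    return sum(-math.log(c) for c in confidences) / len(confidences)

print(logistic_loss([1.0, 1.0]))   # perfect confidence gives loss 0.0
print(logistic_loss([0.5, 0.25]))  # lower confidence gives a higher loss
```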
        <p>The loss is often minimized by using optimized variants of gradient descent.
In general, the loss function for classification BAGs may be non-differentiable.
However, due to the simple structure of our proposed models, we can guarantee
differentiability in several cases. The key observation is that the degrees of
acceptance can be computed by a simple forward pass for both the P-BAGs and
G-BAGs that we discussed. Therefore, the acceptance degree at the label
arguments is just a composed function of the inputs. If the involved functions are
differentiable, the loss functions are differentiable and the gradient can be
computed by automatic differentiation as implemented in libraries like TensorFlow.
For our proposed P-BAGs, the loss function is non-differentiable at some points
due to the use of the maximum. This may actually not cause any problems, but
we could also make the loss function differentiable by replacing the maximum in
our constraints with a squared maximum. For our G-BAGs, the loss function is
already differentiable.</p>
        <p>In general, we cannot apply gradient methods naively because our parameter
ranges are constrained. However, the constraints are rather simple and can be
dealt with by doing the following after every update step:
1. If a parameter exceeds a threshold b, set the parameter to (θ_0 + b)/2, where θ_0
is the previous value of the parameter.
2. Normalize parameters that have to be normalized (attack and support
parameters for P-BAGs).</p>
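        <p>The two projection steps above can be sketched as follows, assuming an upper threshold b per parameter; the function names and the dictionary representation of parameter groups are our own illustrative choices.</p>

```python
# Sketch of the projection applied after every gradient update step:
# 1. values that exceed the threshold b fall back to the midpoint between
#    the previous value and b; 2. normalization groups are rescaled to sum to 1.

def project(old, new, b=1.0):
    return {k: (old[k] + b) / 2 if new[k] > b else new[k] for k in new}

def normalize(group):
    z = sum(group.values())
    return {k: v / z for k, v in group.items()}

old = {"d1": 0.8, "d2": 0.6}
new = {"d1": 1.4, "d2": 0.6}   # d1 overshot the threshold b = 1
p = project(old, new)
print(round(p["d1"], 3))       # midpoint (0.8 + 1)/2 = 0.9
print(round(sum(normalize(p).values()), 9))  # weights again sum to 1
```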
        <p>For the classification BAGs that we introduced here, we can actually get rid of the
non-negativity constraint by considering edges with negative weights as attacks
and edges with positive weights as supports. This is also beneficial because it
allows learning the nature of an edge (attack or support) from data. However,
for other instantiations, like a G-BAG with product for aggregation, this may
not be possible as easily. Furthermore, for more general classification BAGs, it
may not be possible to obtain a differentiable loss function. We are planning to
look at two strategies for these cases:
Heuristic Gradients: compute a heuristic gradient of the loss function by
replacing the partial derivative for the i-th parameter with the approximation
(L(θ') − L(θ))/ε, where θ' is obtained from θ by increasing the i-th parameter
by a small constant ε &gt; 0.</p>
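        <p>The heuristic gradient is a standard finite-difference approximation; a minimal sketch (with an illustrative quadratic loss whose true gradient is known):</p>

```python
# Sketch of the heuristic gradient: approximate the i-th partial derivative
# by (L(theta') - L(theta)) / eps, where theta' increases the i-th parameter
# of theta by a small eps > 0.

def heuristic_gradient(loss, theta, eps=1e-6):
    base = loss(theta)
    grad = []
    for i in range(len(theta)):
        shifted = list(theta)
        shifted[i] += eps
        grad.append((loss(shifted) - base) / eps)
    return grad

# example: L(theta) = (theta_0 - 1)^2 + theta_1^2 has gradient (2t0 - 2, 2t1)
g = heuristic_gradient(lambda t: (t[0] - 1) ** 2 + t[1] ** 2, [0.0, 2.0])
print([round(x, 3) for x in g])  # close to [-2.0, 4.0]
```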
        <p>
          Meta-heuristics: apply meta-heuristics for solving numerical optimization
problems that do not require gradient information, like Differential Evolution or
Particle Swarm Optimization [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
In principle, we could just build a huge classifier BAG and apply a parameter
learning algorithm in order to solve a classification problem. Similar to the idea
of the approximation theorems for neural networks [8], a classification BAG
with a single hidden layer can learn to approximate arbitrary discrete functions.
The intuitive explanation is that we can just generate one hidden argument
for every possible input (or region of input values) that is supported by the
input arguments that agree with this input and attacked by the remaining input
arguments. This argument then supports the desired label and attacks all other
labels. The classification BAG for XOR in Figure 8 illustrates the idea (we
eliminated some redundancy, though). However, of course, this is not a very
meaningful model and will probably overfit the noisy dataset after training. It
will also be very difficult to interpret such a model.
        </p>
        <p>Since interpretability is our main motivation, we want to learn a sparse
classification BAG. The situation is similar for Bayesian networks, where one usually
wants to learn a compact network with as few spurious edges as possible [13].
Bayesian networks have some local decomposition properties that we cannot
exploit. However, there are some general learning ideas that we can immediately
apply. One way to learn the structure of Bayesian networks is by means of a
local search procedure that starts from some random graph and then repeatedly
1. creates a neighborhood of the graph by means of search operators,
2. evaluates (a subset of) the graphs in the neighborhood by minimizing the
loss function for this graph,
3. picks the best neighbor for the next iteration
until the graph cannot be improved anymore [13]. Step 2 can be easier for
Bayesian networks because often closed-form solutions for the optimal
parameters exist. We will usually have to perform a parameter search instead and may
end up with parameters that are only locally optimal. An interesting question is
therefore whether we can set up classification BAGs such that we obtain closed-form
solutions or, at least, loss minimization problems with a unique minimum. For
example, for P-BAGs with the linear constraints from (1), this may be possible.</p>
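        <p>The local search loop sketched in the three steps above can be written generically; neighbor generation and loss evaluation are stubs standing in for the search operators and the parameter learning discussed in the text, and the toy check at the end is purely illustrative.</p>

```python
# Skeleton of the greedy local search for structure learning: repeatedly
# generate neighbors, evaluate them, and move to the best one until no
# neighbor improves on the current graph.

def local_search(initial_graph, neighbors, evaluate, max_iters=100):
    current = initial_graph
    current_loss = evaluate(current)
    for _ in range(max_iters):
        candidates = neighbors(current)   # step 1: apply search operators
        if not candidates:
            break
        best = min(candidates, key=evaluate)  # step 2: evaluate neighbors
        best_loss = evaluate(best)
        if best_loss >= current_loss:
            break  # no improving neighbor: local optimum reached
        current, current_loss = best, best_loss  # step 3: move to best neighbor
    return current, current_loss

# toy check: "graphs" are integers, neighbors are +-1, loss is distance to 7
g, l = local_search(0, lambda g: [g - 1, g + 1], lambda g: abs(g - 7))
print(g, l)  # converges to 7 with loss 0
```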
        <p>Common search operators for Bayesian networks are adding, deleting and
reversing edges. Reversing edges currently does not play a role for us, but may be
interesting in order to obtain more expressive classifiers. Turning an attack edge
into a support edge may be another useful operator. We also want to add hidden
arguments. Since just adding arguments without any edges cannot change the
classification outcome, operators may introduce new arguments between existing
connections and summarize these connections.</p>
        <p>
          The search can be guided by local search heuristics like simulated annealing
or beam search [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In order to simplify the search process, to avoid overfitting
and to improve interpretability, it is reasonable to restrict the possible
structures. In addition to the forward structure that we assumed throughout this
paper, it also seems reasonable to restrict the number of layers, the number
of arguments per layer and the number of ingoing and outgoing edges. We are
currently working on an implementation to evaluate different strategies.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>There has been growing interest in combining argumentation technology and
machine learning in recent years. [25] proposed some ideas for solving classification
problems by means of structured argumentation. As opposed to abstract
argumentation, structured argumentation explicitly takes the structure of
arguments, like their premises and conclusions, into account. The idea in [25] is to
apply a rule mining algorithm first in order to learn structured arguments. A
structured argumentation solver can then be applied in order to derive a label
for given inputs and to explain the outcome. While this is a very interesting idea
for explainable classification, it relies very much on the success of the underlying
rule mining algorithm. In particular, it is currently unclear how to train such a
model in an end-to-end fashion such that the rule mining process is guided by
the classification outcome of the reasoner.</p>
      <p>
        Gradual argumentation frameworks have been combined successfully with
machine learning methods in order to add explainability to problems like
product recommendation [22] or review aggregation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, the
argumentation framework and the machine learning method are again not trained in an
end-to-end fashion. Instead, a machine learning method is applied rst and the
argumentation framework is applied on top of the result.
      </p>
      <p>[9] recently proposed some ideas for learning epistemic constraints for
PBAGs from data. The constraints are more general than what we considered here
and have the form of rules. The premise of these rules consists of a conjunction
of atomic probability statements and the conclusion is an atomic probability
statement as well. Using a constraint learning algorithm like that may be an
alternative way to learn the structure of a classification BAG without repeatedly
calling a parameter learning algorithm.</p>
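      <p>To make the shape of such constraints concrete, the following is a minimal sketch of how a rule of this form could be represented and checked. The encoding and the reading "if P(A) &gt; 0.5 and P(B) &gt; 0.5 then P(C) &gt; 0.5" are illustrative assumptions, not the exact formalism of [9].</p>

```python
# Hypothetical encoding of an epistemic constraint: a conjunction of
# atomic probability statements as premise, one such statement as conclusion.
rule = {"premise": [("A", ">", 0.5), ("B", ">", 0.5)],
        "conclusion": ("C", ">", 0.5)}

def holds(stmt, p):
    """Check one atomic probability statement under assignment p."""
    arg, op, threshold = stmt
    return p[arg] > threshold if op == ">" else p[arg] <= threshold

def satisfied(rule, p):
    """A rule is satisfied if its conclusion holds whenever
    all premise statements hold under assignment p."""
    if all(holds(s, p) for s in rule["premise"]):
        return holds(rule["conclusion"], p)
    return True

satisfied(rule, {"A": 0.8, "B": 0.7, "C": 0.9})  # premise and conclusion hold
```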
      <p>There has also been some work on learning classical argumentation
frameworks from sets of accepted arguments [24, 12]. However, the motivation is very
different from ours. While argumentation usually asks, given an
argumentation graph, which arguments can be accepted, the authors of these works
ask, given the arguments accepted by users, what is the underlying argumentation
graph? Therefore, these approaches do not allow for a distinction between
input and output arguments, and the investigated frameworks are restricted to
attack relations only. In particular, the training procedure is currently based on
argumentation rationales rather than on a classification outcome.</p>
      <p>
        In principle, we could also instantiate classification BAGs with classical
bipolar frameworks [
        <xref ref-type="bibr" rid="ref2">2, 15, 20</xref>
        ]. The degree of acceptance of an argument can then be
defined as the relative frequency of labellings that accept the argument, where
we restrict ourselves to labellings that accept the input arguments. The resulting
loss function will be non-differentiable, however, and computing all labellings
can be very expensive. It is interesting to note, though, that the relative
frequencies can often be computed by encoding the argumentation problems as a
Markov network [21]. This relationship may also be helpful for applying learning
algorithms for Markov networks to learn classical classification BAGs and
other probabilistic classification BAGs discussed in [21].
      </p>
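      <p>The relative-frequency semantics can be illustrated with a brute-force sketch. For simplicity the sketch uses stable extensions of the attack relation only (ignoring supports), and the toy graph is hypothetical, loosely modelled on the stock example from the introduction.</p>

```python
from itertools import combinations

def stable_extensions(args, att):
    """Enumerate stable extensions: conflict-free sets that attack
    every argument outside the set (supports are ignored here)."""
    exts = []
    for r in range(len(args) + 1):
        for subset in combinations(args, r):
            s = set(subset)
            conflict_free = not any((a, b) in att for a in s for b in s)
            attacks_rest = all(any((a, b) in att for a in s) for b in args - s)
            if conflict_free and attacks_rest:
                exts.append(s)
    return exts

# Hypothetical toy graph: A1 is attacked by the input arguments A2 and A3.
args = {"A1", "A2", "A3"}
att = {("A2", "A1"), ("A3", "A1")}

# Restrict to extensions that accept the input arguments, then read off
# each argument's degree of acceptance as a relative frequency.
inputs = {"A2", "A3"}
exts = [e for e in stable_extensions(args, att) if inputs <= e]
degree = {a: sum(a in e for e in exts) / len(exts) for a in args}
```

Even on this three-argument graph the enumeration visits all 2^3 subsets, which illustrates why computing all labellings can be very expensive in general.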
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We presented some conceptual ideas for solving classification problems by means
of abstract argumentation technology. One important difference from previous
combinations of argumentation and machine learning is that our framework can be
trained in an end-to-end fashion.</p>
      <p>Our classification BAGs are structurally similar to feedforward neural
networks and Bayesian networks, and we took a lot of inspiration from these fields.
Naive classification BAGs are to deep classification BAGs as logistic regression
is to deep multilayer perceptrons and as a naive Bayes classifier is to complex
Bayesian networks. As we argued before, from a classification perspective, deep
classification BAGs can be seen as universal function approximators that can
theoretically approximate arbitrary discrete functions.</p>
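      <p>The structural analogy to feedforward networks can be made concrete with one simple member of the modular-semantics family (cf. [14, 18]): a logistic influence function where supporters raise and attackers lower an argument's degree of acceptance. The names and the aggregation below are illustrative assumptions, not necessarily the exact semantics used in this paper.</p>

```python
import math

def acceptance(base, att, sup, scores):
    """Logistic influence function: the base score is shifted up by the
    degrees of supporters and down by the degrees of attackers.
    One simple modular semantics; others use different aggregations."""
    energy = sum(scores[s] for s in sup) - sum(scores[a] for a in att)
    return 1 / (1 + math.exp(-(base + energy)))

# Hypothetical two-layer classification BAG: input arguments x1 and x2
# feed an output argument "buy" (x1 supports it, x2 attacks it).
scores = {"x1": 1.0, "x2": 0.0}  # fixed degrees of the input arguments
buy = acceptance(base=0.0, att=["x2"], sup=["x1"], scores=scores)
```

Evaluating the arguments layer by layer in this way is exactly a forward pass, which is what makes end-to-end gradient-based training possible when the influence function is differentiable.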
      <p>Our hope is that classification BAGs can offer better transparency and
explainability than other numerical classifiers. For example, for deep neural
networks, transparency is often lost due to the deep and dense structure of the
network. Bayesian networks look very intuitive, but are frequently misinterpreted.
For example, practitioners often assume that edges encode causal relationships,
while the actual theory only assumes that missing edges encode
independencies [13]. Classification BAGs are less susceptible to misinterpretation because
the meaning of attack and support edges is very intuitive. The calculations of
degrees of acceptance also seem easier to grasp than the marginal probability
computations that underlie Bayesian networks.</p>
      <p>Of course, the applicability of classification BAGs will depend on the
availability of reliable learning algorithms, and an experimental evaluation is necessary
in order to evaluate their performance. We are currently implementing different
methods in a Python library and will start an experimental evaluation soon.</p>
      <p>7. Dung, P.M.: On the acceptability of arguments and its fundamental role in
nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence
77(2), 321-357 (1995)
8. Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT press (2016)
9. Hunter, A.: Learning constraints for the epistemic graphs approach to
argumentation. In: Computational Models of Argument (COMMA). p. to appear. IOS Press
(2020)
10. Hunter, A., Polberg, S., Thimm, M.: Epistemic graphs for representing and
reasoning with positive and negative influences of arguments. Artificial Intelligence
281, 103236 (2020). https://doi.org/10.1016/j.artint.2020.103236
11. Hunter, A., Thimm, M.: On partial information and contradictions in probabilistic
abstract argumentation. In: International Conference on Principles of Knowledge
Representation and Reasoning (KR). pp. 53-62. AAAI Press (2016)
12. Kido, H., Okamoto, K.: A Bayesian approach to argument-based reasoning for
attack estimation. In: Sierra, C. (ed.) International Joint Conference on Artificial
Intelligence (IJCAI). pp. 249-255 (2017)
13. Koller, D., Friedman, N.: Probabilistic graphical models: principles and
techniques. MIT press (2009)
14. Mossakowski, T., Neuhaus, F.: Modular semantics and characteristics for bipolar
weighted argumentation graphs. arXiv preprint arXiv:1807.06685 (2018)
15. Oren, N., Norman, T.J.: Semantics for evidence-based argumentation. In:
Computational Models of Argument (COMMA). vol. 172, pp. 276-284. IOS Press (2008)
16. Potyka, N.: Continuous dynamical systems for weighted bipolar argumentation. In:
International Conference on Principles of Knowledge Representation and
Reasoning (KR). pp. 148-157 (2018)
17. Potyka, N.: A tutorial for weighted bipolar argumentation with continuous
dynamical systems and the Java library Attractor. International Workshop on
Non-Monotonic Reasoning (NMR) (2018)
18. Potyka, N.: Extending modular semantics for bipolar weighted argumentation. In:
International Conference on Autonomous Agents and MultiAgent Systems
(AAMAS). pp. 1722-1730 (2019)
19. Potyka, N.: A polynomial-time fragment of epistemic probabilistic
argumentation. International Journal of Approximate Reasoning 115, 265-289 (2019).
https://doi.org/10.1016/j.ijar.2019.10.005
20. Potyka, N.: Abstract argumentation with Markov networks. In: European
Conference on Artificial Intelligence (ECAI). p. to appear (2020)
21. Potyka, N.: Bipolar abstract argumentation with dual attacks and supports. In:
International Conference on Principles of Knowledge Representation and Reasoning
(KR). p. to appear (2020)
22. Rago, A., Cocarascu, O., Toni, F.: Argumentation-based recommendations:
Fantastic explanations and how to find them. In: International Joint Conference on
Artificial Intelligence (IJCAI). pp. 1949-1955 (2018)
23. Rago, A., Toni, F., Aurisicchio, M., Baroni, P.: Discontinuity-free decision support
with quantitative argumentation debates. In: International Conference on
Principles of Knowledge Representation and Reasoning (KR). pp. 63-73 (2016)
24. Riveret, R., Governatori, G.: On learning attacks in probabilistic abstract
argumentation. In: Jonker, C.M., Marsella, S., Thangarajah, J., Tuyls, K. (eds.)
Proceedings of AAMAS'16. pp. 653-661. IFAAMAS (2016)
25. Thimm, M., Kersting, K.: Towards argumentation-based classification. In: Logical
Foundations of Uncertainty and Machine Learning Workshop. vol. 17 (2017)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amgoud</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben-Naim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Evaluation of arguments in weighted bipolar graphs</article-title>
          .
          <source>In: European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty (ECSQARU)</source>
          . pp.
          <volume>25</volume>
          -
          <fpage>35</fpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Amgoud</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cayrol</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lagasquie-Schiex</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          :
          <article-title>On the bipolarity in argumentation frameworks</article-title>
          .
          <source>In: International Workshop on Non-Monotonic Reasoning (NMR)</source>
          .
          <source>vol. 4</source>
          , pp.
          <volume>1</volume>
          -
          <issue>9</issue>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rago</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toni</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>How many properties do we need for gradual argumentation</article-title>
          ?
          <source>In: AAAI Conference on Artificial Intelligence (AAAI)</source>
          . pp.
          <volume>1736</volume>
          -
          <fpage>1743</fpage>
          .
          <string-name>
            <surname>AAAI</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romano</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toni</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aurisicchio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertanza</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Automatic evaluation of design alternatives with quantitative argumentation</article-title>
          .
          <source>Argument &amp; Computation</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          ),
          <volume>24</volume>
          -
          <fpage>49</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cocarascu</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rago</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toni</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Extracting dialogical explanations for review aggregations with argumentative dialogical agents</article-title>
          .
          <source>In: International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)</source>
          . pp.
          <volume>1261</volume>
          -
          <issue>1269</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>K.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swamy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Search and optimization by metaheuristics</article-title>
          .
          <source>Techniques and Algorithms</source>
          Inspired by Nature; Birkhauser: Basel, Switzerland (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>