=Paper=
{{Paper
|id=Vol-3086/paper9
|storemode=property
|title=Towards Argumentative Decision Graphs: Learning Argumentation Graphs from Data
|pdfUrl=https://ceur-ws.org/Vol-3086/paper9.pdf
|volume=Vol-3086
|authors=Pierpaolo Dondio
|dblpUrl=https://dblp.org/rec/conf/aiia/Dondio21
}}
==Towards Argumentative Decision Graphs: Learning Argumentation Graphs from Data==
Pierpaolo Dondio, School of Computer Science, Technological University Dublin, Grangegorman Campus, D07 EWV4, Dublin, Ireland

Abstract. In this paper we present a novel data-mining model called argumentative decision graphs (ADGs). An ADG is a special argumentation framework in which arguments have a rule-based structure and an attack relation is defined among them. ADGs are graph-like models learnt from data in a supervised way that can be used for classification tasks. As in a decision tree, given a set of input features, an ADG returns a value for the target variable. Unlike a decision tree, the output of an ADG can also be an undecided status, occurring when the model does not have enough reasons to predict a value for the target variable. This is due to the use of argumentation semantics to identify which arguments of an ADG are accepted and, consequently, to make a prediction about the target variable. Unlike Bayesian networks, ADGs are not required to be acyclic and can have any topology. Advantages of ADGs are the possibility of using different semantics to make predictions, the ability to deal with incomplete input data, and the generation of compact explanations. We evaluate a preliminary greedy algorithm that learns an ADG from data using public datasets, and we compare our results with decision trees in terms of balanced accuracy and size of the model. Our results provide evidence to further progress our research.

Keywords: Argumentation, Data Mining, Decision Tree, Graph models, Explainability

===1. Introduction===

In this paper we describe a novel data-mining model called Argumentative Decision Graphs (ADGs). An ADG is a supervised data-mining model that learns the relationship between a target variable and a large enough set of examples. In this first paper we present a preliminary algorithm to learn ADGs from data for binary classification.
An argumentative decision graph is an extension of Dung's abstract argumentation graphs [1]. As in Dung's framework, each node of the graph represents an argument, the links represent an attack relation among the arguments, and no support relation is defined. However, unlike Dung's abstract framework, arguments have a rule-based structure with a premise (called the support) and a conclusion. Some arguments have a conclusion that can be used to predict the value of the target variable, while others are not used to predict the target variable directly but rather interact with the predictive arguments.

(AI3 2021, 5th Workshop on Advances in Argumentation in Artificial Intelligence. pierpaolo.dondio@tudublin.ie (P. Dondio), ORCID 0000-0001-7874-8762. © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings, ISSN 1613-0073, CEUR-WS.org.)

The main reason to introduce ADGs is the willingness to bring the advantages of symbolic non-monotonic reasoning into data-mining models. An ADG is an extension of an abstract argumentation framework, a symbolic reasoning system able to represent partial and conflicting knowledge and to formalize a large set of non-monotonic semantics. For this reason, an ADG can deal with partial information and make predictions when some of the inputs are missing. The predictions of an ADG are made by applying an argumentation semantics. We first recall that an argumentation semantics is a set of postulates used to identify the sets of acceptable arguments, called extensions. In the labelling approach [2] adopted in this paper, the effect of an argumentation semantics is to assign a label in, out or undec to each argument, meaning that the argument is respectively accepted, rejected or deemed undecided.
The undec label represents a situation in which the semantics has no reason to definitely accept or reject an argument. A first advantage of using argumentation semantics is that the various semantics offer a rich toolset for modelling different ways of making decisions. Grounded semantics, for instance, represents a skeptical acceptance strategy, while preferred semantics is a so-called credulous semantics with less conservative conditions for accepting an argument. A second advantage is that an ADG can output an undecided status: it can either make a prediction or abstain from making one. This happens when the input data create a conflict that cannot be resolved by the rules of the semantics, so that there is no ground to take a decision. The undecided status provides an ADG with a way of quantifying the uncertainty generated by conflicting information. Third, the use of an attack relation between arguments, coupled with an argumentation semantics, generates compact and understandable explanations. In a distinctively non-monotonic fashion, a single argument can invalidate many others and directly or indirectly support a specific conclusion. Even in a large argumentation graph, the arguments responsible for a conclusion can be a small subset, which makes explanations more compact and understandable. Data-mining models similar to ADGs include explainable models such as decision trees [3] and graph-like structures learnt from data such as Bayesian networks [4]. In this paper we provide an introduction to ADGs and underline the differences between ADGs and similar models. In the second part of the paper we also provide a first greedy algorithm to learn an ADG from data, and we evaluate its performance against decision trees on benchmark classification datasets. The paper is organized as follows.
In Section 2 we recall the basics of abstract argumentation semantics; in Section 3 we introduce our ADG model; in Section 4 we critically compare ADGs to decision trees. Section 5 describes an algorithm to learn ADGs from data, which is evaluated in Section 6, while related works are described in Section 7.

===2. Abstract Argumentation Semantics===

Definition 2.1. An argumentation framework AF is a pair ⟨Ar, →⟩, where Ar is a non-empty finite set whose elements are called arguments and → ⊆ Ar × Ar is a binary relation, called the attack relation. If (a, b) ∈ → we say that a attacks b. An argument a is initial if it is not attacked by any argument, including itself.

An argumentation framework AF = ⟨Ar, →⟩ identifies a directed graph. We define the restriction of an argumentation framework to a set of nodes S as the framework AF↓S corresponding to the vertex-induced subgraph of AF identified by S:

Definition 2.2. Given AF = ⟨Ar, →⟩, the restriction of AF to a set of nodes S ⊆ Ar is the argumentation framework AF↓S = ⟨S, →_S⟩ where →_S = → ∩ (S × S).

An abstract argumentation semantics identifies sets of arguments that can survive the conflicts encoded by the attack relation →. Dung's semantics require a group of acceptable arguments to be conflict-free (an argument and its attackers cannot be accepted at the same time) and admissible (the set of arguments defends itself from external attacks).

Definition 2.3. A set Args ⊆ Ar is conflict-free iff ∀a, b ∈ Args, (a, b) ∉ →.

Definition 2.4. A set Args ⊆ Ar defends an argument a ∈ Ar iff ∀b ∈ Ar such that (b, a) ∈ →, ∃c ∈ Args such that (c, b) ∈ →. The set of arguments defended by Args is denoted F(Args). A conflict-free set Args is admissible if Args ⊆ F(Args) and it is complete if Args = F(Args).

We follow the labelling approach of [2], where a semantics assigns to each argument a label in, out or undec.

Definition 2.5. Let AF = ⟨Ar, →⟩.
A labelling is a total function L : Ar → {in, out, undec}. We write in(L) for {a ∈ Ar | L(a) = in}, out(L) for {a ∈ Ar | L(a) = out}, and undec(L) for {a ∈ Ar | L(a) = undec}.

Definition 2.6. Let AF = ⟨Ar, →⟩. A complete labelling is a labelling such that for every a ∈ Ar it holds that: 1. if a is labelled in then all its attackers are labelled out; 2. if a is labelled out then it has at least one attacker that is labelled in; 3. if a is labelled undec then it has at least one attacker labelled undec and no attacker labelled in.

Definition 2.7. Given AF = ⟨Ar, →⟩, L is the grounded labelling iff L is a complete labelling where undec(L) is maximal (w.r.t. set inclusion) among all complete labellings of AF. L is a preferred labelling iff L is a complete labelling where in(L) is maximal (w.r.t. set inclusion) among all complete labellings of AF. A stable labelling is a complete labelling with undec(L) = ∅.

The grounded semantics, first introduced by Pollock [5], is a skeptical semantics that can be computed in polynomial time by accepting the initial arguments and then any argument defended, directly or indirectly, by the initial arguments. In this first paper, the grounded semantics is the only semantics used by an ADG.

===3. Argumentative Decision Graphs===

In this section we introduce the notion of argumentative decision graphs. We work with a dataset composed of a set of features, each taking values in a finite set. One of the features represents the target variable of the classification task. Informally, an argumentative decision graph is an extension of Dung's abstract argumentation graph [1] where arguments have a rule-based structure with a premise (also called the support) and a conclusion. The premise of each argument consists of a feature and an associated value, while the conclusion is a value for the target variable.
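The argument structure just introduced, a support predicate plus an optional conclusion, can be encoded directly. A minimal sketch in Python; the class and method names are ours, not from the paper:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Argument:
    """An ADG argument: a support predicate f(v), encoded as the pair
    (feature, value), and a conclusion that is either a value of the
    target variable or None for a non-predictive argument."""
    support: Tuple[str, str]
    conclusion: Optional[str] = None

    def verified_by(self, instance: dict) -> bool:
        # the support f(v) is verified iff the instance assigns v to f
        feature, value = self.support
        return instance.get(feature) == value

a1 = Argument(("k", "yes"))            # non-predictive: <k(yes), {}>
a2 = Argument(("w", "yes"), "surf")    # predictive:     <w(yes), a(surf)>
print(a2.verified_by({"s": "yes", "w": "yes", "k": "no"}))  # True
```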
Some arguments might have an empty conclusion, meaning that their role is not to predict the target variable directly, but rather to interact with other predictive arguments.

Formally, we consider a dataset D represented by an n × m matrix-like data structure, where each row is called an instance of the dataset and each column is called a feature. An instance can be represented by a tuple t = ⟨v_1, …, v_m⟩, where v_i is the value associated with the i-th feature f_i. We consider predicates of the form f(v), with the meaning "the feature f has the value v". Given a tuple t = ⟨v_1, …, v_i, …, v_m⟩, the predicate f_i(v) is verified by t iff v_i = v (the value of the i-th component of the tuple t is equal to v), and not verified otherwise. One of the features is called the target variable and is denoted by y; the predicates involving the target variable therefore have the form y(v). The set of all the predicates is P_f ∪ P_y, where P_y is the set of predicates regarding the target variable y and P_f is the set of predicates regarding the other features {f_1, …, f_m}.

An ADG is an argumentation framework where each argument a has the structure a = ⟨s, c⟩, where s ∈ P_f and c is either empty or a target-variable predicate y(v). An argument with a non-empty conclusion is called a predictive argument. An attack relation is defined over the arguments. The definition is therefore the following:

Definition 3.1. An argumentative decision graph (ADG) is an argumentation framework AF = ⟨Ar, →⟩ where each a ∈ Ar has the form a = ⟨s, c⟩ with s ∈ P_f, c ∈ P_y ∪ {∅}, and → ⊆ Ar × Ar.

Example 1. Let us consider the dataset D_1 in Table 1, describing Paul's activities on Sunday.
The dataset has three Boolean features: s (whether the weather is sunny), w (whether the weather is windy) and k (whether Paul has a sore knee), and a target variable a (activity), which takes the two values {surf, fish}. Using the features of D_1, the following two ADGs are given:

* ADG_1 = ⟨{a_1, a_2, a_3}, {(a_1, a_2), (a_2, a_3), (a_3, a_2)}⟩
* ADG_2 = ⟨{a_1, a_2, a_3, a_4}, {(a_1, a_2), (a_2, a_3), (a_3, a_2)}⟩

where a_1 = ⟨k(yes), ∅⟩, a_2 = ⟨w(yes), a(surf)⟩, a_3 = ⟨s(yes), a(fish)⟩, a_4 = ⟨w(no), a(fish)⟩.

ADG_1 says that Paul goes surfing if the weather is windy (argument a_2) and fishing if the weather is sunny (a_3), but he cannot do both activities (a_2 and a_3 mutually attack each other), and he does not go surfing if he has a sore knee (a_1 attacks a_2). ADG_2 adds the information that Paul goes fishing if the weather is not windy (a_4).

We can represent an ADG with a directed graph where each node is labelled with the argument it represents. ADG_1 and ADG_2 are shown in Figure 1; for readability, we also wrote the description of each argument.

Table 1: The dataset D_1 consists of 10 tuples and four features. The values of the target variable activity predicted by ADG_1 and ADG_2 are also reported.

 ID | Sunny (s) | Windy (w) | Sore Knee (k) | Activity (a) | f_ADG1 | f_ADG2
  1 |    Yes    |    Yes    |      Yes      |     Fish     |  Fish  |  Fish
  2 |    Yes    |    Yes    |      No       |     Surf     |  und   |  und
  3 |    Yes    |    No     |      No       |     Fish     |  Fish  |  Fish
  4 |    Yes    |    No     |      No       |     Fish     |  Fish  |  Fish
  5 |    No     |    Yes    |      Yes      |     Fish     |  unk   |  unk
  6 |    No     |    No     |      Yes      |     Fish     |  unk   |  Fish
  7 |    No     |    Yes    |      No       |     Surf     |  Surf  |  Surf
  8 |    No     |    No     |      No       |     Fish     |  unk   |  Fish
  9 |    No     |    Yes    |      Yes      |     Surf     |  unk   |  unk
 10 |    No     |    Yes    |      No       |     Surf     |  Surf  |  Surf

Figure 1: Two ADGs.

===3.1. Well-formed ADGs===

In order to be well-formed, the attack relation of an ADG has to satisfy some consistency constraints. The first constraint is that there is no attack between two arguments whose supports contain the same feature.
This is because the two arguments are mutually exclusive: only one of the two supports can be verified by an input tuple. The second constraint is that there is no attack between two arguments whose conclusions are the same and not empty. Third, if two arguments a and b have mutually exclusive conclusions and use different features in their supports, there must be an attack between them: either a attacks b, b attacks a, or they symmetrically attack each other. This constraint is introduced to guarantee that the predictions of an ADG are conflict-free, i.e., two arguments with mutually exclusive conclusions cannot be accepted at the same time. Formally, the three constraints are as follows. We recall that y denotes the target variable and f_i the i-th feature.

Definition 3.2. Given an ADG = ⟨Ar, →⟩, the ADG is well-formed iff ∀a_1, a_2 ∈ Ar with a_1 = ⟨s_1, c_1⟩, a_2 = ⟨s_2, c_2⟩ it holds that:
1. if s_1 = f_i(v_1) and s_2 = f_i(v_2) then (a_1, a_2) ∉ → ∧ (a_2, a_1) ∉ →
2. if c_1 = c_2 ≠ ∅ then (a_1, a_2) ∉ → ∧ (a_2, a_1) ∉ →
3. if a_1 = ⟨f_1(v_x), y(v_1)⟩, a_2 = ⟨f_2(v_y), y(v_2)⟩ and v_1 ≠ v_2 ∧ f_1 ≠ f_2 then (a_1, a_2) ∈ → ∨ (a_2, a_1) ∈ →.

===3.2. Making Predictions using an ADG===

We consider a dataset D and an ADG built from D. The target variable is y and V is the set of values that the target variable can take. An ADG identifies a function f_ADG : D → V ∪ {und, unk} that, given an input tuple t ∈ D, returns either a value for the target variable or one of two additional outputs: undecided (und) or unknown (unk). The computation of f_ADG is described in Algorithm 1. Given an input instance t, we first consider the set S of all the arguments whose support is verified by t. Then we apply the grounded semantics to the abstract argumentation framework restricted to the verified arguments. We recall that, since the ADG is well-formed, all the accepted arguments have the same conclusion.
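The three constraints of Definition 3.2 are mechanical and can be checked programmatically. A sketch with an encoding of our own, ((feature, value), conclusion-or-None), verified on ADG_1 of Example 1:

```python
def well_formed(args, attacks):
    """Check the three constraints of Definition 3.2 on a list of
    arguments and a set of (attacker, attacked) pairs."""
    for a in args:
        for b in args:
            if a == b:
                continue
            (fa, _), ca = a
            (fb, _), cb = b
            # constraint 1: same feature in the supports, no attack allowed
            if fa == fb and (a, b) in attacks:
                return False
            # constraint 2: same non-empty conclusion, no attack allowed
            if ca is not None and ca == cb and (a, b) in attacks:
                return False
            # constraint 3: conflicting conclusions on different features
            # require an attack in at least one direction
            if (fa != fb and ca is not None and cb is not None and ca != cb
                    and (a, b) not in attacks and (b, a) not in attacks):
                return False
    return True

# ADG_1 from Example 1
a1 = (("k", "yes"), None)
a2 = (("w", "yes"), "surf")
a3 = (("s", "yes"), "fish")
adg1_attacks = {(a1, a2), (a2, a3), (a3, a2)}
print(well_formed([a1, a2, a3], adg1_attacks))  # True
```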
We then consider the following situations:
1. If there is at least one accepted predictive argument, the value predicted by the ADG is the value of the conclusion of any of the accepted predictive arguments.
2. If the extension is empty but there is at least one predictive argument that is undecided, the predicted value of the ADG is the status undecided: there are arguments that could be used to predict the target variable, but they are part of conflicts that cannot be resolved.
3. In all the other cases, when there are neither accepted nor undecided predictive arguments, the value returned by the ADG is the status unknown.

Algorithm 1: The function f_ADG to make predictions given a tuple t ∈ D. The dataset D is described by the features f_1, …, f_m and target variable y taking values in the set V.

 Inputs: ADG = ⟨Ar, →⟩, a tuple t ∈ D
 Output: a predicted value in the set V ∪ {und, unk}
 S ← verified(Ar, t)
 L ← Grounded(ADG↓S)
 if in(L) ≠ ∅ then
     if ∃a ∈ in(L) | a = ⟨s, y(v)⟩ then
         return v
 else
     if undec(L) ≠ ∅ ∧ ∃a ∈ undec(L) | a = ⟨s, y(v)⟩ then return und
 return unk

Example 2. Let us consider again ADG_1 and ADG_2 and the dataset D_1 in Table 1. Let us consider the first tuple t_1 = ⟨yes, yes, yes, fish⟩ and compute f_ADG1(t_1). All three arguments a_1, a_2, a_3 are verified by t_1, and therefore the argumentation graph to be considered is equivalent to the full one. The grounded semantics returns two accepted arguments {a_1, a_3}, since a_2 is defeated by a_1. Since a_3 = ⟨s(yes), a(fish)⟩, then f_ADG1(t_1) = fish. If we consider the tuple t_2 = ⟨yes, yes, no, surf⟩ (the weather is sunny and windy and Paul has no sore knee), only a_2 and a_3 are verified by t_2, and the resulting argumentation framework is a pair of mutually attacking arguments labelled undec by the grounded semantics; therefore f_ADG1(t_2) = und.
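Algorithm 1 can be sketched in a few lines of Python. The grounded labelling is computed with the standard iterative procedure (label in every argument whose attackers are all out, out every argument with an in attacker, undec the rest); the encoding ((feature, value), conclusion-or-None) and all names are ours. The sketch reproduces f_ADG1(t_1) and f_ADG1(t_2) of Example 2:

```python
def grounded(args, attacks):
    """Grounded labelling via the standard fixpoint procedure."""
    label = {}
    changed = True
    while changed:
        changed = False
        for a in args:
            if a in label:
                continue
            attackers = [b for b in args if (b, a) in attacks]
            if all(label.get(b) == "out" for b in attackers):
                label[a] = "in"       # all attackers rejected (or none)
                changed = True
            elif any(label.get(b) == "in" for b in attackers):
                label[a] = "out"      # defeated by an accepted argument
                changed = True
    return {a: label.get(a, "undec") for a in args}

def predict(adg_args, attacks, t):
    """Sketch of Algorithm 1: restrict to the verified arguments, apply
    the grounded labelling, return a value, 'und' or 'unk'."""
    verified = [a for a in adg_args if t.get(a[0][0]) == a[0][1]]
    lab = grounded(verified, attacks)
    accepted = [a for a in verified if lab[a] == "in" and a[1] is not None]
    if accepted:
        return accepted[0][1]         # well-formedness: one conclusion
    if any(lab[a] == "undec" and a[1] is not None for a in verified):
        return "und"
    return "unk"

# ADG_1 of Example 1
a1 = (("k", "yes"), None)
a2 = (("w", "yes"), "surf")
a3 = (("s", "yes"), "fish")
args, attacks = [a1, a2, a3], {(a1, a2), (a2, a3), (a3, a2)}
print(predict(args, attacks, {"s": "yes", "w": "yes", "k": "yes"}))  # fish
print(predict(args, attacks, {"s": "yes", "w": "yes", "k": "no"}))   # und
```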
Regarding tuple t_6 = ⟨no, no, yes, fish⟩ (the weather is neither sunny nor windy and Paul has a sore knee), only argument a_1 is verified, and it is accepted by the grounded semantics. However, this argument does not predict the target variable and, since there are no undecided arguments, nothing can be said about the target variable: f_ADG1(t_6) = unk.

By comparing the predictions of an ADG with the actual values of the target variable, the usual performance metrics derived from the confusion matrix, such as accuracy, precision, recall and F1-score, can be computed. The confusion matrix is a joint-frequency table of the predicted versus the actual values. In binary classification the confusion matrix is a 2 × 2 table; however, since an ADG can also return the two special values undecided and unknown, the confusion matrix has two additional rows. Figure 2 shows the confusion matrices for ADG_1 and ADG_2 computed on the dataset D_1.

Figure 2: Confusion matrices for ADG_1 and ADG_2 for the dataset D_1 of Table 1.

We note how both ADG_1 and ADG_2 have perfect accuracy when a prediction is generated (this is measured by a decision accuracy equal to 1); however, ADG_1 is in the unknown status in 4 out of 10 cases and undecided in one case, and therefore its overall accuracy is 0.5. ADG_2 is a better predictive model, since it makes predictions in more cases than ADG_1 without adding any prediction error. This is reflected by its higher F1-score.

===4. A critical comparison with Decision Trees===

In this section we outline some interesting differences and similarities between ADGs and decision trees. We consider again a dataset D with a set of features {f_1, …, f_m, y}, where y is the target variable. A decision tree (DT) [3] is a tree structure where each node is associated with a test on a feature (also referred to as a variable in DT terminology), and each terminal node has an associated value of the target variable.
Given an input tuple, a decision tree returns a predicted value for the target variable (the decision). To take a decision, the tree is visited starting from the root node. At each node the associated test is evaluated, and its result determines which child node is visited next, until a terminal node is reached and a prediction is made. Figure 3 shows two decision trees for the dataset D_1 in Table 1. T_1 has three internal nodes and four terminal nodes. T_2 has two internal and three terminal nodes, and it does not use all the features (for instance, windy weather is not used).

Decision trees can be learnt from data, and many efficient algorithms have been proposed, such as ID3, C4.5 and CART [6]. The fundamental idea is similar: a tree is learnt in a top-down, recursive, divide-and-conquer fashion. Starting from the root node, the algorithm selects the best variable with which to split the tree into two or more sub-trees. The best variable is the one that splits the tree into branches such that the target variable becomes easier to predict in each branch. Decision-tree algorithms use different measures for the concept of "easier to predict", such as the Gini index, the information gain (defined as the reduction in the entropy of the target variable conditioned on knowing the value of the variable tested for the split) and the gain ratio.

We now discuss some key differences and similarities between ADGs and decision trees (DTs).

Figure 3: Two decision trees for the dataset D_1 of Table 1.

Entry point. A decision tree has a single entry point, the root of the tree, and a specific order in which nodes are tested to generate a prediction. On the contrary, an ADG has no entry point, and arguments can be evaluated in any order. The consequence is that, in case of partial information (that is, when the value of one or more features is missing or unknown), an ADG has a higher chance of returning a prediction.
For instance, if the information about the feature sore knee is not available, T_1 and T_2 cannot return a prediction, while ADG_1 and ADG_2 could.

Adding new knowledge. Suppose that new knowledge needs to be added to the dataset in the form of a new feature/variable. An argumentation graph is easier to expand: to add a new argument, it is enough to define the set of attacks involving the new node, and the previous structure of the graph is unchanged. On the other hand, adding a variable to a decision tree without changing its previous structure can be done only by adding new terminal nodes testing the new variable, or by using the new variable as the new root. Retraining the tree would in general generate a very different tree, in which the new variable might appear in different parts of the structure. It can be observed that, also for an ADG, finding the optimal graph might require modifying the previous nodes and links. However, thanks to its less constrained topology, an ADG has more options for accommodating a new variable without changing the existing graph.

Representation of knowledge. The topology of a DT and that of an ADG are certainly different. Despite the differences, however, both models represent knowledge as a set of logical rules in disjunctive normal form. A tree is an acyclic graph where every path is disjoint from the others. Every path represents a distinct rule to reach a conclusion and, once a path is taken, it is not possible to move back to another path. Multiple paths are usually present to reach the same conclusion. Therefore a DT represents knowledge as a set of mutually exclusive rules, where each split represents a specialization of the support of a rule. On the contrary, an ADG represents knowledge as a set of default rules, potentially forming an inconsistent (conflicting) set and subject to exceptions.
Every unidirectional attack provides an exception to a rule (i.e., to an argument), further specializing the conditions under which the rule can be used. Depending on the context and the data, the conflict-based representation of the ADG could generate more understandable and compact models than the mutually exclusive rules generated by the DT, or vice versa.

Explanations. Decision trees are regarded as one of the most explainable data-mining models. The tree structure, if not too complex, can be easily understood by a human. Each branch of the tree represents a specific input case, and each branch identifies a logical rule consisting of the conjunction of all the tests used to reach the terminal node from the root. For instance, sore knee and not windy weather is an explanation for Paul going fishing in the tree T_2. The length of an explanation is the length of the path from the root node to the terminal node, and the average length of all the paths, often weighted by the number of tuples covered by each path, is a measure of the average length of the explanations generated by the decision tree. Regarding an ADG, the structure can also be understood by a human if the argumentation graph is small, although it could be argued that the free topology of an argumentation graph makes it harder to understand. However, even when the graph is large, the application of the chosen semantics reduces the number of arguments that are necessary to explain a decision. In general, the arguments defeated by an accepted argument are irrelevant to the decision and, if we are using the grounded semantics and the set of accepted arguments is not empty, the undecided arguments are also irrelevant. Moreover, we do not need all the accepted arguments, since the acceptability of some arguments might depend on the acceptability of others. The explanations could therefore be compact even for a large graph.
The analysis of explanations is an important issue that will be covered in future works.

Variable importance. In a decision tree, the importance of a variable is measured by the information gain, that is, the reduction in the conditional entropy of the target variable. In other words, a variable is important if, by knowing its value, the target variable can be predicted with more certainty. In an ADG, various measures of the importance of a variable could be defined. For instance, the importance of an argument could be proportional to the number of input tuples for which the argument is necessary to obtain a prediction. Note how, since an argument refers to a specific feature-value pair, an ADG has a more fine-grained notion of variable importance; an aggregated value for each feature can then be derived.

Semantics. Another difference between DTs and ADGs is the way their outputs are produced. An ADG is evaluated using argumentation semantics to solve the conflicts encoded in the graph. Given an input tuple, an ADG is evaluated by means of an argumentation semantics that labels the arguments as accepted, rejected or undecided. The undecided label provides an ADG with a built-in way of quantifying the uncertainty deriving from conflicting information. Moreover, semantics provide a way of representing different strategies for accepting arguments. While the grounded semantics is the most skeptical semantics, maximizing the set of undecided arguments, semantics such as the preferred are credulous semantics maximizing the set of accepted arguments. Semantics other than the grounded are often multi-status, meaning that they generate multiple sets of accepted arguments, all consistent with the semantics used. Multi-status semantics could be used to present multiple consistent scenarios to the decision maker, each with valid reasons to be accepted.

Expressiveness. Regarding the expressiveness of the two models, they are equivalent.
Given an ADG, it is possible to define a DT computing exactly the same function: if we compute the output of an ADG for all the input tuples, we can build an exhaustive DT where each path represents an input tuple and the value predicted by the terminal node is the ADG output. Given a DT, an equivalent ADG can be built by following these rules:
1. For each directed link l from a node n to a terminal node q of the DT, create a predictive argument ⟨f_n(v_l), y_q⟩, where f_n is the variable tested at node n, v_l is the value of f_n identifying the link l connecting n to the terminal node q, and y_q is the value of the target variable predicted at node q.
2. For each link l from a node n to a non-terminal node q, create an argument ⟨f_n(v_l), ∅⟩, where f_n is the variable tested at n and v_l is the value of the feature identifying the link l.
3. After having generated the arguments of the ADG using rules 1 and 2, for each argument a associated with a link l_a from node n to q, add to the attack relation → the pairs (a, b), where b is any argument associated with a link l_b lying on a directed path P connecting n to a terminal node with l_a ∉ P, unless the two arguments use the same feature or predict the same value for the target variable.

The idea is that terminal nodes provide the arguments that predict the target variable, while non-terminal nodes provide the non-predictive arguments. Regarding the attack relation, an argument generated by a link l from node n to q attacks all the arguments generated by links that are mutually exclusive with l and belong to the sub-tree rooted at node q.

Figure 4: The ADG equivalent to T_1.

As an example, let us consider the decision tree T_1 and build the corresponding ADG. According to rule 1, the ADG has the following predictive arguments: a_1 = ⟨w(yes), surf⟩, a_2 = ⟨w(no), fish⟩, a_3 = ⟨s(yes), fish⟩, a_4 = ⟨s(no), surf⟩.
According to rule 2, it has the following two non-predictive arguments: a_5 = ⟨k(no), ∅⟩, a_6 = ⟨k(yes), ∅⟩. Regarding the attack relation, according to rule 3 the only attacks are: a_5 attacks a_3 and a_4, while a_6 attacks a_1 and a_2. Argument a_5 does not attack a_6 and vice versa, since they use the same feature and are therefore mutually exclusive. The resulting ADG is shown in Figure 4. We conclude by observing that, although the two models are equivalent, this does not guarantee that the two algorithms will learn similar models from data. In fact, the way the relationship between input tuples and target variable is learnt, and the way knowledge is represented, imply that the two learnt models will in general be quite different.

===5. An algorithm to learn ADGs from data===

In this section we present a first simple algorithm to learn an ADG from data. The algorithm is a preliminary attempt and can be improved in multiple ways; however, it represents a starting point to study and test the potential of ADGs.

Algorithm 2: The function BuildADG to learn an ADG from a dataset D. The dataset D is described by the features f_1, …, f_m and target variable y. The target variable is binary and takes the values y+, y−. F_i represents the set of all possible values for the feature f_i.
The function eval returns the selected performance indicator for a given ADG.

 Inputs: dataset D, a performance threshold Δ
 Output: an ADG
 Function BuildADG(D, Δ):
     perf ← 0 ; Arg ← ∅
     for each distinct pair (f_i, v_i) do
         Arg ← Arg ∪ {⟨f_i(v_i), y+⟩, ⟨f_i(v_i), y−⟩, ⟨f_i(v_i), ∅⟩}
     ADG_best ← ⟨∅, ∅⟩
     while Arg ≠ ∅ do
         ADG_old ← ADG_best
         for a ∈ Arg do
             ADG_new ← AddArg(a, ADG_best)
             perf_new ← eval(ADG_new)
             if perf_new > perf + Δ then
                 perf ← perf_new
                 ADG_best ← ADG_new
                 a_best ← a
         if ADG_best == ADG_old then
             return ADG_best
         Arg.remove(a_best)
     return ADG_best

 Function AddArg(a = ⟨f_a(v_a), y_a⟩, ADG = ⟨Args, →⟩):
     for b = ⟨f_b(v_b), y_b⟩ ∈ Args do
         ADG′ ← ⟨Args ∪ {a}, → ∪ {(a, b)}⟩
         ADG″ ← ⟨Args ∪ {a}, → ∪ {(b, a)}⟩
         ADG‴ ← ⟨Args ∪ {a}, → ∪ {(a, b), (b, a)}⟩
         if f_a ≠ f_b ∧ y_a ≠ y_b ∧ y_a ≠ ∅ ∧ y_b ≠ ∅ then
             ADG ← G ∈ {ADG′, ADG″, ADG‴} where eval(G) is maximal
         else if f_a ≠ f_b then
             ADG ← G ∈ {ADG, ADG′, ADG″, ADG‴} where eval(G) is maximal
     return ADG

The algorithm, called BuildADG, builds an ADG incrementally by adding one argument at a time and by expanding the attack relation so as to maximize a performance indicator, such as the overall accuracy of the ADG. BuildADG is shown in Algorithm 2. We consider a dataset D with features {f_1, …, f_m, y}, where y is the target variable; each feature f_i takes values from its corresponding set F_i, while the target variable is binary and takes the two values y+ and y−. The algorithm has two inputs: the dataset D and a tuning parameter Δ. The algorithm starts by identifying the candidate arguments that will be used to build the ADG: for each feature-value pair ⟨f_i, v⟩ three arguments are added, one predicting y+, one predicting y−, and the neutral argument with empty conclusion ⟨f_i(v), ∅⟩.
At each iteration, the algorithm adds to the ADG the candidate argument a that increases the performance of the resulting ADG by the largest margin, but only if the addition of a improves the performance by at least Δ over the previous ADG. Argument a is then removed from the list of candidates, and the procedure is repeated until all the arguments have been tested or the performance of the ADG can no longer be increased by Δ. The function AddArg is responsible for adding a new argument a to the ADG. For each argument b already in the ADG, we need to decide how the new argument a interacts with b. The resulting ADG must be well-formed to avoid logical inconsistencies. If arguments a and b are mutually exclusive, there is no need to add an attack link between them. The same holds when a and b predict the same value of the target variable y. If both a and b are predictive but predict different values of y, and their supports are not mutually exclusive, an attack must be present to preserve conflict-freeness. There are three possibilities: a attacks b, b attacks a, or they attack each other mutually. One of these three attacks must be present to keep the ADG well-formed, and the one generating the best ADG is kept. If either a or b is non-predictive, there is also a fourth possibility: no attack relation between the two arguments (line 30).

6. Evaluation

In this section we provide a first evaluation of the BuildADG algorithm and compare its results to a C4.5 decision tree. The evaluation compares the total accuracy of the two models on three benchmark datasets. An evaluation considering other performance indicators, or the explainability of the models, is left for future work. The datasets used for the evaluation are well-known, publicly available datasets [7] for binary classification where all the features are categorical.
We used Δ = 0 as the parameter value, meaning that any improvement to the ADG is retained. Table 2 shows the results for the three datasets. Performance was computed by randomly dividing each dataset into a training and a testing set using an 80%-20% split ratio. We report the accuracy, the balanced accuracy and the size of both the decision tree and the ADG, measured by the number of nodes and links of the graph. Table 3 shows the most important features considered by the decision tree and by the ADG. Information gain is used to rank features in the decision tree, while the proportion of times an argument is necessary to make a prediction is used to rank features in the ADG.

Table 2
Results obtained by the BuildADG and the C4.5 algorithms. The size column reports the number of nodes/links.

                                      Decision Tree                        ADG
Dataset     Records  Features   Acc   CI 95%        B. Acc  Size     Acc   B. Acc  Size
Bank        600      11         0.82  [0.74, 0.89]  0.82    28/27    0.75  0.75    6/7
US Census   32561    14         0.84  [0.83, 0.85]  0.73    14/13    0.83  0.75    21/78
Car Price   1782     6          0.94  [0.91, 0.96]  0.92    20/19    0.88  0.88    8/11

In both the bank and car price datasets, the accuracy of the decision tree was statistically higher than the ADG accuracy. The gap was smaller for the car price dataset, where the ADG registered a balanced accuracy of 0.88. For the US census dataset we obtained positive results: the accuracy of the ADG was within the 95% confidence interval of the decision tree accuracy and only marginally lower (83.4% versus 84.2%), and the balanced accuracy of the ADG was higher than that of the decision tree. Regarding the size of the models learnt, we measured it by the number of nodes and links (note that the number of links in a tree is n − 1, where n is the number of nodes). The ADG graph was smaller for the car price and bank datasets, but more complex for the US census dataset. Table 3 shows the most important variables (ranked by importance) for the decision tree and for the ADG.
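For completeness, the two reported metrics can be computed as follows. This is a small self-contained sketch with our own helper names; the paper does not state how the 95% confidence interval was obtained, so the usual normal approximation for a proportion is assumed here.

```python
import math

def balanced_accuracy(y_true, y_pred, classes=("y+", "y-")):
    """Mean of the per-class recalls: robust to class imbalance."""
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def accuracy_ci95(acc, n):
    """Normal-approximation 95% confidence interval for an accuracy
    estimated on n test instances (z = 1.96)."""
    half = 1.96 * math.sqrt(acc * (1.0 - acc) / n)
    return (max(0.0, acc - half), min(1.0, acc + half))

# Example: one y+ instance misclassified out of four
y_true = ["y+", "y+", "y-", "y-"]
y_pred = ["y+", "y-", "y-", "y-"]
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In the example, plain accuracy is also 0.75, but on an imbalanced test set the two metrics diverge, which is why both are reported in Table 2.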
There is a good degree of overlap for all three datasets, meaning that the two algorithms substantially agreed on the most important variables.

Discussion. The results obtained are promising, but they also expose some of the weaknesses of the preliminary algorithm proposed. Indeed, the algorithm is naive in several respects. For instance, the addition of a new argument is accepted if the absolute number of correctly classified instances increases by at least Δ, even if the accuracy (i.e. the percentage of correct predictions) could decrease, showing that the algorithm has a bias in favour of increasing the coverage of the instances rather than maximizing accuracy. The algorithm also needs a pruning procedure, similar to the ones used for decision trees. The result on the US census dataset shows that an ADG can become very complex, harming the understandability of the outputs and also increasing the chance of overfitting. A pruning strategy should reduce the complexity of the model while keeping its accuracy high. One idea is to introduce a regularization parameter similar to the one used in the cost-based pruning mechanism of a decision tree: a parameter that penalizes complex ADGs, forcing the algorithm to find a trade-off between complexity and performance. Finally, our preliminary algorithm is not computationally efficient, and it can be optimized in multiple ways, including an approximation using Monte Carlo simulation.

Table 3
Most important variables for decision tree and ADG

Dataset     Decision Tree                         ADG                                  ∩
Bank        age, children, income                 children, income, married            2/3
US Census   relation, marital status, cap gain,   cap gain, degree, ed num,            5/7
            degree, ed num, gender, occupation    hours, relations, age, gender
Car Price   persons, safety, price, maintenance   persons, safety, maintenance,        3/4
                                                  lug boot

7. Related Works

Recently, there has been an increasing number of studies mixing argumentation theory and machine learning methods.
However, very few studies directly address the problem of learning an argumentation graph from data. The large majority of applications are in the field of argumentation mining, where arguments are extracted automatically from text using NLP techniques, either in a data-driven fashion or in a mixed approach where an explicit structure of arguments has to be matched against the text [8],[9],[10]. In both cases, learning an argumentation graph from data is not part of the problem. In [11] the authors learnt the strengths of a probabilistic argumentation framework using Bayesian inference, while in [12] the authors proposed an algorithm to learn a probabilistic argumentation graph from a given set of extensions. A different line of work adopts machine learning techniques to predict the acceptability of arguments under a given semantics. In [13] this problem was tackled by modelling it as a multinomial classification task and using convolutional neural networks to learn the acceptability of arguments. The work by Craandijk and Bex [14] had a similar aim: the authors proposed to use special neural networks, called argumentation graph neural networks (AGNN), to learn a binary classification model predicting whether an argument is accepted or rejected. The difference from our approach is that in those works an argumentation framework already exists, and the problem is to learn the output of a semantics applied to it. By contrast, our problem is to learn such an argumentation framework from data. Moreover, our aim is to learn a Dung-like argumentation framework that can generate understandable justifications, while the aim of both [13] and [14] is not the intelligibility of the model, but rather to train a black-box deep neural network to compute a semantics accurately.
In [15] the authors proposed to use genetic algorithms to learn a gradual argumentation graph, viewed as an instance of a sparse multi-layer neural network. To obtain an interpretable model, the authors proposed a fitness function balancing the sparseness and the accuracy of the classifier. The paper presents experimental results on standard benchmark datasets from the UCI machine learning repository [7]; the results showed an accuracy comparable to decision trees across the three datasets evaluated. In [16] the authors proposed a method to equip autonomous agents with the ability to argue and explain their decisions. Similarly to our approach, arguments and attack relations between arguments are built from a set of training examples, and the generation of arguments is also based on all the feature-value pairs found in the training dataset. Two types of attacks are defined: symmetrical rebuttal attacks and unidirectional undercutting attacks. Unlike in our work, these attacks are identified explicitly from the structure of the arguments rather than being decided on the basis of how well they fit the given dataset.

8. Conclusions

In this paper we presented a novel data-mining model called argumentative decision graphs (ADG). An ADG is a special argumentation framework where arguments have a rule-based structure and an attack relation is defined among arguments. ADGs are learnt from data in a supervised way and can be used for classification tasks. We have discussed the main differences and similarities with related models such as decision trees, showing a translation between the two formalisms. Unlike decision trees, the output of an ADG can also be an undecided status, occurring when the graph does not have enough reasons to predict a value for the target variable. This is due to the use of argumentation semantics to identify the arguments of an ADG that are accepted and, consequently, to make a prediction on the target variable.
We evaluated a preliminary greedy algorithm to learn an ADG from data using benchmark datasets and compared our results with the C4.5 decision tree algorithm. Our results showed that ADGs had an accuracy lower than or comparable to decision trees, a generally less complex model, and a good agreement with decision trees on the importance of the variables. The algorithm presented is naive in some of its assumptions and can be improved in many respects, including how arguments interact, the way the impact of an argument is evaluated, and how to reduce its computational time. Overall, we believe we have provided enough evidence to justify further research into ADGs.

References

[1] P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games, Artificial Intelligence 77 (1995) 321–357.
[2] M. W. Caminada, D. M. Gabbay, A logical account of formal argumentation, Studia Logica 93 (2009) 109–145.
[3] J. R. Quinlan, Induction of decision trees, Machine Learning 1 (1986) 81–106.
[4] F. V. Jensen, T. D. Nielsen, Bayesian networks and decision graphs, volume 2, Springer, 2007.
[5] J. L. Pollock, Cognitive carpentry: a blueprint for how to build a person, MIT Press, 1995.
[6] B. Hssina, A. Merbouha, H. Ezzikouri, M. Erritali, A comparative study of decision tree ID3 and C4.5, International Journal of Advanced Computer Science and Applications 4 (2014) 13–19.
[7] D. Dua, C. Graff, UCI machine learning repository, 2017. URL: http://archive.ics.uci.edu/ml.
[8] O. Cocarascu, A. Stylianou, K. Čyras, F. Toni, Data-empowered argumentation for dialectically explainable predictions, in: ECAI 2020, IOS Press, 2020, pp. 2449–2456.
[9] O. Cocarascu, F. Toni, Detecting deceptive reviews using argumentation, in: Proceedings of the 1st International Workshop on AI for Privacy and Security, 2016, pp. 1–8.
[10] M. Lippi, P.
Torroni, Argument mining: A machine learning perspective, in: International Workshop on Theory and Applications of Formal Argumentation, Springer, 2015, pp. 163–176.
[11] K. Noor, A. Hunter, A Bayesian probabilistic argumentation framework for learning from online reviews, in: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), IEEE, 2020, pp. 742–747.
[12] R. Riveret, G. Governatori, On learning attacks in probabilistic abstract argumentation, in: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 2016, pp. 653–661.
[13] I. Kuhlmann, M. Thimm, Using graph convolutional networks for approximate reasoning with abstract argumentation frameworks: A feasibility study, in: International Conference on Scalable Uncertainty Management, Springer, 2019, pp. 24–37.
[14] D. Craandijk, F. Bex, Deep learning for abstract argumentation semantics, arXiv preprint arXiv:2007.07629 (2020).
[15] J. Spieler, N. Potyka, S. Staab, Learning gradual argumentation frameworks using genetic algorithms, arXiv preprint arXiv:2106.13585 (2021).
[16] L. Amgoud, M. Serrurier, Agents that argue and explain classifications, Autonomous Agents and Multi-Agent Systems 16 (2008) 187–209.