Towards Argumentative Decision Graphs: Learning Argumentation Graphs from Data

Pierpaolo Dondio (1)
(1) School of Computer Science, Technological University Dublin, Grangegorman Campus, D07 EWV4, Dublin, Ireland
AI3 2021, 5th Workshop on Advances in Argumentation in Artificial Intelligence. Contact: pierpaolo.dondio@tudublin.ie (P. Dondio), ORCID 0000-0001-7874-8762.

Abstract
In this paper we present a novel data-mining model called argumentative decision graphs (𝐴𝐷𝐺). An 𝐴𝐷𝐺 is a special argumentation framework where arguments have a rule-based structure and an attack relation is defined among arguments. 𝐴𝐷𝐺𝑠 are graph-like models learnt from data in a supervised way that can be used for classification tasks. As with a decision tree, given a set of input features, an 𝐴𝐷𝐺 returns a value for the target variable. Unlike a decision tree, the output of an 𝐴𝐷𝐺 can also be an undecided status, occurring when the model does not have enough reasons to predict a value for the target variable. This is due to the use of argumentation semantics to identify which arguments of an 𝐴𝐷𝐺 are accepted and consequently make a prediction about the target variable. Unlike Bayesian networks, 𝐴𝐷𝐺𝑠 are not required to be acyclic and can have any topology. Advantages of 𝐴𝐷𝐺𝑠 include the possibility of using different semantics to make predictions, the ability to deal with incomplete input data and the ability to generate compact explanations. We evaluate a preliminary greedy algorithm to learn an 𝐴𝐷𝐺 from data using public datasets and we compare our results with decision trees in terms of balanced accuracy and size of the model. Our results provide evidence to justify further research.

Keywords
Argumentation, Data Mining, Decision Tree, Graph models, Explainability

1. Introduction
In this paper we describe a novel data-mining model called Argumentative Decision Graphs (𝐴𝐷𝐺). An 𝐴𝐷𝐺 is a supervised data-mining model that learns the relationship between a set of input features and a target variable from a sufficiently large set of examples. In this first paper we present a preliminary algorithm to learn 𝐴𝐷𝐺𝑠 from data for binary classification. An argumentative decision graph is an extension of Dung’s abstract argumentation graphs [1]. As in Dung’s framework, each node of the graph represents an argument, the links represent an attack relation among the arguments and no support relation is defined. However, unlike Dung’s abstract framework, arguments have a rule-based structure with a premise (called the support) and a conclusion. Some arguments have a conclusion that can be used to predict the value of the target variable, while other arguments are not used to predict the target variable directly, but rather to interact with the predictive arguments.
The main reason to introduce 𝐴𝐷𝐺𝑠 is to bring the advantages of symbolic non-monotonic reasoning into data-mining models. An 𝐴𝐷𝐺 is an extension of an abstract argumentation framework, a symbolic reasoning system able to represent partial and conflicting knowledge and to formalize a large set of non-monotonic semantics.
For this reason, an 𝐴𝐷𝐺 can deal with partial information and make predictions when some of the inputs are missing. The predictions of an 𝐴𝐷𝐺 are made by applying an argumentation semantics. We first recall that an argumentation semantics is a set of postulates used to identify sets of acceptable arguments, called extensions. In the labelling approach [2] adopted in this paper, the effect of an argumentation semantics is to assign a label in, out or undec to each argument, meaning that the argument is respectively accepted, rejected or deemed undecided. The undec label represents a situation in which the semantics has no reason to definitely accept or reject an argument.
A first advantage of using argumentation semantics is that the various semantics offer a rich toolset to model different ways of making decisions. Grounded semantics, for instance, represents a skeptical acceptance strategy, while preferred semantics is a so-called credulous semantics with less conservative conditions to accept an argument. A second advantage is that an 𝐴𝐷𝐺 can output an undecided status: an 𝐴𝐷𝐺 can either make a prediction or abstain from making one. This happens in situations in which the input data create a conflict that cannot be resolved by the rules of the semantics, so that there is no ground to take a decision. The undecided status provides an 𝐴𝐷𝐺 with a way of quantifying the uncertainty generated by conflicting information. Third, the use of an attack relation between arguments coupled with an argumentation semantics generates compact and understandable explanations. In a distinctive non-monotonic fashion, a single argument can invalidate many others and directly or indirectly support a specific conclusion. Even in a large argumentation graph, the arguments that are responsible for a conclusion could be a small subset. This can make explanations more compact and understandable.
Data-mining models that can be considered similar to 𝐴𝐷𝐺𝑠 are explainable models such as decision trees [3], or graph-like structures learnt from data such as Bayesian networks [4]. In this paper, we provide an introduction to 𝐴𝐷𝐺𝑠 and we underline the differences between 𝐴𝐷𝐺𝑠 and similar models. In the second part of the paper, we also provide a first greedy algorithm to learn an 𝐴𝐷𝐺 from data and we evaluate the performance of this algorithm against decision trees using benchmark classification datasets.
The paper is organized as follows. In section 2 we recall the basics of abstract argumentation semantics, in section 3 we introduce our 𝐴𝐷𝐺 model, while in section 4 we critically compare 𝐴𝐷𝐺𝑠 to decision trees. Section 5 describes an algorithm to learn an 𝐴𝐷𝐺 from data that is evaluated in section 6, while related works are described in section 7.

2. Abstract Argumentation Semantics
Definition 2.1. An argumentation framework 𝐴𝐹 is a pair βŸ¨π΄π‘Ÿ, β„›βŸ©, where π΄π‘Ÿ is a non-empty finite set whose elements are called arguments and β„› βŠ† π΄π‘Ÿ Γ— π΄π‘Ÿ is a binary relation, called the attack relation. If (π‘Ž, 𝑏) ∈ β„› we say that π‘Ž attacks 𝑏. An argument π‘Ž is initial if it is not attacked by any argument, including itself.
An argumentation framework 𝐴𝐹 = βŸ¨π΄π‘Ÿ, β„›βŸ© identifies a directed graph. We define the restriction of an argumentation framework to a set of nodes 𝑆 as the framework 𝐴𝐹↓𝑆 corresponding to the vertex-induced subgraph of 𝐴𝐹 identified by 𝑆:
Definition 2.2.
Given 𝐴𝐹 = βŸ¨π΄π‘Ÿ, β„›βŸ©, the restriction of 𝐴𝐹 to a set of nodes 𝑆 βŠ† π΄π‘Ÿ is the argumentation framework 𝐴𝐹↓𝑆 = βŸ¨π‘†, ℛ𝑆⟩ where ℛ𝑆 = β„› ∩ (𝑆 Γ— 𝑆).
An abstract argumentation semantics identifies the sets of arguments that can survive the conflicts encoded by the attack relation β„›. Dung’s semantics require a group of acceptable arguments to be conflict-free (an argument and its attackers cannot be accepted at the same time) and admissible (the set of arguments defends itself from external attacks).
Definition 2.3. A set π΄π‘Ÿπ‘” βŠ† π΄π‘Ÿ is conflict-free iff βˆ€π‘Ž, 𝑏 ∈ π΄π‘Ÿπ‘”, (π‘Ž, 𝑏) βˆ‰ β„›.
Definition 2.4. A set π΄π‘Ÿπ‘” βŠ† π΄π‘Ÿ defends an argument π‘Ž ∈ π΄π‘Ÿ iff βˆ€π‘ ∈ π΄π‘Ÿ such that (𝑏, π‘Ž) ∈ β„›, βˆƒπ‘ ∈ π΄π‘Ÿπ‘” such that (𝑐, 𝑏) ∈ β„›. The set of arguments defended by π΄π‘Ÿπ‘” is denoted β„±(π΄π‘Ÿπ‘”). A conflict-free set π΄π‘Ÿπ‘” is admissible if π΄π‘Ÿπ‘” βŠ† β„±(π΄π‘Ÿπ‘”) and it is complete if π΄π‘Ÿπ‘” = β„±(π΄π‘Ÿπ‘”).
We follow the labelling approach of [2], where a semantics assigns to each argument a label 𝑖𝑛, π‘œπ‘’π‘‘ or 𝑒𝑛𝑑𝑒𝑐.
Definition 2.5. Let 𝐴𝐹 = βŸ¨π΄π‘Ÿ, β„›βŸ©. A labelling is a total function β„’ ∢ π΄π‘Ÿ β†’ {𝑖𝑛, π‘œπ‘’π‘‘, 𝑒𝑛𝑑𝑒𝑐}. We write 𝑖𝑛(β„’) for {π‘Ž ∈ π΄π‘Ÿ | β„’(π‘Ž) = 𝑖𝑛}, π‘œπ‘’π‘‘(β„’) for {π‘Ž ∈ π΄π‘Ÿ | β„’(π‘Ž) = π‘œπ‘’π‘‘}, and 𝑒𝑛𝑑𝑒𝑐(β„’) for {π‘Ž ∈ π΄π‘Ÿ | β„’(π‘Ž) = 𝑒𝑛𝑑𝑒𝑐}.
Definition 2.6. Let 𝐴𝐹 = βŸ¨π΄π‘Ÿ, β„›βŸ©. A complete labelling is a labelling such that for every π‘Ž ∈ π΄π‘Ÿ it holds that: 1. if π‘Ž is labelled 𝑖𝑛 then all its attackers are labelled π‘œπ‘’π‘‘; 2. if π‘Ž is labelled π‘œπ‘’π‘‘ then it has at least one attacker that is labelled 𝑖𝑛; 3. if π‘Ž is labelled 𝑒𝑛𝑑𝑒𝑐 then it has at least one attacker labelled 𝑒𝑛𝑑𝑒𝑐 and no attackers labelled 𝑖𝑛.
Definition 2.7. Given 𝐴𝐹 = βŸ¨π΄π‘Ÿ, β„›βŸ©, β„’ is the grounded labelling iff β„’ is a complete labelling where 𝑒𝑛𝑑𝑒𝑐(β„’) is maximal (w.r.t. set inclusion) among all complete labellings of 𝐴𝐹. β„’ is a preferred labelling iff β„’ is a complete labelling where 𝑖𝑛(β„’) is maximal (w.r.t. set inclusion) among all complete labellings of 𝐴𝐹. A stable labelling is a complete labelling with 𝑒𝑛𝑑𝑒𝑐(β„’) = βˆ….
The grounded semantics, first introduced by Pollock [5], is a skeptical semantics that can be computed in polynomial time by accepting the initial arguments and then any argument defended directly or indirectly by initial arguments. In this first paper, grounded semantics is the only semantics used by an 𝐴𝐷𝐺.
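To make the labelling operational, the following Python sketch computes the grounded labelling by the fixed-point procedure just described: initial arguments are labelled in, arguments attacked by an in argument are labelled out, the process is iterated, and whatever remains unlabelled is undec. The encoding of a framework as a set of arguments plus a set of attack pairs, and all function names, are illustrative choices of ours rather than part of the model.

def grounded_labelling(args, attacks):
    """Grounded labelling of the framework (args, attacks).
    args: a set of hashable arguments; attacks: a set of (attacker, target)
    pairs. Returns a dict mapping every argument to 'in', 'out' or 'undec'."""
    attackers = {a: {b for (b, c) in attacks if c == a} for a in args}
    label = {}
    changed = True
    while changed:
        changed = False
        for a in args:
            if a in label:
                continue
            if all(label.get(b) == 'out' for b in attackers[a]):
                label[a] = 'in'      # all attackers rejected (or no attackers)
                changed = True
            elif any(label.get(b) == 'in' for b in attackers[a]):
                label[a] = 'out'     # defeated by an accepted attacker
                changed = True
    for a in args:
        label.setdefault(a, 'undec')  # never decided by the fixed point
    return label

For instance, on the framework βŸ¨{π‘Ž, 𝑏}, {(π‘Ž, 𝑏), (𝑏, π‘Ž)}⟩ both arguments are labelled undec, while adding an unattacked argument 𝑐 attacking π‘Ž yields 𝑐 in, π‘Ž out and 𝑏 in.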
3. Argumentative Decision Graphs
In this section we introduce the notion of argumentative decision graphs. We work with a dataset composed of a set of features, each taking values in a finite set; one of the features is the target variable of the classification task. Informally, an argumentative decision graph is an extension of Dung’s abstract argumentation graph [1] where arguments have a rule-based structure with a premise (also called the support) and a conclusion. The premise of each argument consists of a feature and an associated value, while the conclusion is a value for the target variable. Some arguments might have an empty conclusion, meaning that their role is not to predict the target variable directly, but rather to interact with other predictive arguments.
Formally, we consider a dataset π’Ÿ represented by an 𝑁 Γ— 𝑀 matrix-like data structure, where each row is called an instance of the dataset and each column is called a feature. An instance can be represented by a tuple 𝑑 = βŸ¨π‘£1, …, π‘£π‘€βŸ©, where 𝑣𝑖 is the value associated with the π‘–π‘‘β„Ž feature 𝑓𝑖. We consider predicates of the form 𝑓𝑖(𝑣), with the meaning β€œthe feature 𝑓𝑖 has the value 𝑣”. Given a tuple 𝑑 = βŸ¨π‘£1, …, 𝑣𝑖, …, π‘£π‘€βŸ©, the predicate 𝑓𝑖(𝑣) is verified by 𝑑 iff 𝑣𝑖 = 𝑣 (the value of the π‘–π‘‘β„Ž component of the tuple 𝑑 is equal to 𝑣), and not verified otherwise. One of the features is called the target variable and is denoted by 𝑦; the predicates involving the target variable therefore have the form 𝑦(𝑣). The set of all the predicates is denoted 𝒫𝑓 βˆͺ 𝒫𝑦, where 𝒫𝑦 is the set of predicates regarding the target variable 𝑦 and 𝒫𝑓 the set of predicates regarding the other features {𝑓1, …, 𝑓𝑀}.
An 𝐴𝐷𝐺 is an argumentation framework where each argument π‘Ž has a structure π‘Ž = βŸ¨πœ™, πœƒβŸ©, where πœ™ ∈ 𝒫𝑓 and πœƒ is either the empty set or a target variable predicate 𝑦(𝑣). An argument with a non-empty conclusion is called a predictive argument. An attack relation is defined over the arguments. The definition is therefore the following:
Definition 3.1. An argumentative decision graph 𝐴𝐷𝐺 is an argumentation framework 𝐴𝐹 = βŸ¨π΄π‘Ÿ, β„›βŸ© where each π‘Ž ∈ π΄π‘Ÿ has the form π‘Ž = βŸ¨πœ™, πœƒβŸ©, with πœ™ ∈ 𝒫𝑓, πœƒ ∈ 𝒫𝑦 βˆͺ {βˆ…} and β„› βŠ† π΄π‘Ÿ Γ— π΄π‘Ÿ.
Example 1. Let us consider the dataset π’Ÿ1 in Table 1, describing Paul’s activities on Sunday. The dataset has the following three Boolean features: w (whether the weather is windy), s (whether the weather is sunny) and k (whether Paul has a sore knee), and a target variable a (activity), which takes the two values {surf, fish}. Using the features of π’Ÿ1, the following two 𝐴𝐷𝐺𝑠 are given:
β€’ 𝐴𝐷𝐺1 = ⟨{π‘Ž1, π‘Ž2, π‘Ž3}, {(π‘Ž1, π‘Ž2), (π‘Ž2, π‘Ž3), (π‘Ž3, π‘Ž2)}⟩
β€’ 𝐴𝐷𝐺2 = ⟨{π‘Ž1, π‘Ž2, π‘Ž3, π‘Ž4}, {(π‘Ž1, π‘Ž2), (π‘Ž2, π‘Ž3), (π‘Ž3, π‘Ž2)}⟩
where π‘Ž1 = βŸ¨π‘˜(𝑦𝑒𝑠), βˆ…βŸ©, π‘Ž2 = βŸ¨π‘€(𝑦𝑒𝑠), π‘Ž(π‘ π‘’π‘Ÿπ‘“)⟩, π‘Ž3 = βŸ¨π‘ (𝑦𝑒𝑠), π‘Ž(𝑓 π‘–π‘ β„Ž)⟩, π‘Ž4 = βŸ¨π‘€(π‘›π‘œ), π‘Ž(𝑓 π‘–π‘ β„Ž)⟩. 𝐴𝐷𝐺1 says that Paul goes surfing if the weather is windy (argument π‘Ž2) and fishing if the weather is sunny (π‘Ž3), but he cannot do both activities (π‘Ž2 and π‘Ž3 mutually attack each other), and he does not go surfing if he has a sore knee (π‘Ž1 attacks π‘Ž2). 𝐴𝐷𝐺2 adds the information that Paul goes fishing if the weather is not windy (π‘Ž4).

Table 1
The dataset π’Ÿ1 consists of 10 tuples, three features and the target variable activity. The values of the target variable predicted by 𝐴𝐷𝐺1 and 𝐴𝐷𝐺2 are also reported.

ID  Sunny (s)  Windy (w)  Sore Knee (k)  Activity (a)  𝑃𝐴𝐷𝐺1  𝑃𝐴𝐷𝐺2
1   Yes        Yes        Yes            Fish          Fish    Fish
2   Yes        Yes        No             Surf          und     und
3   Yes        No         No             Fish          Fish    Fish
4   Yes        No         No             Fish          Fish    Fish
5   No         Yes        Yes            Fish          unk     unk
6   No         No         Yes            Fish          unk     Fish
7   No         Yes        No             Surf          Surf    Surf
8   No         No         No             Fish          unk     Fish
9   No         Yes        Yes            Surf          unk     unk
10  No         Yes        No             Surf          Surf    Surf

We can represent an 𝐴𝐷𝐺 with a directed graph where each node is labelled with the argument it represents. 𝐴𝐷𝐺1 and 𝐴𝐷𝐺2 are shown in Figure 1, where for readability we also report the description of each argument.

Figure 1: Two ADGs.

3.1. Well-formed ADGs
In order to be well-formed, the attack relation of an 𝐴𝐷𝐺 has to satisfy some consistency constraints. The first constraint is that there is no attack between two arguments whose supports contain the same feature.
This is because the two arguments are mutually exclusive, since only one of the two supports can be verified by an input tuple. The second constraint is that there is no attack between two arguments whose conclusions are the same and not empty. Third, if two arguments π‘Ž and 𝑏 have mutually exclusive conclusions and they use different features in their supports, there must be an attack between them: either π‘Ž attacks 𝑏, 𝑏 attacks π‘Ž or they symmetrically attack each other. This constraint is introduced to guarantee that the predictions of an 𝐴𝐷𝐺 are conflict-free, i.e., two arguments with mutually exclusive conclusions cannot be accepted at the same time. Formally, the three constraints are as follows. We recall that 𝑦 denotes the target variable and 𝑓𝑖 the π‘–π‘‘β„Ž feature.
Definition 3.2. Given an 𝐴𝐷𝐺 = βŸ¨π΄π‘Ÿ, β„›βŸ©, the 𝐴𝐷𝐺 is well-formed iff βˆ€π‘Ž1, π‘Ž2 ∈ π΄π‘Ÿ with π‘Ž1 = βŸ¨πœ™1, πœƒ1⟩, π‘Ž2 = βŸ¨πœ™2, πœƒ2⟩ it holds that:
1. if πœ™1 = 𝑓𝑖(𝑣1) and πœ™2 = 𝑓𝑖(𝑣2) then (π‘Ž1, π‘Ž2) βˆ‰ β„› ∧ (π‘Ž2, π‘Ž1) βˆ‰ β„›
2. if πœƒ1 = πœƒ2 β‰  βˆ… then (π‘Ž1, π‘Ž2) βˆ‰ β„› ∧ (π‘Ž2, π‘Ž1) βˆ‰ β„›
3. if π‘Ž1 = βŸ¨π‘“1(𝑣π‘₯), 𝑦(𝑣1)⟩, π‘Ž2 = βŸ¨π‘“2(𝑣𝑦), 𝑦(𝑣2)⟩ and 𝑣1 β‰  𝑣2 ∧ 𝑓1 β‰  𝑓2 then (π‘Ž1, π‘Ž2) ∈ β„› ∨ (π‘Ž2, π‘Ž1) ∈ β„›.

3.2. Making Predictions using an ADG
We consider a dataset π’Ÿ and an 𝐴𝐷𝐺 built from π’Ÿ. The target variable is 𝑦 and π‘Œ is the set of values that the target variable can take. An 𝐴𝐷𝐺 identifies a function 𝑃𝐴𝐷𝐺 ∢ π’Ÿ β†’ π‘Œ βˆͺ {und, unk} that, given an input tuple 𝑑 ∈ π’Ÿ, returns either a value for the target variable or one of two additional outputs: undecided (und) or unknown (unk). The computation of 𝑃𝐴𝐷𝐺 is described in Algorithm 1. Given an input instance 𝑑, we first consider the set 𝑉 of all the arguments whose support is verified by 𝑑. Then, we apply grounded semantics to the abstract argumentation framework restricted to the verified arguments. We note that, since the 𝐴𝐷𝐺 is well-formed, all the accepted predictive arguments have the same conclusion. We then consider the following situations:
1. if there is at least one accepted predictive argument, the value predicted by the 𝐴𝐷𝐺 is the value of the conclusion of any of the accepted predictive arguments.
2. if the extension is empty but there is at least one predictive argument that is undecided, then the predicted value of the 𝐴𝐷𝐺 is the status undecided: there are arguments that could be used to predict the target variable, but they are part of conflicts that cannot be resolved.
3. in all the other cases, when there are neither accepted nor undecided predictive arguments, the value returned by the 𝐴𝐷𝐺 is the status unknown.

Algorithm 1: The function 𝑃𝐴𝐷𝐺 to make predictions given a tuple 𝑑 ∈ π’Ÿ. The dataset π’Ÿ is described by the features 𝑓1, …, 𝑓𝑀 and the target variable 𝑦 taking values in the set π‘Œ.
1  Inputs: 𝐴𝐷𝐺 = βŸ¨π΄π‘Ÿ, β„›βŸ©, a tuple 𝑑 ∈ π’Ÿ
2  Output: a predicted value in the set π‘Œ βˆͺ {und, unk}
3  𝑉 ← verified(π΄π‘Ÿ, 𝑑)
4  β„’ ← Grounded(𝐴𝐷𝐺↓𝑉)
5  if 𝑖𝑛(β„’) β‰  βˆ… then
6      if βˆƒπ‘Ž ∈ 𝑖𝑛(β„’) ∣ π‘Ž = βŸ¨πœ™, 𝑦(𝑣)⟩ then
7          return 𝑣
8  else
9      if βˆƒπ‘Ž ∈ 𝑒𝑛𝑑𝑒𝑐(β„’) ∣ π‘Ž = βŸ¨πœ™, 𝑦(𝑣)⟩ then return und
10 return unk
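The following Python sketch implements Algorithm 1 on top of the grounded_labelling function of section 2. The encoding of an argument as a (feature, value, conclusion) triple, with conclusion None for non-predictive arguments, and the representation of an input tuple as a feature-to-value dictionary are illustrative assumptions of ours.

def predict(arguments, attacks, tuple_values):
    """P_ADG: return a class value, 'und' or 'unk' for the given input tuple.
    arguments: set of (feature, value, conclusion) triples, conclusion None
    for non-predictive arguments; tuple_values: dict feature -> value."""
    # restriction to the arguments verified by the tuple (lines 3-4 of Alg. 1)
    verified = {a for a in arguments if tuple_values.get(a[0]) == a[1]}
    restricted = {(x, z) for (x, z) in attacks
                  if x in verified and z in verified}
    lab = grounded_labelling(verified, restricted)
    if any(v == 'in' for v in lab.values()):
        for a in verified:
            if lab[a] == 'in' and a[2] is not None:
                return a[2]            # all accepted predictive arguments agree
    elif any(lab[a] == 'undec' and a[2] is not None for a in verified):
        return 'und'                   # unresolved conflict on a prediction
    return 'unk'                       # no predictive argument survives

On 𝐴𝐷𝐺1 of Example 1, encoded with a1 = ('k', 'yes', None), a2 = ('w', 'yes', 'surf') and a3 = ('s', 'yes', 'fish'), the call predict({a1, a2, a3}, {(a1, a2), (a2, a3), (a3, a2)}, {'s': 'yes', 'w': 'yes', 'k': 'yes'}) returns 'fish', matching Example 2 below. Note that a feature missing from the input dictionary simply leaves its arguments unverified, which is how an 𝐴𝐷𝐺 handles partial information.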
Example 2. Let us consider again 𝐴𝐷𝐺1 and 𝐴𝐷𝐺2 and the dataset π’Ÿ1 in Table 1, and let us compute 𝑃𝐴𝐷𝐺1(𝑑1) for the first tuple 𝑑1 = βŸ¨π‘¦π‘’π‘ , 𝑦𝑒𝑠, 𝑦𝑒𝑠, 𝑓 π‘–π‘ β„ŽβŸ©. All three arguments π‘Ž1, π‘Ž2, π‘Ž3 are verified by 𝑑1, and therefore the argumentation graph to be considered is the full one. The grounded semantics returns two accepted arguments {π‘Ž1, π‘Ž3}, since π‘Ž2 is defeated by π‘Ž1. Since π‘Ž3 = βŸ¨π‘ (𝑦𝑒𝑠), π‘Ž(𝑓 π‘–π‘ β„Ž)⟩, we have 𝑃𝐴𝐷𝐺1(𝑑1) = 𝑓 π‘–π‘ β„Ž. If we consider the tuple 𝑑2 = βŸ¨π‘¦π‘’π‘ , 𝑦𝑒𝑠, π‘›π‘œ, π‘ π‘’π‘Ÿπ‘“βŸ© (the weather is sunny and windy and Paul has no sore knee), only π‘Ž2 and π‘Ž3 are verified by 𝑑2, and the resulting argumentation framework is a pair of mutually attacking arguments, both labelled undec by the grounded semantics; therefore 𝑃𝐴𝐷𝐺1(𝑑2) = und. Regarding tuple 𝑑6 = βŸ¨π‘›π‘œ, π‘›π‘œ, 𝑦𝑒𝑠, 𝑓 π‘–π‘ β„ŽβŸ© (the weather is neither sunny nor windy and Paul has a sore knee), only argument π‘Ž1 is verified and accepted by the grounded semantics. However, this argument does not predict the target variable and, since there are no undecided arguments, nothing can be said about the target variable: 𝑃𝐴𝐷𝐺1(𝑑6) = unk.
By comparing the predictions of an 𝐴𝐷𝐺 with the actual values of the target variable, the usual performance metrics derived from the confusion matrix, such as accuracy, precision, recall and f1-score, can be computed. The confusion matrix is a joint-frequency table of the predicted versus the actual values. In a binary classification task the confusion matrix is a 2 Γ— 2 table; however, since an 𝐴𝐷𝐺 can also return the two special values undecided and unknown, the confusion matrix will have two additional rows. In Figure 2 the confusion matrices for 𝐴𝐷𝐺1 and 𝐴𝐷𝐺2 computed on the dataset π’Ÿ1 are shown.

Figure 2: Confusion matrices for 𝐴𝐷𝐺1 and 𝐴𝐷𝐺2 for the dataset π’Ÿ1 of Table 1.

We note how both 𝐴𝐷𝐺1 and 𝐴𝐷𝐺2 have perfect accuracy when a prediction is generated (this is measured by the decision accuracy, equal to 1); however, 𝐴𝐷𝐺1 is in the unknown status in 4 out of 10 cases and undecided in one case, and therefore its overall accuracy is 0.5. 𝐴𝐷𝐺2 is a better predictive model, since it makes predictions in more cases than 𝐴𝐷𝐺1 without adding any prediction error. This is reflected by its higher f1-score.
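As a worked check on Table 1, reading the decision accuracy as the accuracy computed over the decided instances only (our reading of the metric named above), 𝐴𝐷𝐺1 makes 5 correct predictions, no wrong ones, 1 undecided and 4 unknown, so that

decision accuracy = correct / decided = 5/5 = 1
overall accuracy  = correct / total   = 5/10 = 0.5

while 𝐴𝐷𝐺2 reaches 7/7 = 1 and 7/10 = 0.7 respectively: the two abstention statuses lower the overall accuracy but never the decision accuracy.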
4. A critical comparison with Decision Trees
In this section we outline some interesting differences and similarities between 𝐴𝐷𝐺𝑠 and decision trees. We consider again a dataset π’Ÿ with a set of features {𝑓1, …, 𝑓𝑛, 𝑦} where 𝑦 is the target variable. A decision tree (𝐷𝑇) [3] is a tree structure where each internal node is associated with a test on a feature (also referred to as a variable in 𝐷𝑇 terminology), and each terminal node has an associated value of the target variable. Given an input tuple, a decision tree returns a predicted value for the target variable (the decision). In order to take a decision, the tree is visited starting from the root node. At each node the associated test is evaluated and its result determines which child node is visited next, until a terminal node is reached and a prediction is made. Figure 3 shows an example of two decision trees for the dataset π’Ÿ1 in Table 1. 𝑇1 has three internal nodes and four terminal nodes. 𝑇2 has two internal and three terminal nodes, and it does not use all the features.

Figure 3: Two decision trees for the dataset π’Ÿ1 of Table 1.

Decision trees can be learnt from data and many efficient algorithms have been proposed, such as ID3, C4.5 and CART [6]. The fundamental idea is similar: a tree is learnt in a top-down, recursive, divide-and-conquer approach. Starting from the root node, the algorithm selects the best variable to split the tree into two or more sub-trees. The best variable is the one that splits the tree into branches such that the target variable becomes easier to predict in each branch. Different measures of this notion of β€œeasier to predict” are used by decision tree algorithms, such as the Gini index, the information gain (the reduction in the entropy of the target variable once the value of the variable tested at the split is known) and the gain ratio. We now discuss some key differences and similarities between 𝐴𝐷𝐺𝑠 and decision trees (𝐷𝑇).
Entry point. A decision tree has a single entry point - the root of the tree - and a specific order in which nodes are tested to generate a prediction. On the contrary, an 𝐴𝐷𝐺 has no entry point and arguments can be evaluated in any order. The consequence is that, in case of partial information - that is, when the value of one or more features is missing or unknown - an 𝐴𝐷𝐺 has a higher chance of returning a prediction. For instance, if the information about the feature sore knee is not available, 𝑇1 and 𝑇2 cannot return a prediction, while 𝐴𝐷𝐺1 and 𝐴𝐷𝐺2 could.
Adding new knowledge. Let us suppose that new knowledge needs to be added to the dataset in the form of a new feature/variable. An argumentation graph is easier to expand: in order to add a new argument, it is enough to define the new set of attacks involving the new node. The previous structure of the graph is unchanged. On the other hand, adding a variable to a decision tree without changing its previous structure can be done only by adding new terminal nodes testing the new variable or by using the variable as the new root. Retraining the tree would in general generate a very different tree, where the new variable might appear in different parts of the graph. It can be observed that, in order to find the optimal graph, an 𝐴𝐷𝐺 might also require modifying the previous nodes and links. However, an 𝐴𝐷𝐺 has more options when it comes to accommodating a new variable without changing the existing graph, due to its less constrained topology.
Representation of knowledge. The topology of a 𝐷𝑇 and that of an 𝐴𝐷𝐺 are certainly different. However, despite the differences, both models represent knowledge as a set of logical rules in disjunctive normal form. A tree is an acyclic graph where every path is disjoint from the others. Every path represents a distinct rule to reach a conclusion, and once a path is taken it is not possible to move back to another path. Multiple paths are usually present to reach the same conclusion. Therefore a 𝐷𝑇 represents knowledge as a set of mutually exclusive rules, and each split represents a specialization of the support of a rule. On the contrary, an 𝐴𝐷𝐺 represents knowledge as a set of default rules, potentially forming an inconsistent (conflicting) set and subject to exceptions. Every unidirectional attack provides an exception to a rule (i.e., an argument), further specializing when the rule can be used. Depending on the context and the data, the conflict-based representation of the 𝐴𝐷𝐺 could generate more understandable and compact models than the mutually exclusive rules generated by the 𝐷𝑇, or vice versa.
Explanations. Decision trees are regarded as one of the most explainable data mining models. The tree structure, if not too complex, can be easily understood by a human.
Each branch of the tree represents a specific input case, and each branch identifies a logical rule consisting of the conjunction of all the tests used to reach the terminal node from the root. For instance, sore knee and not windy weather is an explanation for Paul going fishing in the tree 𝑇2. The length of an explanation is the length of the path from the root node to the terminal node. The average length of all the paths - often weighted by the number of tuples covered by each path - is a measure of the average length of the explanations generated by the decision tree. Regarding an 𝐴𝐷𝐺, the structure can also be understood by a human if the argumentation graph has a small size, but it could be argued that the free topology of an argumentation graph makes it harder to understand. However, even when the graph is large, the application of the chosen semantics reduces the number of arguments that are necessary to explain a decision. In general, the arguments defeated by an accepted argument are irrelevant to the decision and, if we are using grounded semantics and the set of accepted arguments is not empty, the undecided arguments are also irrelevant. Moreover, we do not need all the accepted arguments, since the acceptability of some arguments might depend on the acceptability of others. The explanations could therefore be compact even for a large graph. The analysis of explanations is an important issue that will be covered in future work.
Variable Importance. In a decision tree, the importance of a variable is measured by the information gain, that is, the reduction in the conditional entropy of the target variable. In other words, a variable is important if, by knowing its value, the target variable can be predicted with more certainty. In an 𝐴𝐷𝐺, various measures of the importance of a variable could be defined. For instance, the importance of an argument could be proportional to the number of input tuples for which the argument is necessary to obtain a prediction. Note that, since an argument refers to a specific feature-value pair, an 𝐴𝐷𝐺 has a more fine-grained notion of variable importance. An aggregated value for each feature can then be provided.
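For reference, the decision-tree notion of importance mentioned above is the information gain of a feature 𝑓𝑖 with respect to the target 𝑦,

𝐼𝐺(𝑦, 𝑓𝑖) = 𝐻(𝑦) βˆ’ 𝐻(𝑦 ∣ 𝑓𝑖) = 𝐻(𝑦) βˆ’ Ξ£π‘£βˆˆπΉπ‘– 𝑃(𝑓𝑖 = 𝑣) 𝐻(𝑦 ∣ 𝑓𝑖 = 𝑣),

while one possible formalization of the argument-level measure sketched above - our own illustrative reading, not a definition from the model - is imp(π‘Ž) = |{𝑑 ∈ π’Ÿ ∢ π‘Ž is necessary to obtain 𝑃𝐴𝐷𝐺(𝑑)}| / 𝑁, aggregated over all arguments sharing the same feature to obtain a per-feature score.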
Semantics. Another difference between a 𝐷𝑇 and an 𝐴𝐷𝐺 is the way their outputs are produced. An 𝐴𝐷𝐺 is evaluated using argumentation semantics to solve the conflicts encoded in the graph. Given an input tuple, an 𝐴𝐷𝐺 is evaluated by means of an argumentation semantics that labels the arguments as accepted, rejected or undecided. The undecided label provides an 𝐴𝐷𝐺 with a built-in way of quantifying the uncertainty deriving from conflicting information. Moreover, semantics provide a way of representing different strategies to accept arguments. While the grounded semantics is the most skeptical semantics, maximizing the set of undecided arguments, semantics such as the preferred are credulous semantics maximizing the set of accepted arguments. Semantics other than the grounded are often multi-status, meaning that they generate multiple sets of accepted arguments, all consistent with the semantics used. Multi-status semantics could be used to present multiple consistent scenarios to the decision makers, each of them with valid reasons to be accepted.
Expressiveness. Regarding the expressiveness of the two models, they are equivalent. Given an 𝐴𝐷𝐺, it is possible to define a 𝐷𝑇 computing exactly the same function. Indeed, if we compute the output of an 𝐴𝐷𝐺 for all the input tuples, we can build an exhaustive 𝐷𝑇 where each path represents an input tuple and the value predicted by the terminal node is the 𝐴𝐷𝐺 output. Given a 𝐷𝑇, an equivalent 𝐴𝐷𝐺 can be built by following these rules:
1. for each directed link 𝑙 from a node 𝑁 to a terminal node 𝑀 of the 𝐷𝑇, create a predictive argument βŸ¨π‘“π‘–(𝑣𝑛), π‘¦π‘šβŸ©, where 𝑓𝑖 is the variable tested at node 𝑁, 𝑣𝑛 is the value of 𝑓𝑖 identifying the link 𝑙 connecting 𝑁 to the terminal node 𝑀, and π‘¦π‘š is the value of the target variable predicted at node 𝑀.
2. for each link 𝑙 from a node 𝑁 to a node 𝑀 where 𝑀 is non-terminal, create an argument βŸ¨π‘“π‘–(𝑣𝑛), βˆ…βŸ©, where 𝑓𝑖 is the variable tested at 𝑁 and 𝑣𝑛 is the value of the feature identifying the link 𝑙.
3. after having found the arguments of the 𝐴𝐷𝐺 using rules 1 and 2, for each argument π‘Ž associated with a link π‘™π‘Ž from node 𝑁 to 𝑀, add to the attack relation β„› the pair (π‘Ž, 𝑏), where 𝑏 is any argument associated with a link 𝑙𝑏 that is on a directed path 𝑝 connecting 𝑁 to a terminal node such that π‘™π‘Ž βˆ‰ 𝑝, unless the two arguments use the same feature or predict the same value for the target variable.
The idea is that terminal nodes provide the arguments to predict the target variable, while the non-terminal nodes provide the non-predictive arguments. Regarding the attack relation, an argument generated by a link 𝑙 from node 𝑁 to 𝑀 attacks all the arguments generated by links that are mutually exclusive with 𝑙 and belonging to the sub-tree identified by the node 𝑁.
As an example, let us consider the decision tree 𝑇1 and build the corresponding 𝐴𝐷𝐺. According to rule 1, the 𝐴𝐷𝐺 has the following predictive arguments: π‘Ž1 = βŸ¨π‘€(𝑦𝑒𝑠), π‘ π‘’π‘Ÿπ‘“βŸ©, π‘Ž2 = βŸ¨π‘€(π‘›π‘œ), 𝑓 π‘–π‘ β„ŽβŸ©, π‘Ž3 = βŸ¨π‘ (𝑦𝑒𝑠), 𝑓 π‘–π‘ β„ŽβŸ©, π‘Ž4 = βŸ¨π‘ (π‘›π‘œ), π‘ π‘’π‘Ÿπ‘“βŸ©. According to rule 2, it has the following two non-predictive arguments: π‘Ž5 = βŸ¨π‘˜(π‘›π‘œ), βˆ…βŸ©, π‘Ž6 = βŸ¨π‘˜(𝑦𝑒𝑠), βˆ…βŸ©. Regarding the attack relation, according to rule 3 the only attacks are: π‘Ž5 attacks π‘Ž3 and π‘Ž4, while π‘Ž6 attacks π‘Ž1 and π‘Ž2. Argument π‘Ž5 does not attack π‘Ž6 and vice versa, since they use the same feature and are therefore mutually exclusive. The resulting 𝐴𝐷𝐺 is shown in Figure 4.

Figure 4: The 𝐴𝐷𝐺 equivalent to 𝑇1.

We conclude by observing that, although the two models are equivalent in expressiveness, this does not guarantee that the two algorithms will learn similar models from data. In fact, the way the relationship between input tuples and target variable is learnt and the way knowledge is represented imply that the two models will in general be quite different.
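The following Python sketch implements translation rules 1-3 above, under an illustrative tree encoding of our own: an internal node is a pair (feature, {value: child}) and a terminal node is a plain class label. Arguments are again (feature, value, conclusion) triples.

def dt_to_adg(tree):
    """Translate a decision tree into an equivalent ADG (rules 1-3 above).
    tree: (feature, {value: child}) for internal nodes, a class label (str)
    for terminal nodes. Returns (arguments, attacks)."""
    arguments, attacks = set(), set()

    def visit(node):
        # returns the set of arguments generated inside the subtree of node
        feature, children = node
        branches = []
        for value, child in children.items():
            if isinstance(child, tuple):                 # rule 2: inner link
                arg = (feature, value, None)
                branch_args = {arg} | visit(child)
            else:                                        # rule 1: terminal link
                arg = (feature, value, child)
                branch_args = {arg}
            arguments.add(arg)
            branches.append((arg, branch_args))
        # rule 3: each branch argument attacks the arguments of the sibling
        # branches, unless they share the feature or the predicted class
        for arg, _ in branches:
            for other, other_args in branches:
                if other is arg:
                    continue
                for b in other_args:
                    same_feature = (b[0] == arg[0])
                    same_class = (arg[2] is not None and arg[2] == b[2])
                    if not same_feature and not same_class:
                        attacks.add((arg, b))
        return {x for _, bargs in branches for x in bargs}

    visit(tree)
    return arguments, attacks

Encoding 𝑇1 as ('k', {'yes': ('s', {'yes': 'fish', 'no': 'surf'}), 'no': ('w', {'yes': 'surf', 'no': 'fish'})}) - an encoding consistent with the attacks reported above - the function returns the six arguments π‘Ž1, …, π‘Ž6 and exactly the four attacks of Figure 4.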
5. An algorithm to learn ADGs from data
In this section we present a first simple algorithm to learn an 𝐴𝐷𝐺 from data. The algorithm is a preliminary attempt and can be improved in multiple ways; however, it represents a starting point to study and test the potential of 𝐴𝐷𝐺𝑠.

Algorithm 2: The function BuildADG to learn an 𝐴𝐷𝐺 from a dataset π’Ÿ. The dataset π’Ÿ is described by the features 𝑓1, …, 𝑓𝑛 and the target variable 𝑦. The target variable is binary and takes the values 𝑦+, π‘¦βˆ’. 𝐹𝑖 represents the set of all possible values for the feature 𝑓𝑖. The function eval returns the selected performance indicator for a given 𝐴𝐷𝐺.
1  Inputs: a dataset π’Ÿ, a performance threshold Ξ”
2  Outputs: an 𝐴𝐷𝐺
3  Function BuildADG(π’Ÿ, Ξ”):
4      perf ← 0; Arg ← βˆ…
5      for each distinct pair (𝑓𝑖, 𝑣𝑛), 𝑣𝑛 ∈ 𝐹𝑖 do
6          Arg ← Arg βˆͺ {βŸ¨π‘“π‘–(𝑣𝑛), 𝑦+⟩, βŸ¨π‘“π‘–(𝑣𝑛), π‘¦βˆ’βŸ©, βŸ¨π‘“π‘–(𝑣𝑛), βˆ…βŸ©}
7      𝐴𝐷𝐺𝑏𝑒𝑠𝑑 ← βŸ¨βˆ…, βˆ…βŸ©
8      while Arg β‰  βˆ… do
9          π΄π·πΊπ‘œπ‘™π‘‘ ← 𝐴𝐷𝐺𝑏𝑒𝑠𝑑
10         for π‘Ž ∈ Arg do
11             𝐴𝐷𝐺𝑛𝑒𝑀 ← AddArg(π‘Ž, π΄π·πΊπ‘œπ‘™π‘‘)
12             perf𝑛𝑒𝑀 ← eval(𝐴𝐷𝐺𝑛𝑒𝑀)
13             if perf𝑛𝑒𝑀 > perf + Ξ” then
14                 perf ← perf𝑛𝑒𝑀
15                 𝐴𝐷𝐺𝑏𝑒𝑠𝑑 ← 𝐴𝐷𝐺𝑛𝑒𝑀
16                 π‘Žπ‘π‘’π‘ π‘‘ ← π‘Ž
17         if 𝐴𝐷𝐺𝑏𝑒𝑠𝑑 == π΄π·πΊπ‘œπ‘™π‘‘ then
18             return 𝐴𝐷𝐺𝑏𝑒𝑠𝑑
19         Arg.remove(π‘Žπ‘π‘’π‘ π‘‘)
20     return 𝐴𝐷𝐺𝑏𝑒𝑠𝑑
21
22 Function AddArg(π‘Ž = βŸ¨π‘“π‘Ž(π‘£π‘Ž), π‘¦π‘ŽβŸ©, 𝐴𝐷𝐺 = βŸ¨π΄π‘Ÿπ‘”, β„›βŸ©):
23     for 𝑏 = βŸ¨π‘“π‘(𝑣𝑏), π‘¦π‘βŸ© ∈ π΄π‘Ÿπ‘” do
24         𝐴𝐷𝐺← ← βŸ¨π΄π‘Ÿπ‘” βˆͺ {π‘Ž}, β„› βˆͺ {(π‘Ž, 𝑏)}⟩
25         𝐴𝐷𝐺→ ← βŸ¨π΄π‘Ÿπ‘” βˆͺ {π‘Ž}, β„› βˆͺ {(𝑏, π‘Ž)}⟩
26         𝐴𝐷𝐺↔ ← βŸ¨π΄π‘Ÿπ‘” βˆͺ {π‘Ž}, β„› βˆͺ {(π‘Ž, 𝑏), (𝑏, π‘Ž)}⟩
27         if π‘“π‘Ž β‰  𝑓𝑏 ∧ π‘¦π‘Ž β‰  𝑦𝑏 ∧ π‘¦π‘Ž, 𝑦𝑏 β‰  βˆ… then
28             𝐴𝐷𝐺 ← the 𝑋 ∈ {𝐴𝐷𝐺←, 𝐴𝐷𝐺→, 𝐴𝐷𝐺↔} where eval(𝑋) is maximal
29         else if π‘“π‘Ž β‰  𝑓𝑏 then
30             𝐴𝐷𝐺 ← the 𝑋 ∈ {βŸ¨π΄π‘Ÿπ‘” βˆͺ {π‘Ž}, β„›βŸ©, 𝐴𝐷𝐺←, 𝐴𝐷𝐺→, 𝐴𝐷𝐺↔} where eval(𝑋) is maximal
31     return 𝐴𝐷𝐺

The algorithm, called BuildADG, builds an 𝐴𝐷𝐺 incrementally, adding one argument at a time and expanding the attack relation so as to maximize a performance indicator, such as the overall accuracy of the 𝐴𝐷𝐺. BuildADG is shown in Algorithm 2. We consider a dataset π’Ÿ with features {𝑓1, …, 𝑓𝑛, 𝑦} where 𝑦 is the target variable, each feature 𝑓𝑖 takes values from its corresponding set of values 𝐹𝑖, and the target variable is binary, taking the two values 𝑦+ or π‘¦βˆ’. The algorithm has two inputs: the dataset π’Ÿ and a tuning parameter Ξ”. The algorithm starts by identifying the candidate arguments that will be used to build the 𝐴𝐷𝐺. For each feature-value pair βŸ¨π‘“π‘–, π‘£βŸ© three arguments are added: one predicting 𝑦+, one predicting π‘¦βˆ’ and the neutral argument with empty conclusion βŸ¨π‘“π‘–(𝑣), βˆ…βŸ©. Then, at each round, the algorithm adds to the 𝐴𝐷𝐺 the argument π‘Ž of Arg that increases the performance of the resulting 𝐴𝐷𝐺 the most, but only if the addition of π‘Ž increases the performance by at least Ξ” compared to the previous 𝐴𝐷𝐺. Argument π‘Ž is then removed from the list of candidate arguments and the procedure is repeated until all the arguments have been tested or it is no longer possible to increase the performance of the 𝐴𝐷𝐺 by Ξ”.
The function AddArg is responsible for adding a new argument π‘Ž to the 𝐴𝐷𝐺. For each argument 𝑏 already in the 𝐴𝐷𝐺, we need to decide how the new argument π‘Ž interacts with 𝑏, and the resulting 𝐴𝐷𝐺 must be well-formed to avoid logical inconsistencies. If arguments π‘Ž and 𝑏 are mutually exclusive, there is no need to add an attack link between them; the same holds when π‘Ž and 𝑏 predict the same value for the target variable 𝑦. If both π‘Ž and 𝑏 are predictive but they predict different values for 𝑦 and do not have mutually exclusive supports, an attack must be present to preserve conflict-freeness. There are three possibilities: π‘Ž attacks 𝑏, 𝑏 attacks π‘Ž, or they mutually attack each other. One of these three attacks must be present to keep the 𝐴𝐷𝐺 well-formed, and the one generating the best 𝐴𝐷𝐺 is kept. If either π‘Ž or 𝑏 is a non-predictive argument, there is also the fourth possibility of adding no attack between the two arguments (line 30).
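A runnable Python sketch of the greedy loop follows, reusing the (feature, value, conclusion) argument encoding of the earlier sketches. The evaluate parameter stands for the eval function of Algorithm 2 (any scorer mapping an 𝐴𝐷𝐺 to a number, e.g. overall accuracy computed via the predict function above); following the textual description, pairs with the same feature or the same non-empty conclusion are never attacked. Names and encoding are our own illustrative assumptions.

def build_adg(features, classes, evaluate, delta=0.0):
    """Greedy construction of an ADG (a sketch of Algorithm 2).
    features: dict feature -> set of values; classes: the pair (y+, y-);
    evaluate: function (arguments, attacks) -> score to maximize."""
    # candidate pool: two predictive and one neutral argument per pair
    pool = [(f, v, c) for f, values in features.items()
            for v in values for c in list(classes) + [None]]
    best = (set(), set())
    perf = evaluate(*best)
    while pool:
        round_best = None
        for a in pool:
            candidate = add_arg(a, best, evaluate)
            score = evaluate(*candidate)
            if score > perf + delta:
                round_best, perf, picked = candidate, score, a
        if round_best is None:
            return best              # no addition improves by at least delta
        best = round_best
        pool.remove(picked)
    return best

def add_arg(a, adg, evaluate):
    """Insert argument a into adg, choosing attacks that keep it well-formed."""
    args, attacks = set(adg[0]) | {a}, set(adg[1])
    for b in adg[0]:
        if a[0] == b[0] or (a[2] is not None and a[2] == b[2]):
            continue                 # same feature or same prediction: no attack
        options = [attacks | {(a, b)}, attacks | {(b, a)},
                   attacks | {(a, b), (b, a)}]
        if a[2] is not None and b[2] is not None:
            # conflicting predictions on different features: attack mandatory
            attacks = max(options, key=lambda r: evaluate(args, r))
        else:
            # a non-predictive pair may also be left unconnected (line 30)
            attacks = max(options + [attacks], key=lambda r: evaluate(args, r))
    return args, attacks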
6. Evaluation
In this section we provide a first evaluation of the BuildADG algorithm and we compare its results to a C4.5 decision tree. The evaluation compares the total accuracy of the two models using three benchmark datasets; an evaluation considering other performance indicators or the explainability of the models is left for future work. The datasets used for the evaluation are well-known, publicly available datasets [7] for binary classification where all the features are categorical. We used Ξ” = 0 as parameter, meaning that any improvement to the 𝐴𝐷𝐺 is retained. Table 2 shows the results for the three datasets. Performance was computed by randomly dividing each dataset into a training and a testing set using an 80%-20% split ratio. We report the accuracy, the balanced accuracy and the size of both the decision tree and the 𝐴𝐷𝐺, measured by the number of nodes and links of the graph. Table 3 shows the most important features considered by the decision tree and by the 𝐴𝐷𝐺. Information gain is used to rank features in the decision tree, while the proportion of times that an argument is necessary to make a prediction is used to rank features in the 𝐴𝐷𝐺.

Table 2
Results obtained by the BuildADG and the C4.5 algorithms. The size columns report the number of nodes/links.

                                   Decision Tree                      𝐴𝐷𝐺
Dataset    Records  Features  Acc   CI 95%       B.Acc  size    Acc   B.Acc  size
Bank       600      11        0.82  [0.74,0.89]  0.82   28/27   0.75  0.75   6/7
US Census  32561    14        0.84  [0.83,0.85]  0.73   14/13   0.83  0.75   21/78
Car Price  1782     6         0.94  [0.91,0.96]  0.92   20/19   0.88  0.88   8/11

In both the bank and the car price datasets, the accuracy of the decision tree was statistically higher than the 𝐴𝐷𝐺 accuracy. The gap was smaller for the car price dataset, where the 𝐴𝐷𝐺 registered a balanced accuracy of 0.88. For the US census dataset we obtained positive results: the accuracy of the 𝐴𝐷𝐺 was within the 95% confidence interval of the decision tree accuracy and only marginally lower (83.4% versus 84.2%), and the balanced accuracy of the 𝐴𝐷𝐺 was higher than that of the decision tree. Regarding the size of the models learnt, we used the number of nodes and links to measure it (note how the number of links in a tree is 𝑁 βˆ’ 1, where 𝑁 is the number of nodes). The 𝐴𝐷𝐺 graph was smaller for the car price and bank datasets, but more complex for the US census dataset. Table 3 shows the most important variables (ranked by importance) for the decision tree and for the 𝐴𝐷𝐺. There is a good degree of overlap for all three datasets, meaning that the two algorithms substantially agreed on the most important variables.
Discussion. The results obtained are promising but they also expose some of the weaknesses of the preliminary algorithm proposed. Indeed, the algorithm is naive in several aspects. For instance, the addition of a new argument is accepted if the absolute number of correctly classified instances is increased by an interval Ξ”, even if the accuracy (i.e., the percentage of correct predictions) could decrease, showing how the algorithm has a bias in favour of increasing the coverage of the instances rather than maximizing the accuracy. The algorithm also needs a pruning procedure, similar to the ones used in decision trees. The result on the US census dataset shows that an 𝐴𝐷𝐺 can become very complex, harming the understandability of the outputs and also increasing the chance of model overfitting.
A pruning strategy should reduce the complexity of the model while keeping its accuracy high. One idea could be to introduce a regularization parameter, similar to the one present in the cost-based pruning mechanism of a decision tree, that penalizes complex 𝐴𝐷𝐺𝑠, forcing the algorithm to find a trade-off between complexity and performance. Finally, our preliminary algorithm is not computationally efficient, and it can be optimized in multiple ways, including an approximation using Monte Carlo simulation.

Table 3
Most important variables for the decision tree and the 𝐴𝐷𝐺.

Dataset    Decision Tree                             𝐴𝐷𝐺                                   ∩
Bank       age, children, income                     children, income, married             2/3
US Census  relation, marital status, cap gain,       cap gain, degree, ed num, hours,      5/7
           degree, ed num, gender, occupation        relations, age, gender
Car Price  persons, safety, maintenance, lug boot    persons, safety, price, maintenance   3/4

7. Related Works
Recently, there has been an increasing number of studies mixing argumentation theory and machine learning methods. However, very few studies directly address the problem of learning an argumentation graph from data. The large majority of applications are in the field of argumentation mining, where arguments are extracted automatically from text using NLP techniques, either in a data-driven fashion or in a mixed approach where an explicit structure of arguments has to be matched on the text [8], [9], [10]. In both cases, learning an argumentation graph from data is not part of the problem. In [11] the authors learnt the strengths of a probabilistic argumentation framework using Bayesian inference, while in [12] the authors proposed an algorithm to learn a probabilistic argumentation graph given a set of extensions. A different approach is the one where machine learning techniques are adopted to predict the acceptability of arguments under a given semantics. In [13] this problem was tackled by modelling it as a multinomial classification task and by using convolutional neural networks to learn the acceptability of arguments. The work by Craandijk and Bex [14] had a similar aim: the authors proposed to use special neural networks, called argumentation graph neural networks (AGNN), to learn a binary classification model predicting whether an argument is accepted or rejected. The difference from our approach is that in those works an argumentation framework already exists, and the problem is to learn the output of a semantics applied to it; on the contrary, our problem is to learn such an argumentation framework from data. Moreover, our aim is to learn a Dung-like argumentation framework that can generate understandable justifications, while the aim of both [13] and [14] is not the intelligibility of the model, but rather to train a black-box deep neural network to compute a semantics accurately. In [15] the authors proposed to use genetic algorithms to learn a gradual argumentation graph, considered as an instance of a sparse multi-layer neural network. To obtain a well-interpretable model, the authors proposed a fitness function balancing sparseness and accuracy of the classifier. The paper presents experimental results on standard benchmark datasets from the UCI machine learning repository [7], showing an accuracy comparable to decision trees across the three datasets evaluated. In [16] the authors proposed a method to equip autonomous agents with the ability to argue and explain their decisions.
Similar to our approach, arguments and attack relations between arguments are built from a set of training examples, and the generation of arguments is also based on all the feature-value pairs found in the training dataset. Two types of attacks are defined: symmetrical rebuttal attacks and unidirectional undercutting attacks. Differently from our work, these attacks are identified explicitly by the structure of the arguments rather than being decided based on how well they fit the given dataset.

8. Conclusions
In this paper, we presented a novel data-mining model called argumentative decision graphs (𝐴𝐷𝐺). An 𝐴𝐷𝐺 is a special argumentation framework where arguments have a rule-based structure and an attack relation is defined among arguments. 𝐴𝐷𝐺𝑠 are learnt from data in a supervised way and they can be used for classification tasks. We have discussed the main differences and similarities with related models such as decision trees, showing a translation between the two formalisms. Unlike decision trees, the output of an 𝐴𝐷𝐺 can also be an undecided status, occurring when the graph does not have enough reasons to predict a value for the target variable. This is due to the use of argumentation semantics to identify the arguments of an 𝐴𝐷𝐺 that are accepted and consequently make a prediction on the target variable. We evaluated a preliminary greedy algorithm to learn an 𝐴𝐷𝐺 from data using benchmark datasets and we compared our results with the C4.5 decision tree algorithm. Our results showed how the 𝐴𝐷𝐺 had an accuracy lower than or comparable to decision trees, a generally less complex model and a good agreement between the two models on the importance of the variables. The algorithm presented is naive in some of its assumptions, and it can be improved in many aspects, including how arguments interact, the way the impact of an argument is evaluated and how to reduce its computational time. Overall, we believe we have provided enough evidence to justify further research into 𝐴𝐷𝐺𝑠.

References
[1] P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games, Artificial Intelligence 77 (1995) 321–357.
[2] M. W. Caminada, D. M. Gabbay, A logical account of formal argumentation, Studia Logica 93 (2009) 109–145.
[3] J. R. Quinlan, Induction of decision trees, Machine Learning 1 (1986) 81–106.
[4] F. V. Jensen, T. D. Nielsen, Bayesian networks and decision graphs, volume 2, Springer, 2007.
[5] J. L. Pollock, Cognitive Carpentry: A Blueprint for How to Build a Person, MIT Press, 1995.
[6] B. Hssina, A. Merbouha, H. Ezzikouri, M. Erritali, A comparative study of decision tree ID3 and C4.5, International Journal of Advanced Computer Science and Applications 4 (2014) 13–19.
[7] D. Dua, C. Graff, UCI machine learning repository, 2017. URL: http://archive.ics.uci.edu/ml.
[8] O. Cocarascu, A. Stylianou, K. Čyras, F. Toni, Data-empowered argumentation for dialectically explainable predictions, in: ECAI 2020, IOS Press, 2020, pp. 2449–2456.
[9] O. Cocarascu, F. Toni, Detecting deceptive reviews using argumentation, in: Proceedings of the 1st International Workshop on AI for Privacy and Security, 2016, pp. 1–8.
[10] M. Lippi, P. Torroni, Argument mining: A machine learning perspective, in: International Workshop on Theory and Applications of Formal Argumentation, Springer, 2015, pp. 163–176.
[11] K. Noor, A. Hunter, A Bayesian probabilistic argumentation framework for learning from online reviews, in: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), IEEE, 2020, pp. 742–747.
[12] R. Riveret, G. Governatori, On learning attacks in probabilistic abstract argumentation, in: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 2016, pp. 653–661.
[13] I. Kuhlmann, M. Thimm, Using graph convolutional networks for approximate reasoning with abstract argumentation frameworks: A feasibility study, in: International Conference on Scalable Uncertainty Management, Springer, 2019, pp. 24–37.
[14] D. Craandijk, F. Bex, Deep learning for abstract argumentation semantics, arXiv preprint arXiv:2007.07629 (2020).
[15] J. Spieler, N. Potyka, S. Staab, Learning gradual argumentation frameworks using genetic algorithms, arXiv preprint arXiv:2106.13585 (2021).
[16] L. Amgoud, M. Serrurier, Agents that argue and explain classifications, Autonomous Agents and Multi-Agent Systems 16 (2008) 187–209.