Approximate Reasoning with ASPIC+ by Argument Sampling

Matthias THIMM, Tjitze RIENSTRA
University of Koblenz-Landau, Germany

Abstract. We present approximation algorithms for reasoning in structured argumentation approaches such as ASPIC+. While classical approaches consist in constructing all arguments from a knowledge base and determining acceptable arguments from the resulting argument graph, we sample only a small number of arguments and thus only consider a subgraph of the complete argument graph. We develop two approaches for sampling arguments, one which samples arguments independently and one which samples arguments dependently on a query. We present results of an extensive experimental evaluation showing that runtime can be drastically reduced while accuracy is only marginally affected.

Keywords. ASPIC+, approximate reasoning, algorithms

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Structured argumentation approaches consider arguments to be collections of formulas and/or rules which entail some conclusion. The most prominent structured approaches are ASPIC+ [11], ABA [14], DeLP [10], and deductive argumentation [3]. These approaches consider a knowledge base of formulas and/or rules as a starting point. Queries are answered, e. g. in the case of ASPIC+, by determining all arguments constructible from the knowledge base, identifying attacks between these arguments using e. g. contradictions between conclusions of different arguments, and resolving the conflicts by representing the constructed arguments and attacks as an abstract argumentation framework [7] and relying on reasoning methods for this abstract case. Computationally, reasoning with structured argumentation approaches can be quite demanding, as both checking whether a set of formulas and/or rules is an argument can be challenging, and the number of arguments that can be constructed on the basis of a knowledge base may be super-polynomial (and even infinite in some approaches). Some formal analyses on this, in particular regarding the approach of ABA, can be found in [6,8]. Existing solvers for ASPIC+ that implement complete reasoning procedures include TOAST [12] and EPR (http://www.wietskevisser.nl/research/epr/), see also [5].

In this paper, we are interested in approximate methods to reasoning with structured argumentation approaches. This means we are investigating methods that are not necessarily sound and complete in general, but trade correctness for performance. Our approach is motivated by use cases where the number of rules is so high—such as rule mining scenarios as mentioned in [13], which may yield thousands of rules and hundreds of thousands of arguments—that one has to rely on approximate methods. Of course, one of the aims of ASPIC+ (and any computational model of argumentation) is to provide explainable and correct decision support, and we trade this for fast answers. In high-risk domains, correctness is, of course, of higher priority. However, while we give up correctness of reasoning, our approach is still able to provide explanations for reasoning results (i. e. some of the constructed arguments) that can help the decision maker.
In our view, this is better than not being able to provide answers at all due to complexity issues of correct reasoning.

For our investigation, we will focus on the propositional logic instantiation under the grounded semantics of the general framework ASPIC+ [11], but our algorithms are general enough to be easily applicable to other instantiations, semantics, and approaches as well. We develop two parametrised algorithms that solve the general problem of checking whether a certain proposition is acceptable wrt. a given knowledge base. We will describe our algorithms in more detail in Section 4. In order to show the practical applicability of our approaches, we perform an experimental evaluation that compares our approaches with baseline methods. This evaluation focuses on the two aspects of accuracy and runtime performance. Our results show that approximation methods in general and our sampling-based algorithms in particular are viable alternatives for reasoning with complex argumentation scenarios, as runtime can be significantly decreased without losing too much accuracy.

To summarise, the contributions of this paper are as follows:

1. We present two novel approximation algorithms for ASPIC+ that rely on sampling arguments (Section 4).
2. We conduct an extensive experimental evaluation of our approaches that shows their viability (Section 5).

Moreover, Section 2 recalls necessary preliminaries while Section 3 describes baseline approaches for reasoning with ASPIC+. Section 6 concludes with a summary.

2. Preliminaries

We recall necessary preliminaries regarding abstract argumentation [7] and ASPIC+ [11].

Definition 1 An argumentation framework is a pair (A, ↪) where A is a set whose elements are called arguments and ↪ ⊆ A × A is a binary relation called attack. We also write a ↪ b instead of (a, b) ∈ ↪.

We say that a set E ⊆ A is conflict-free iff there are no a, b ∈ E s. t. a ↪ b. Moreover, a set E defends an argument a ∈ A iff for all c ∈ A with c ↪ a there is b ∈ E with b ↪ c. Building on these notions, the basic extension-based semantics [7] can be defined as follows.

Definition 2 Let (A, ↪) be an argumentation framework. A set E ⊆ A is said to be admissible iff E is conflict-free and for all a ∈ E, E defends a; complete iff it is admissible and for all a ∈ A s. t. E defends a, a ∈ E; preferred iff E is complete and maximal wrt. set inclusion; grounded iff E is complete and minimal wrt. set inclusion; stable iff E is conflict-free and for all b ∈ A \ E there is a ∈ E with a ↪ b.

Given an argument a ∈ A, a is said to be (credulously) accepted under semantics σ (either complete, grounded, stable, or preferred) iff a is a member of at least one σ-extension of (A, ↪).
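For illustration, a minimal Java sketch of computing the grounded extension is given below. It uses a simplified representation (arguments as strings, attacks stored as a map from an argument to its attackers) rather than the TweetyProject API, and obtains the grounded extension as the least fixed point of the characteristic function: starting from the empty set, arguments defended by the current set are added until nothing changes.

import java.util.*;

// A sketch, not the TweetyProject API: arguments are plain strings and the
// attack relation is stored as a map from an argument to the set of its attackers.
public class GroundedSketch {

    // Grounded extension as the least fixed point of the characteristic function
    // F(E) = { a | E defends a }: start from the empty set and keep adding
    // defended arguments until nothing changes.
    static Set<String> grounded(Set<String> arguments, Map<String, Set<String>> attackersOf) {
        Set<String> ext = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String a : arguments) {
                if (ext.contains(a)) continue;
                boolean defended = true;
                for (String attacker : attackersOf.getOrDefault(a, Set.of())) {
                    // a is defended iff every attacker is itself attacked by some member of ext
                    boolean counterAttacked = false;
                    for (String defender : ext)
                        if (attackersOf.getOrDefault(attacker, Set.of()).contains(defender))
                            counterAttacked = true;
                    if (!counterAttacked) { defended = false; break; }
                }
                if (defended) { ext.add(a); changed = true; }
            }
        }
        return ext;
    }

    public static void main(String[] args) {
        // c attacks b and b attacks a; the grounded extension is {a, c}
        Set<String> arguments = Set.of("a", "b", "c");
        Map<String, Set<String>> attackersOf = Map.of("a", Set.of("b"), "b", Set.of("c"));
        System.out.println(grounded(arguments, attackersOf)); // prints a and c (in some order)
    }
}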
Structured argumentation approaches can be used to give structure to arguments. In the following, we present a minimal variant of the propositional instantiation of ASPIC+ [11]. Note that ASPIC+ is a general framework that can be instantiated using a variety of different logics and is also able to accommodate orderings between rules, but we stick to a very simple version in order to show that our algorithms are independent wrt. specific language features (note also that we depart from ASPIC+ terminology at times). Let L be a finite set of propositions and let L̂ be the set of literals of L, i. e., L̂ = {a, ¬a | a ∈ L}. For a literal φ ∈ L̂ we denote by φ̄ its complement, i. e., ā = ¬a and the complement of ¬a is a, for every a ∈ L.

ASPIC+ differentiates rules into strict rules (rules that are always supposed to hold) and defeasible rules (rules that "usually" hold).

Definition 3 A knowledge base K is a pair K = (Ks, Kd) where
• Ks is a set of strict rules φ1, . . . , φn → φ with φ1, . . . , φn, φ ∈ L̂.
• Kd is a set of defeasible rules φ1, . . . , φn ⇒ φ with φ1, . . . , φn, φ ∈ L̂.

A strict rule φ1, . . . , φn → φ with n = 0 is written as → φ and is also called an axiom. A defeasible rule φ1, . . . , φn ⇒ φ with n = 0 is written as ⇒ φ and is also called an assumption. For practical reasons we often identify K = (Ks, Kd) with Ks ∪ Kd, e. g., expressions such as "r ∈ K" are to be read as "r ∈ Ks or r ∈ Kd". Arguments can now be constructed by chaining rules. Following [11], for each argument A we denote by Prem(A) the set of axioms and assumptions used to construct A, by Conc(A) the conclusion of A, by Sub(A) the set of sub-arguments of A, by DefRules(A) the set of defeasible rules in A, and by TopRule(A) the last rule used in A.

Definition 4 The set of arguments AK generated by a knowledge base K = (Ks, Kd) is inductively defined as follows:
• If ⇒ φ ∈ K then (⇒ φ) is an argument with Prem(⇒ φ) = {⇒ φ}, Conc(⇒ φ) = φ, Sub(⇒ φ) = {⇒ φ}, DefRules(⇒ φ) = {⇒ φ}, TopRule(⇒ φ) = (⇒ φ).
• If → φ ∈ K then (→ φ) is an argument with Prem(→ φ) = {→ φ}, Conc(→ φ) = φ, Sub(→ φ) = {→ φ}, DefRules(→ φ) = ∅, TopRule(→ φ) = (→ φ).
• If φ1, . . . , φn ⇒ ψ ∈ K and A1, . . . , An are arguments such that φ1 = Conc(A1), . . . , φn = Conc(An), then A = (A1, . . . , An ⇒ ψ) is an argument such that: Prem(A) = Prem(A1) ∪ . . . ∪ Prem(An), Conc(A) = ψ, Sub(A) = Sub(A1) ∪ . . . ∪ Sub(An) ∪ {A}, DefRules(A) = DefRules(A1) ∪ . . . ∪ DefRules(An) ∪ {φ1, . . . , φn ⇒ ψ}, TopRule(A) = φ1, . . . , φn ⇒ ψ.
• If φ1, . . . , φn → ψ ∈ K and A1, . . . , An are arguments such that φ1 = Conc(A1), . . . , φn = Conc(An), then A = (A1, . . . , An → ψ) is an argument such that: Prem(A) = Prem(A1) ∪ . . . ∪ Prem(An), Conc(A) = ψ, Sub(A) = Sub(A1) ∪ . . . ∪ Sub(An) ∪ {A}, DefRules(A) = DefRules(A1) ∪ . . . ∪ DefRules(An), TopRule(A) = φ1, . . . , φn → ψ.

An argument A is called strict if DefRules(A) = ∅, otherwise it is called defeasible. In our simplified framework, we only consider rebuts [11] as the attack relation between arguments.

Definition 5 Let A and B be two arguments. We say that A attacks B, denoted as A ↪ B, if Conc(A) = ā for some B′ ∈ Sub(B) of the form B′′1, . . . , B′′n ⇒ a.

Using the previous two definitions, an abstract argumentation framework can be derived from a knowledge base K as follows.

Definition 6 The abstract argumentation framework AFK corresponding to a knowledge base K is an argumentation framework AFK = (AK, ↪) where AK is the set of arguments generated by K as defined by Definition 4 and ↪ is the attack relation on AK as defined by Definition 5.

We conclude this section with a small example illustrating our simplified ASPIC+ framework.

Example 1 Let K = (Ks, Kd) be a knowledge base with
Ks = {→ a, → c, → ¬d, d → ¬b}
Kd = {a ⇒ b, c ⇒ d}
The corresponding abstract argumentation framework is AFK = (AK, ↪) with AK = {A1, . . . , A6},
A1 = (→ a)    A2 = (→ c)    A3 = (→ ¬d)
A4 = (A1 ⇒ b)    A5 = (A2 ⇒ d)    A6 = (A5 → ¬b)
and A3 ↪ A5, A3 ↪ A6 and A6 ↪ A4. AFK is depicted in Figure 1.

[Figure 1. The argumentation framework AFK from Example 1.]

Note that {A1, A2, A3, A4} is the single grounded, complete, stable, and preferred extension of AFK. It follows that A1, A2, A3, A4 are accepted under these semantics.
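To make Definitions 5 and 6 concrete, the following Java sketch encodes the rebut check and the construction of the attack relation. The data model (Rule and Argument records, literals as strings with a ¬ prefix) is a simplified stand-in of our own and not the TweetyProject classes.

import java.util.*;

// A simplified sketch of Definitions 5 and 6 (not the TweetyProject classes):
// literals are strings, and "¬a" is the complement of "a".
public class RebutSketch {

    record Rule(List<String> body, String head, boolean defeasible) {}

    // An argument applies a top rule to one sub-argument per body literal.
    record Argument(Rule topRule, List<Argument> subArgs) {
        String conc() { return topRule.head(); }
        // Sub(A): the argument itself plus all of its sub-arguments, recursively.
        Set<Argument> sub() {
            Set<Argument> result = new HashSet<>();
            result.add(this);
            for (Argument s : subArgs) result.addAll(s.sub());
            return result;
        }
    }

    static String complement(String literal) {
        return literal.startsWith("¬") ? literal.substring(1) : "¬" + literal;
    }

    // Definition 5 (rebut): A attacks B iff Conc(A) is the complement of the
    // conclusion of some sub-argument of B whose top rule is defeasible.
    static boolean attacks(Argument a, Argument b) {
        for (Argument bSub : b.sub())
            if (bSub.topRule().defeasible() && a.conc().equals(complement(bSub.conc())))
                return true;
        return false;
    }

    // Definition 6: the attack relation of AF_K over a given set of arguments.
    static List<Argument[]> attRel(Collection<Argument> args) {
        List<Argument[]> rel = new ArrayList<>();
        for (Argument a : args)
            for (Argument b : args)
                if (attacks(a, b)) rel.add(new Argument[] { a, b });
        return rel;
    }

    public static void main(String[] argv) {
        // Example 1: A3 = (→ ¬d) rebuts A5 = (A2 ⇒ d), but not vice versa.
        Argument a2 = new Argument(new Rule(List.of(), "c", false), List.of());
        Argument a3 = new Argument(new Rule(List.of(), "¬d", false), List.of());
        Argument a5 = new Argument(new Rule(List.of("c"), "d", true), List.of(a2));
        System.out.println(attacks(a3, a5) + " " + attacks(a5, a3)); // true false
        System.out.println(attRel(List.of(a2, a3, a5)).size());      // 1
    }
}

Generating the full set AK of Definition 4 then amounts to exhaustively chaining rules, which is exactly the step that the algorithms discussed in the next sections try to avoid doing in full.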
3. Reasoning in ASPIC+

As already hinted at by Definition 6, reasoning in ASPIC+ is reduced to reasoning in abstract argumentation frameworks. More precisely, given an abstract argumentation semantics σ (either complete, grounded, stable, or preferred) as a meta-parameter, we say that a literal l ∈ L̂ is acceptable wrt. a knowledge base K if there is an argument A of K with Conc(A) = l and A is acceptable under σ in AFK (note that other variants of acceptability can be defined). In Example 1, the literals a, c, ¬d, b are therefore acceptable for all considered semantics σ.

As the most obvious baseline for the evaluation of our approximate inference algorithms, we consider a direct implementation of Definition 6 that constructs all possible arguments of a knowledge base. More concretely, the Naive algorithm first constructs all arguments from axioms and assumptions. Then it iteratively continues by considering each (strict and defeasible) rule and each combination of already constructed arguments for each literal in the body of that rule. It terminates once no further argument can be constructed. However, it is clear that not all arguments from a knowledge base have to be constructed in order to determine the acceptability of a certain literal. Works such as [9,2,4] have shown that many arguments need not be considered because they are equivalent to other arguments or they are simply irrelevant for a certain query. Therefore, as a second baseline we consider a very simple notion of relevance to prune the set of constructed arguments. More concretely, the module-based algorithm (MB) restricts its attention to the subset of rules that is closed under the syntactic neighbourhood relationship with the query literal, where a literal or rule is said to be a syntactic neighbour of another rule iff the two share at least one vocabulary element. This approach reduces the set of generated arguments to those that are relevant to the query literal. It assumes that sets of arguments with distinct vocabulary elements (such that no attack is present between these sets) are irrelevant to each other as far as their acceptability is concerned. This property, called crash-resistance, is valid under all but the stable semantics [15].
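The module computation of MB can be read as a simple fixed-point computation over rule vocabularies. The following Java fragment is our own sketch of that reading (with hypothetical names, not the actual TweetyProject implementation): it keeps every rule that shares a proposition with the query or with an already kept rule.

import java.util.*;

// A sketch of the module computation used by MB (hypothetical names, not the
// actual implementation): a rule is kept if it shares at least one proposition
// with the query or with a rule that is already kept, closed transitively.
public class ModuleSketch {

    // The vocabulary of a rule: the propositions of its body and head literals,
    // with negation stripped, e.g. {a, b} for a ⇒ ¬b.
    record Rule(Set<String> vocabulary) {}

    static Set<Rule> module(Collection<Rule> rules, String queryProposition) {
        Set<String> reachable = new HashSet<>(Set.of(queryProposition));
        Set<Rule> kept = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule r : rules) {
                if (kept.contains(r)) continue;
                // syntactic neighbour: shares a proposition with the collected vocabulary
                if (!Collections.disjoint(r.vocabulary(), reachable)) {
                    kept.add(r);
                    reachable.addAll(r.vocabulary());
                    changed = true;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example 1: querying b keeps all six rules, since a, b, c, d are connected;
        // an additional rule over the disjoint vocabulary {e, f} would be pruned.
        List<Rule> rules = List.of(
            new Rule(Set.of("a")), new Rule(Set.of("c")), new Rule(Set.of("d")),
            new Rule(Set.of("d", "b")), new Rule(Set.of("a", "b")),
            new Rule(Set.of("c", "d")), new Rule(Set.of("e", "f")));
        System.out.println(module(rules, "b").size()); // 6 of the 7 rules remain
    }
}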
4. Approximate reasoning via sampling

In the following, we present two algorithms that sample arguments in order to obtain a sub-graph of the full corresponding abstract argumentation framework of the knowledge base. We therefore only modify the part of constructing the argumentation framework, but leave the actual reasoning on the abstract framework untouched. For both algorithms we assume the existence of a function attRel(args) that, given a set of arguments args, constructs the attack relation according to Definition 5.

4.1. Independent Sampling

Our first approach consists in simply sampling some number of arguments independently from all constructible arguments. As the total number of constructible arguments is unknown beforehand (and also hard to determine), we cannot tell the algorithm to sample a certain percentage of the constructible arguments. Thus, the algorithm is parametrised by two values maxArgs and maxDuplicates. The first value maxArgs restricts the number of distinct arguments to be sampled. As the same argument may be sampled twice (and certainly will be if there are actually fewer constructible arguments than maxArgs), the parameter maxDuplicates sets the upper bound on how many duplicates can be encountered before the algorithm is terminated, even if maxArgs many arguments have not been sampled.

Algorithm 1 The randI algorithm
 1: function genRandI(K, maxArgs, maxDuplicates)
 2:   args ← ∅; duplicates ← 0
 3:   for 0 ≤ i < maxArgs do
 4:     arg ← sampleArgument(K)
 5:     if arg ∉ args then args ← args ∪ {arg} else duplicates ← duplicates + 1
 6:     if duplicates ≥ maxDuplicates then break
 7:   end for
 8:   return (args, attRel(args))
 9: end function
10:
11: function sampleArgument(K)
12:   arg ← fail
13:   while arg = fail do
14:     ψ ← getRandomLiteral(K)
15:     arg ← constructArgument(K, ψ, {ψ})
16:   end while
17:   return arg
18: end function
19:
20: function constructArgument(K, ψ, acc)
21:   candidateRules ← {r ∈ K | Conc(r) = ψ ∧ Premises(r) ∩ acc = ∅}
22:   if candidateRules = ∅ then return fail
23:   r ← randomElementOf(candidateRules)
24:   subArgSet ← ∅
25:   for φ ∈ Premises(r) do
26:     subArg ← constructArgument(K, φ, acc ∪ {φ})
27:     if subArg = fail then return fail
28:     subArgSet ← subArgSet ∪ {subArg}
29:   end for
30:   return newArgument(r, subArgSet)
31: end function

The algorithm randI is shown in Algorithm 1 (we only show the part responsible for constructing the abstract argumentation framework). The function genRandI takes as input a knowledge base K and the two parameters maxArgs and maxDuplicates. The sampling of a single argument is implemented by the sampleArgument function. It first picks a random literal and then attempts to construct an argument for that literal. The construction, implemented by the constructArgument function, first picks a random rule with the required conclusion, but only if that rule does not contain a premise that already occurs in the argument that is being constructed (this is kept track of by the set acc). It then recursively constructs sub-arguments for that rule until the argument is complete. If the construction of an argument fails (this happens if there is no rule for some literal), then constructArgument returns "fail" and the sampling is restarted.

In general, if genRandI terminates, it generates an argumentation framework restricted to a random subset of arguments of size at least 1 (in the rare event that sampleArgument returns the same argument every time) and at most maxArgs (in the event that the number of duplicate arguments never exceeds maxDuplicates). However, note that this algorithm may (theoretically) never terminate if at line 14 a literal is repeatedly chosen for which no argument exists. This behaviour was not observed in our experiments, however (see Section 5).
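A compact Java rendering of sampleArgument and constructArgument from Algorithm 1 is sketched below; it uses the same simplified data model as the earlier sketches (not the TweetyProject implementation) and returns null where the pseudocode returns "fail".

import java.util.*;

// A sketch of sampleArgument/constructArgument from Algorithm 1 (our own data
// model and helper names, not the TweetyProject implementation). null plays the
// role of "fail" in the pseudocode.
public class IndependentSamplingSketch {

    record Rule(List<String> body, String head, boolean defeasible) {}
    record Argument(Rule topRule, List<Argument> subArgs) {}

    static final Random RNG = new Random();

    static Argument sampleArgument(List<Rule> kb, List<String> literals) {
        Argument arg = null;
        while (arg == null) {                       // restart until a construction succeeds;
            String psi = literals.get(RNG.nextInt(literals.size()));  // may loop if a literal
            arg = constructArgument(kb, psi, new HashSet<>(Set.of(psi))); // has no rules at all
        }
        return arg;
    }

    static Argument constructArgument(List<Rule> kb, String psi, Set<String> acc) {
        // candidate rules: conclusion psi, and no body literal already on the current
        // path (the set acc), which rules out cyclic constructions
        List<Rule> candidates = new ArrayList<>();
        for (Rule r : kb)
            if (r.head().equals(psi) && Collections.disjoint(r.body(), acc))
                candidates.add(r);
        if (candidates.isEmpty()) return null;      // "fail"
        Rule r = candidates.get(RNG.nextInt(candidates.size()));
        List<Argument> subArgs = new ArrayList<>();
        for (String phi : r.body()) {
            Set<String> acc2 = new HashSet<>(acc);
            acc2.add(phi);
            Argument sub = constructArgument(kb, phi, acc2);
            if (sub == null) return null;           // propagate failure upwards
            subArgs.add(sub);
        }
        return new Argument(r, subArgs);
    }

    public static void main(String[] args) {
        // the knowledge base of Example 1
        List<Rule> kb = List.of(
            new Rule(List.of(), "a", false), new Rule(List.of(), "c", false),
            new Rule(List.of(), "¬d", false), new Rule(List.of("d"), "¬b", false),
            new Rule(List.of("a"), "b", true), new Rule(List.of("c"), "d", true));
        List<String> literals = List.of("a", "¬a", "b", "¬b", "c", "¬c", "d", "¬d");
        System.out.println(sampleArgument(kb, literals));
    }
}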
4.2. Directional Sampling

Algorithm 2 The randD algorithm
 1: function genRandD(K, φ, p)
 2:   args ← argsWithConc(K, φ)
 3:   argsDone ← ∅
 4:   concsDone ← {φ}
 5:   repeat ← true
 6:   while repeat = true do
 7:     newArgs ← ∅
 8:     for A ∈ args \ argsDone do
 9:       argsDone ← argsDone ∪ {A}
10:       if flip(p) then
11:         for ψ ∈ attackingConcs(A) \ concsDone do
12:           concsDone ← concsDone ∪ {ψ}
13:           newArgs ← newArgs ∪ argsWithConc(K, ψ)
14:         end for
15:       end if
16:     end for
17:     if newArgs = ∅ then repeat ← false else args ← args ∪ newArgs
18:   end while
19:   return (args, attRel(args))
20: end function

The directional sampling (randD) algorithm is shown in Algorithm 2. The following functions are assumed to be defined: argsWithConc(K, φ) returns all arguments A constructible from K with conclusion φ; attackingConcs(A) returns all formulas φ such that, if B is an argument with Conc(B) = φ, then B attacks A; and flip(p) returns true (resp. false) with probability p (resp. 1 − p). The algorithm takes as input a knowledge base K, a formula φ, and a probability p and is based on initially constructing all arguments for φ, then the attackers of this set, the attackers of those attackers, and so on. However, for each argument A that is constructed, we first call flip(p), and we generate the attackers of A only if the result is true (line 10). To avoid generating an argument more than once, we use the set argsDone to keep track of arguments whose attackers have been generated, and concsDone for conclusions for which arguments have been generated.

The result of running randD for a query formula φ is an argumentation framework which contains all arguments for φ as well as a random selection of the (indirect) attackers of those arguments. However, an argument A is included only if all arguments on the directed path from A to some argument for φ are also included. This restricts the set of generated arguments to those that are relevant to the query formula, under the assumption that the semantics in use satisfies the directionality property [15]. Directionality is satisfied under all semantics we consider except stable semantics. In general, genRandD generates an argumentation framework consisting only of the arguments for the given conclusion, plus a subset of its attackers, their attackers, etc.
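The core loop of genRandD can be sketched in Java as follows; argsWithConc and attackingConcs are passed in as functions, mirroring the assumption above that they are given, and the argument type is left generic. Names and data layout are ours, not the TweetyProject implementation.

import java.util.*;
import java.util.function.Function;

// A sketch of genRandD from Algorithm 2 (our own names, not the TweetyProject
// implementation). As in the paper, argsWithConc and attackingConcs are assumed
// to be given; here they are supplied as function parameters.
public class DirectionalSamplingSketch {

    static final Random RNG = new Random();

    static boolean flip(double p) { return RNG.nextDouble() < p; }

    static <A> Set<A> genRandD(String query, double p,
                               Function<String, Set<A>> argsWithConc,
                               Function<A, Set<String>> attackingConcs) {
        Set<A> args = new HashSet<>(argsWithConc.apply(query));
        Set<A> argsDone = new HashSet<>();
        Set<String> concsDone = new HashSet<>(Set.of(query));
        boolean repeat = true;
        while (repeat) {
            Set<A> newArgs = new HashSet<>();
            for (A a : args) {
                if (argsDone.contains(a)) continue;
                argsDone.add(a);
                if (flip(p)) {                         // expand a's attackers with probability p
                    for (String psi : attackingConcs.apply(a)) {
                        if (concsDone.contains(psi)) continue;
                        concsDone.add(psi);
                        newArgs.addAll(argsWithConc.apply(psi));
                    }
                }
            }
            if (newArgs.isEmpty()) repeat = false; else args.addAll(newArgs);
        }
        return args;   // the attack relation attRel(args) is then computed as in Definition 5
    }

    public static void main(String[] argv) {
        // Example 1 with arguments named A1..A6; the conclusion index and the attacking
        // conclusions are tabulated by hand for the sake of the example.
        Map<String, Set<String>> byConc = Map.of(
            "a", Set.of("A1"), "c", Set.of("A2"), "¬d", Set.of("A3"),
            "b", Set.of("A4"), "d", Set.of("A5"), "¬b", Set.of("A6"));
        Map<String, Set<String>> attConcs = Map.of(
            "A1", Set.of(), "A2", Set.of(), "A3", Set.of(),
            "A4", Set.of("¬b"), "A5", Set.of("¬d"), "A6", Set.of("¬d"));
        Set<String> sampled = genRandD("b", 0.9,
            phi -> byConc.getOrDefault(phi, Set.of()),
            a -> attConcs.get(a));
        System.out.println(sampled); // always contains A4; A6 and A3 with high probability
    }
}

With p = 1 the sketch degenerates to exhaustively following all attackers, which corresponds to the complete variant DIR used in the evaluation below.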
5. Experimental Evaluation

In order to test the feasibility of our algorithms we performed an experimental evaluation. The goal of this evaluation is to show that our approaches significantly reduce the runtime of evaluating queries while still being accurate in their answers. All algorithms presented in this paper have been implemented in Java within the TweetyProject (http://tweetyproject.org), including the complete algorithms described in Section 3. The source code for all implementations can be found at http://tweetyproject.org/r/?r=aspic_reasoner.

5.1. Benchmark data

Due to the lack of existing real-world ASPIC+ knowledge bases, we rely on randomly generated knowledge bases and knowledge bases extracted from sources with originally different purposes. More precisely, in our evaluation we used the following three sets of benchmarks:

Random knowledge bases (Random). We implemented a simple algorithm (available at http://tweetyproject.org/r/?r=aspic_random) that randomly generates (propositional logic) ASPIC+ knowledge bases as follows. The algorithm takes as input the number of propositions (n), the number of formulas (m), the maximum number of literals in bodies of rules (l), and the percentage of strict rules (s), and generates m rules, each with at most l body literals (uniformly distributed; zero body literals are also possible, giving rise to axioms and assumptions) and a uniformly distributed head literal. For each knowledge base generated in that fashion, we selected a single literal at random to be the fixed query. Using this generator, we created a set of 5400 random instances with n ∈ {5, 10, 15}, m ∈ {10, 20, 30}, l ∈ {2, 3}, s ∈ {0.2, 0.4, 0.6}.

Knowledge bases learnt from data (Animals). The dataset "Animals with attributes" (http://attributes.kyb.tuebingen.mpg.de) describes 50 animals, e. g. ox, mouse, dolphin, using 85 binary attributes such as "swims", "black", and "arctic". Following the approach outlined in [13], we use the Apriori algorithm [1] to mine association rules from this dataset for a given minimal confidence value c and minimal support value s. Association rules with confidence value 1 are interpreted as strict rules; all other rules are interpreted as defeasible rules. Finally, we selected one animal at random and added all but one of its attributes as axioms, leaving the remaining attribute as the fixed query for the generated knowledge base. We set c ∈ {0.6, 0.65, 0.70, 0.75, 0.8, 0.85, 0.90, 0.95}, s ∈ {0.6, 0.65, 0.70, 0.75, 0.8, 0.85, 0.90, 0.95}, and allowed at most 4 literals per rule. The final dataset contained 1920 instances.

Knowledge bases extracted from MaxSAT instances (MaxSAT). For this dataset, we used the set of incomplete weighted SAT instances from the MaxSAT Evaluation 2017 (http://mse17.cs.helsinki.fi/benchmarks.html), which are propositional clauses divided into a set of soft clauses and a set of hard clauses. First, we removed the 7 largest instances from this set, as they brought about severe memory issues for all our solvers. For each remaining instance, hard clauses with only one literal are interpreted as axioms. For larger hard clauses, one literal is uniformly selected to be the head of a strict rule while the complements of the remaining literals are put into the body of the rule (see the sketch below). Soft clauses are similarly transformed to assumptions and defeasible rules. Note that this transformation does not preserve the semantics of the original instances, but still yields knowledge bases with a structure similar in spirit to the original instances. For each original instance, we generated 10 different variants using this scheme and for each of these variants, we selected one literal at random to be the fixed query. The final dataset contained 1490 instances.

All datasets used in our experiments, i. e., knowledge bases with associated queries, are available online at http://mthimm.de/misc/ds_aspic2020.zip.
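As referenced in the MaxSAT item above, the clause-to-rule transformation can be sketched as follows. This is a simplified illustration with hypothetical helper names (clause weights are ignored), not the generator we actually used.

import java.util.*;

// A sketch of the clause-to-rule transformation described above (hypothetical
// names): from a clause l1 ∨ ... ∨ ln, one literal li is picked uniformly as the
// head and the complements of the remaining literals form the body.
public class ClauseTransformationSketch {

    record Rule(List<String> body, String head, boolean defeasible) {}

    static final Random RNG = new Random();

    static String complement(String literal) {
        return literal.startsWith("¬") ? literal.substring(1) : "¬" + literal;
    }

    // hard = true yields a strict rule (or an axiom for unit clauses),
    // hard = false yields a defeasible rule (or an assumption for unit clauses).
    static Rule toRule(List<String> clause, boolean hard) {
        int i = RNG.nextInt(clause.size());
        String head = clause.get(i);
        List<String> body = new ArrayList<>();
        for (int j = 0; j < clause.size(); j++)
            if (j != i) body.add(complement(clause.get(j)));
        return new Rule(body, head, !hard);
    }

    public static void main(String[] args) {
        // a hard clause ¬x ∨ y ∨ z may become, e.g., the strict rule x, ¬y → z
        System.out.println(toRule(List.of("¬x", "y", "z"), true));
        // a unit hard clause becomes an axiom
        System.out.println(toRule(List.of("y"), true));
    }
}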
5.2. Experiment details

Each instance of the three datasets was used to query the following five solvers:

Naive The naive complete algorithm described in Section 3.
MB The module-based complete algorithm described in Section 3.
DIR The complete version of the algorithm randD where the sampling probability is set to 1, see Section 4.
ISAMP The implementation of algorithm randI; as this algorithm depends on two parameters (the maximum number of sampled arguments and the maximum number of duplicates), we performed a small pre-test in order to find a suitable parameter combination; the results presented in the next section refer to setting the first parameter to 500 and the second one to 50.
DSAMP The implementation of algorithm randD; as this algorithm depends on one parameter (the sampling probability), we performed a small pre-test in order to find a suitable value; the results presented in the next section refer to setting this parameter to 0.9.

For determining acceptable arguments of the individual corresponding argumentation frameworks we used grounded semantics for all solvers (for reasons of simplicity of computation). For each instance/solver combination we set a timeout of 5 minutes and recorded the runtime.

In order to measure the accuracy of the approximate approaches ISAMP and DSAMP, we considered the subset of instances that could be solved by at least one of the complete approaches Naive, MB, and DIR. For each of those instances we checked whether the approximate approaches returned the same answer, and we took the ratio of the number of correct answers to the size of this subset as a measure of accuracy.

We did not include TOAST [12], an already existing (complete) solver for ASPIC+, in our evaluation. As TOAST is only available through a web form and a web service (and no source code is available), runtime performance cannot be compared in a fair manner. Furthermore, we also did not include the complete solver EPR (http://www.wietskevisser.nl/research/epr/) as its approach to constructing arguments is equivalent to DIR.

All experiments were performed on a single virtual machine running Ubuntu with eight Intel Xeon 2GHz CPUs and 16GB RAM.

5.3. Results

Table 1 summarises the results of our experiments. For each of the three datasets Random, Animals, and MaxSAT and each solver mentioned above, we report the percentage of solved instances within the time limit of 5 minutes (column "S"), the average runtime of solved instances in milliseconds (column "RT"), and the accuracy as the percentage of correctly solved instances wrt. all solved instances (column "Acc"). In each column we emphasised the best solver with bold face.

        |        Random           |        Animals           |        MaxSAT
        |   S      RT      Acc    |   S       RT       Acc   |   S      RT      Acc
Naive   |  99.9%   617.5   100.0% |  98.1%   3355.1   100.0% |  99.9%   505.0   100.0%
MB      |  99.8%   571.5   100.0% |  98.9%   1881.1   100.0% |  99.6%   316.8   100.0%
DIR     |  99.9%   476.4   100.0% |  99.3%   2514.4   100.0% |  99.6%   137.4   100.0%
ISAMP   | 100.0%   336.8    99.1% |  99.5%    455.9    99.0% |  99.9%   311.1    99.9%
DSAMP   |  99.9%   482.6    99.1% |  99.4%   1625.1    99.4% |  99.9%   122.6   100.0%

Table 1. Results of the experimental evaluation on the three datasets Random (5400 instances), Animals (1920 instances), and MaxSAT (1490 instances); column "S" indicates the percentage of solved instances within the time limit of 5 minutes, column "RT" gives the average runtime in milliseconds, and "Acc" gives the percentage of correctly solved instances; the best results per column are highlighted in bold face.

As a first remark, note that the complete solvers Naive, MB, and DIR have, of course, perfect accuracy. Second, all solvers are quite similar in terms of the percentage of solved instances within the time limit (98%–100%). So let us look a bit closer at the important measures of runtime and accuracy of our sampling-based approaches. In terms of runtime, our new solvers significantly outperform the baseline approaches in all cases. For the Random dataset, the solver ISAMP has an average runtime of 336.8 milliseconds, compared to 476.4 milliseconds for the best-performing complete solver (DIR). The difference is most significant for the Animals dataset, where ISAMP has an average runtime of 455.9 milliseconds compared to 1881.1 milliseconds (MB), therefore outperforming the complete approaches by a factor of over 3.
In the MaxSAT dataset, the solver DSAMP outperforms the best complete solver (DIR), though only slightly. However, recall that we used grounded semantics to determine acceptable arguments in the corresponding abstract argumentation frameworks, which is a problem of polynomial complexity in the size of the argumentation framework. If we had used a more complex semantics, differences in runtime would have been even more significant. Evaluating this aspect is envisioned for future work.

Interestingly, the gain in performance comes at almost no cost wrt. accuracy. In all datasets, all approximate approaches show an accuracy of at least 99%. This shows that our approaches are indeed competitive for the task at hand. In order to explain the difference between the performances of the solvers ISAMP and DSAMP on the datasets Random and Animals on the one hand (where ISAMP performs better than DSAMP) and MaxSAT on the other (where DSAMP outperforms ISAMP and the complete and approximate approaches differ only slightly), we took a closer look at the structure of the knowledge bases in the different datasets. For each instance we computed the distribution of strict rules, defeasible rules, axioms, and assumptions and determined the average of those values for each dataset. Table 2 reports those numbers.

                                Random    Animals   MaxSAT
Avg. percentage strict rules     24.4%     14.9%     88.1%
Avg. percentage def. rules       36.9%     52.9%     10.2%
Avg. percentage axioms           15.8%     29.1%      0.2%
Avg. percentage assumptions      22.9%      3.1%      1.5%

Table 2. Statistics of the three datasets Random, Animals, and MaxSAT; cells indicate the average percentage of strict rules, defeasible rules, axioms, and assumptions within each dataset.

Comparing these statistics, we can see that instances in the MaxSAT dataset have a significantly larger proportion of strict rules (and therefore fewer defeasible rules). Consequently, many arguments built from this dataset will have few or no attackers, as arguments can only be attacked on defeasible rules in our simplified framework, cf. Definition 5. By sampling dependently on the existence of counterarguments, DSAMP likely includes most arguments relevant for making correct decisions (as also indicated by the accuracy of 100% on this dataset). This also explains the similarity of the performances of DSAMP and DIR, the latter being a version of DSAMP with sampling probability 1. On the other hand, ISAMP does not care about the connectedness of arguments and samples arguments independently. This means that many arguments irrelevant to the query are constructed, thus resulting in a larger (relative) runtime.

To summarise the results, by sampling arguments we obtain significant improvements in average runtime with only minimal loss of accuracy. Moreover, the performance gain is larger when an instance contains many defeasible rules and fewer strict rules.

6. Summary and Conclusion

In this paper we presented the first approximation algorithms for structured argumentation approaches such as ASPIC+. Instead of constructing all possible arguments from a knowledge base, our algorithms only sample a limited number of arguments. We developed two variants of this general idea. The first variant (algorithm randI with implementation ISAMP) samples arguments independently from the knowledge base, while the second variant (algorithm randD with implementation DSAMP) samples arguments dependently on arguments constructed so far. Our experimental evaluation showed that approximation by sampling significantly improves runtime while accuracy is only marginally affected.
While the present paper used ASPIC+ as the underlying argumentation formalism, it should be noted that our algorithms are general enough to be applied to other structured argumentation approaches such as ABA, DeLP, or deductive argumentation. Investigating the application to these formalisms is part of ongoing work.

Acknowledgements

The research reported here was partially supported by the Deutsche Forschungsgemeinschaft (grant KE 1686/3-1).

References

[1] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In VLDB'94, Proceedings of 20th International Conference on Very Large Data Bases, September 12-15, 1994, Santiago de Chile, Chile, pages 487–499, 1994.
[2] Leila Amgoud, Philippe Besnard, and Srdjan Vesic. Equivalence in logic-based argumentation. Journal of Applied Non-Classical Logics, 24(3):181–208, 2014.
[3] Philippe Besnard and Anthony Hunter. Constructing argument graphs with deductive arguments: a tutorial. Argument & Computation, 5(1):5–30, 2014.
[4] AnneMarie Borg and Christian Straßer. Relevance in structured argumentation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), 2018.
[5] Federico Cerutti, Sarah A. Gaggl, Matthias Thimm, and Johannes P. Wallner. Foundations of implementations for formal argumentation. In Pietro Baroni, Dov Gabbay, Massimiliano Giacomin, and Leendert van der Torre, editors, Handbook of Formal Argumentation, chapter 14. College Publications, 2018.
[6] Yannis Dimopoulos, Bernhard Nebel, and Francesca Toni. On the computational complexity of assumption-based argumentation for default reasoning. Artificial Intelligence, 141(1):57–78, 2002.
[7] Phan Minh Dung. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321–358, 1995.
[8] Wolfgang Dvořák and Paul E. Dunne. Computational problems in formal argumentation and their complexity. In Pietro Baroni, Dov Gabbay, Massimiliano Giacomin, and Leendert van der Torre, editors, Handbook of Formal Argumentation, chapter 14. College Publications, 2018.
[9] Vasiliki Efstathiou and Anthony Hunter. Algorithms for generating arguments and counterarguments in propositional logic. International Journal of Approximate Reasoning, 52:672–704, 2011.
[10] Alejandro Javier García and Guillermo Ricardo Simari. Defeasible logic programming: DeLP-servers, contextual queries, and explanations for answers. Argument & Computation, 5(1):63–88, 2014.
[11] Sanjay Modgil and Henry Prakken. The ASPIC+ framework for structured argumentation: a tutorial. Argument & Computation, 5:31–62, 2014.
[12] Mark Snaith and Chris Reed. TOAST: online ASPIC+ implementation. In Proceedings of the Fourth International Conference on Computational Models of Argument (COMMA 2012), pages 509–510. IOS Press, 2012.
[13] Matthias Thimm and Kristian Kersting. Towards argumentation-based classification. In Logical Foundations of Uncertainty and Machine Learning, Workshop at IJCAI'17, August 2017.
[14] Francesca Toni. A tutorial on assumption-based argumentation. Argument & Computation, 5(1):89–117, 2014.
[15] Leendert van der Torre and Srdjan Vesic. The principle-based approach to abstract argumentation semantics. In Pietro Baroni, Dov Gabbay, Massimiliano Giacomin, and Leendert van der Torre, editors, Handbook of Formal Argumentation, chapter 16. College Publications, 2018.