=Paper=
{{Paper
|id=Vol-1341/paper7
|storemode=property
|title=Argumentation Theory in the Field: An Empirical Study of Fundamental Notions
|pdfUrl=https://ceur-ws.org/Vol-1341/paper7.pdf
|volume=Vol-1341
|dblpUrl=https://dblp.org/rec/conf/argnlp/RosenfeldK14
}}
==Argumentation Theory in the Field: An Empirical Study of Fundamental Notions==
Ariel Rosenfeld (Bar-Ilan University, Ramat-Gan, Israel) rosenfa5@cs.biu.ac.il
Sarit Kraus (Bar-Ilan University, Ramat-Gan, Israel) sarit@cs.biu.ac.il

Abstract

Argumentation Theory provides a very powerful set of principles, ideas and models. Yet, in this paper we will show that its fundamental principles unsatisfactorily explain real-world human argumentation and should be adapted. We present an extensive empirical study on the incompatibility of abstract argumentation and human argumentative behavior, followed by a practical expansion of existing models.

1 Introduction

Argumentation Theory has developed rapidly since Dung's seminal work (Dung, 1995). There has been extensive work extending Dung's framework and semantics; the Value Argumentation Framework (VAF) (Bench-Capon et al., 2002), the Bipolar Argumentation Framework (BAF) (Cayrol and Lagasquie-Schiex, 2005) and the Weighted Argumentation Framework (WAF) (Dunne et al., 2011), to name a few. All reasonable frameworks and semantics rely on the same fundamental notions from (Dung, 1995), Conflict Freedom, Acceptability and Extensions, and expand upon them in some way. One more notion, which was not addressed in (Dung, 1995), Support, has been increasingly gaining attention (Boella et al., 2010). Overall, the same principles and ideas have prevailed for many years.

All of these models and semantics try to provide a normative approach to argumentation, i.e., how argumentation should work from a logical standard. From a descriptive point of view, the study of (Rahwan et al., 2010), where the authors investigated the reinstatement principle in behavioral experiments, is the only experimental study, as far as we know, that tested argumentation in the field. Nevertheless, many argumentative tools have been developed over time: MIT's Deliberatorium (Klein, 2011), Araucaria (Reed and Rowe, 2004), ArgTrust (Tang et al., 2012) and the Web-Based Intelligent Collaborative System (Liu et al., 2007), all of which try to provide systems where people can handle argumentative situations in a coherent and valid way. We believe that these argumentative tools and others, as efficient and attractive as they might be, have a difficult time attracting users outside academia due to the gap between Argumentation Theory and human argumentative behavior, which, as previously stated, has not been addressed in the context of Argumentation Theory thus far.

In order to further develop argumentative applications and agents, we conducted a novel empirical study, with hundreds of human subjects, showing the incompatibility between some of the fundamental ideas stated above and human argumentation. For anyone attempting to mimic and understand the human argumentative process, these inconsistencies, which appear even in the weakest argumentative requirements such as conflict freedom, pose a large concern for theoreticians and practitioners alike. Our findings indicate that the fundamental notions are not good predictive features of people's actions. A possible solution is also presented, which provided better results in explaining people's arguments than the existing theory. This solution, which we call Relevance, captures a perceptual distance between arguments: how one argument affects another and how this effect is comprehended by a reasoner. Relevance also holds a predictive value, as shown in recent work (Rosenfeld and Kraus, 2014).

This article's main contribution is in showing, in an extensive human study, that Argumentation Theory has difficulties in explaining a big part of human argumentative behavior. Secondly, the proposed notion of relevance could in turn provide the argumentation community with an additional tool to investigate the existing theory and semantics.

2 Dung's Fundamental Notions

Argumentation is the process of supporting claims with grounds and defending them against attacks. Without explicitly specifying the underlying language (natural language, first order logic, ...), argument structure or attack/support relations, Dung designed an abstract argumentation framework (Dung, 1995). This framework, combined with proposed semantics (reasoning rules), enables a reasoner to cope and reach conclusions in an environment of arguments that may conflict, support and interact with each other. These arguments may vary in their grounds and validity.

Definition 1. A Dungian Argumentation Framework (AF) is a pair <A, R>, where A is a set of arguments and R is an attack relation over A × A.

Conflict-Free: A set of arguments S is conflict-free if there are no arguments a and b in S such that aRb holds.
Acceptable: An argument a ∈ A is considered acceptable w.r.t. a set of arguments S iff ∀b. bRa → ∃c ∈ S. cRb.
Admissible: A set S is considered admissible iff it is conflict-free and each argument in S is acceptable with respect to S.

Dung also defined several semantics by which, given an AF, one can derive the sets of arguments that should be considered Justified (to some extent). These sets are called Extensions. The different extensions capture different notions of justification, where some are more strict than others.

Definition 2. An extension S ⊆ A is a set of arguments that satisfies some rules of reasoning.

Complete Extension: E is a complete extension of A iff it is an admissible set and every acceptable argument with respect to E belongs to E.
Preferred Extension: E is a preferred extension in A iff it is a maximal (with respect to set inclusion) admissible set of arguments.
Stable Extension: E is a stable extension in A iff it is a conflict-free set that attacks every argument that does not belong to E. Formally, ∀a ∈ A\E, ∃b ∈ E such that bRa.
Grounded Extension: E is the (unique) grounded extension of A iff it is the smallest element (with respect to set inclusion) among the complete extensions of A.

Definition 3. Similar to the attack relation R, one can consider a separate relation S which indicates Support (Amgoud et al., 2008). A supporting argument can also be viewed as a part of another argument's internal structure. These two options only differ in the AF structure; the reasoning outcome is not influenced. The support relation was introduced in order to better represent realistic knowledge.

Let us consider the following example.

Example. During a discussion between reporters, R1 and R2, about the publication of information I concerning person X, the following arguments are presented:
R1: I is important information, thus we must publish it.
R2: I concerns the person X, where X is a private person and we cannot publish information about a private person without his consent.
If you were R1, what would you say next?
A. X is a minister, so X is a public person, not a private person.
B. X has resigned, so X is no longer a minister.
C. His resignation has been refused by the chief of the government.
D. This piece is exclusive to us; if we publish it we can attain a great deal of appreciation from our readers.

See Figure 1 for a graphical representation.

[Figure 1: An example of a Bipolar Argumentation Framework; nodes are arguments, arrows indicate attacks and arrows with diagonal lines indicate support.]

In this example, all mentioned semantics agree on a single (unique) extension which consists of all arguments except "Resigned" (option B) and "Private Person" (R2's argument). Thus, all arguments except "Resigned" and "Private Person" should be considered Justified, regardless of the choice of semantics.

Argumentation Theory consists of many more ideas and notions, yet the very fundamental ones stated above are the focus of this work.

3 Real Dialogs Experiment

To get a deeper understanding of the relations between people's behaviour in argumentation and the stated notions, we used real argumentative conversations of transcribed telephone calls from the Penn Treebank Corpus (Marcus et al., 1993) and a large number of chats collected toward this aim. The Penn Treebank Corpus consists of transcribed phone calls on various topics, among them some controversial topics such as "Should the death penalty be implemented?" and "Should a trial be decided by a judge or jury?", with which we chose to begin. We went through all 33 dialogs on "Capital Punishment" and 31 dialogs on "Trial by Jury" to identify the arguments used in them and cleared all irrelevant sentences (i.e., greetings, unrelated talk etc.). The shortest deliberation consisted of 3 arguments and the longest comprised 15 arguments (a mean of 7). To these dialogs we added another 157 online chats on "Would you get an influenza vaccination this winter?" collected from Israeli students, ages ranging from 19 to 32 (mean=24), using a chat interface we implemented. We constructed 3 BAFs, similar to the one in Figure 1, using the arguments extracted from 5 randomly selected conversations. Each conversation which was not selected for the BAF construction was then annotated using the arguments in the BAFs. All in all, we had 64 phone conversations and 157 online chats, totaling 221, all of which are of an argumentative nature.

Every conversation provided us with 2 argument sets, A1 and A2, both subsets of A. We tested every Ai (i = 1, 2) such that |Ai| ≥ 3 in order to avoid almost completely trivial sets.

Participants were not expected to be aware of all arguments in the BAF, as these were not presented to them. Thus, in testing the Admissibility of Ai and whether Ai is a part of some Extension, we examined both the original BAF and the restricted BAF induced by A1 ∪ A2. That is, the argumentation framework in which A = A1 ∪ A2 and the attack and support relations are defined over (A1 ∪ A2) × (A1 ∪ A2), denoted as AF↓A1∪A2.

3.1 Results

The first property we tested was Conflict-Freedom, which is probably the weakest requirement of a set of arguments. We had anticipated that all Ai would have this property, yet only 78% of the deliberants used a conflict-free set Ai. Namely, 22% of the deliberants used at least 2 conflicting arguments, i.e., one attacks the other. From a purely logical point of view, the use of conflicting arguments is very grating. Yet, we know that some people try to portray themselves as balanced and unbiased, and as such use contradictory arguments to show that they can consider both ends of the argument and can act as good arbitrators.

When we examined Acceptability, we tested whether every argument a ∈ Ai is acceptable w.r.t. Ai \ {a}. We found that 58% of the deliberants followed this rule.

Admissibility was tested according to both the original framework and the restricted framework. Merely 28% of the Ai's used are considered admissible w.r.t. the original framework, while more than 49% qualify when considering the restricted BAF. We can see that people usually do not make the extra effort to ensure that their argument set is admissible. A possible explanation can be values (norms and morals), as described in (Bench-Capon et al., 2002). Given a set of values, a reasoner may not recognize the attacking arguments as defeating arguments, as they advocate a weaker value. As such, the reasoner considers his set admissible. A similar explanation is provided in (Dunne et al., 2011), where a reasoner can assign a small weight to the attacking arguments and as such still consider his set admissible.

These explanations can also partially account for the disheartening results in the test of Extensions. When examining the original framework, less than 30% of the Ai's used were a part of some extension, with Preferred, Grounded and Stable performing very similarly (28%, 30%, 25%). When considering the restricted framework, 49%, 50% and 37% of the deliberants used Ai's that were part of some extension prescribed by Preferred, Grounded and Stable (respectively) under the restricted BAF.

As for Support, 27% of the arguments selected were supporting arguments, i.e., arguments which do not attack any other argument in the framework. Although they cannot change the reasoning outcomes, people naturally consider the supporting arguments, which traditionally are not considered "powerful".

To strengthen our findings we performed yet another experiment. We tested the notions in a controlled and structured environment, where the participant is aware of all arguments in the framework.

4 Structured Argumentative Scenarios

We collected 6 fictional scenarios, based on known argumentative examples from the literature (Walton, 2005; Liu et al., 2007; Cayrol and Lagasquie-Schiex, 2005; Amgoud et al., 2008; Tang et al., 2012).

Two groups of subjects took part in this study; the first consisted of 64 US citizens, all of whom are workers of Amazon Mechanical Turk, ages ranging from 19 to 69 (mean=38, s.d=13.7) with varying demographics. The second consisted of 78 computer science B.Sc. students from Bar-Ilan University (Israel), ages ranging from 18 to 37 (mean=25, s.d=3.7) with similar demographics.

Each subject was presented with the 6 scenarios. Each scenario was presented as a short textual dialog between 2 participants, similar to the journalists' example above. The subject was instructed to place himself in one of the deliberants' roles, given the partial conversation, and to choose the next argument he would use from the four available arguments. We instructed the subject to consider only the arguments in the dialog and the proposed ones, and to refrain from assuming any other information or possible arguments in the dialog's context.

The following example, based on (Liu et al., 2007), was presented to the subjects.

Example. A couple is discussing whether or not to buy an SUV.
Spouse number 1 (S1): "We should buy an SUV; it's the right choice for us".
Spouse number 2 (S2): "But we can't afford an SUV, it's too expensive".
The participant was then asked to put himself in S1's shoes and choose the next argument to use in the conversation. The options were: A. "Good car loan programs are available from a bank", B. "The interest rates on car loans will be high", C. "SUVs are very safe, safety is very important to us", D. "There are high taxes on SUVs".

See Figure 2 for a graphical representation of the aforementioned framework.

[Figure 2: SUV example of BAF]

The distribution of selections in the above example was as follows: A. 35%, B. 24%, C. 8%, D. 33%. There is only one (unique) extension in this scenario, which includes "High interest" and "High taxes". Especially when considering "Taking out a loan", it should be considered overruled (unjustified/invalid), or at least very weak, as it is attacked by an undisputed argument. As we can see, only slightly over half of the subjects chose an argument from the extension, i.e., a somewhat Justified argument.

4.1 Results

The distribution of selections, in all scenarios, suggests that there could be different factors in play, which differ from one subject to another. Thus, there is no decisive answer to what a person would say next. Unfortunately, testing Conflict Freedom and Admissibility is inapplicable here: none of the subjects was offered an argument that conflicts with his previous one, and a subject could not choose more than one argument to construct an admissible set.

When examining Extensions, all scenarios which were presented to the subjects are Well Founded (that is to say, there exists no infinite sequence a0, a1, ..., an, ... such that ∀i. (ai, ai+1) ∈ R). As such, all mentioned semantics coincide: only one extension is Grounded, Stable and Preferred. Of the 6 scenarios, 5 suggested 2 justified arguments and 2 overruled arguments (arguments which are not part of any extension) to the subject. In these 5 scenarios, 67.3% of the time a justified argument was selected (on average). This result is disappointing, since 50% is achieved by randomly selecting arguments.

As for Support, 49.4% of the arguments selected were supporting arguments, i.e., arguments which do not attack any other argument in the framework. Even more interesting is that 80% of the time people chose (directly or indirectly) an argument supporting their first argument. This phenomenon can be regarded as a Confirmation Bias, which is recorded in many fields (Nickerson, 1998). Confirmation bias is a phenomenon wherein people have been shown to actively seek and assign more weight to evidence that confirms their beliefs, and to ignore or underweigh evidence that could disconfirm their beliefs. Confirmation Bias can also explain the persistence of discredited beliefs, i.e., why people continue to consider an argument valid/invalid despite its logical argumentative status. Here it is extremely interesting, since the subjects only played a role and it was not really their original argument. There is a strong tension between the Confirmation Bias and Extensions: in some scenarios the subject is given a situation in which he "already used" an overruled argument, and therefore had a problem advocating it by using a supporting argument.

We had anticipated that in finite and simple argumentative frameworks people would naturally choose the "right" arguments, yet we again see that the argumentative principles unsatisfactorily explain people's argumentative selections. This is not a complete surprise, since we have many examples in the literature where people do not adhere to the optimal, monolithic strategies that can be derived analytically (Camerer, 2003). We have shown here, in two separate experiments, that a similar phenomenon occurs in the context of argumentation: people do not choose "ideal" arguments according to Argumentation Theory.

5 Relevance

It is well known that human cognition is limited, as seen in many examples in (Faust, 1984) and others. In chess, for example, it is common to think that a beginner can consider about 3 moves ahead and a master about 6. If we consider the argumentation process as a game (McBurney and Parsons, 2009), a player (an arguer) cannot fully comprehend all possible moves (arguments) and their utility (justification status) before selecting a move (an argument to use) when the game (framework) is complex. The depth and branching factor limitations of the search algorithms are of course personal. For example, we would expect an educated adult to be able to better consider her arguments than a small child.

Definition 4. Let a, b be arguments in some AF. Rel : A → P(A) is a personal relevance function which, given an argument a ∈ A (for evaluation), returns a set of arguments A′ ⊆ A which are, given the reasoner's cognitive limitations and knowledge, relevant to a.

Using Rel, we can distinguish between relevant and irrelevant arguments w.r.t. a given argument, yet we gain additional strength in incorporating the reasoner's limitations and biases. We denote the restriction of AF to arguments relevant to a as AF↓Rel(a) ≡ <A′, R′>, where A′ = Rel(a) and R′ = (A′ × A′) ∩ R. On AF↓Rel(a) one can deploy any semantics of choice.

The simplest way to instantiate Rel is Rel(·) = A, meaning that all arguments in the AF are relevant to the given argument. This instantiation is the way the classic frameworks address the reasoner's limitations, simply by saying that there are none. As shown in (Liao and Huang, 2013), it is not necessary to discover the status of all arguments in order to evaluate a specific argument or set of arguments. Thus, considering Rel(a) as the maximal set of affecting arguments (arguments whose status affects the status of a) is another natural way to consider relevance, yet without considering cognitive limitations.

We suggest the following instantiation, which we examined empirically.

Definition 5. Let D(a, b) be a distance function which, given arguments a, b, returns the directed distance from argument a to b in the AF's graph.

Given a distance measurement D we can define an edge-relevance function as follows:

Definition 6. RelD(a) = {b | D(b, a) ≤ k}, where k is a non-negative constant.

Naturally, when setting k to 0, every argument a is considered justified in AF↓RelD(a) (under any semantics). k can be thought of as a depth limitation for the search algorithm used by the reasoner. Of course, if k = ∞, AF↓RelD(a) is restricted to all arguments affecting a.

5.1 Empirical Testing

We used several D functions in our work on predicting arguments given a partial conversation (Rosenfeld and Kraus, 2014). When k = 0, as stated above, all arguments should be considered justified. Analyzing the free-form dialogs using Grounded semantics with k = 2 resulted in 72% of the arguments used being part of some extension, whereas without relevance a little less than 50% were part of some extension.

Relevance provides a way to rationally justify every argument within an AF to some extent. Unlike VAF (Bench-Capon et al., 2002) and WAF (Dunne et al., 2011), which rely on exogenous knowledge about values and weights from the reasoner, relevance can be instantiated without any prior knowledge of the reasoner and still offer a better explanatory analysis of the framework.

6 Conclusions

We presented an empirical study with over 400 human subjects and 250 annotated dialogs. Our results, based on both free-form human deliberations and structured experiments, show that the fundamental principles of Argumentation Theory cannot explain a large part of human argumentative behavior. Thus, Argumentation Theory, as it stands, should not be assumed to have descriptive or predictive qualities when it is implemented with people.

Our relevance notion provides a new way to rationalize arguments without prior knowledge about the reasoner. Relevance, as well as other psychological and social aspects, should be explored to better fit Argumentation Theory to human behavior. This required step is crucial to the integration of argumentation in different human domains.

References

Leila Amgoud, Claudette Cayrol, Marie-Christine Lagasquie-Schiex, and Pierre Livet. 2008. On bipolarity in argumentation frameworks. International Journal of Intelligent Systems, 23(10):1062–1093.

Trevor J. M. Bench-Capon, Sylvie Doutre, and Paul E. Dunne. 2002. Value-based argumentation frameworks. In Artificial Intelligence.

Guido Boella, Dov M. Gabbay, Leendert W. N. van der Torre, and Serena Villata. 2010. Support in abstract argumentation. In COMMA, pages 111–122.

Colin Camerer. 2003. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton University Press.

Claudette Cayrol and Marie-Christine Lagasquie-Schiex. 2005. On the acceptability of arguments in bipolar argumentation frameworks. In Symbolic and Quantitative Approaches to Reasoning with Uncertainty, pages 378–389. Springer.

Phan Minh Dung. 1995. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321–357.

Paul E. Dunne, Anthony Hunter, Peter McBurney, Simon Parsons, and Michael Wooldridge. 2011. Weighted argument systems: Basic definitions, algorithms, and complexity results. Artificial Intelligence, 175(2):457–486.

David Faust. 1984. The Limits of Scientific Reasoning. University of Minnesota Press.

Mark Klein. 2011. How to harvest collective wisdom on complex problems: An introduction to the MIT Deliberatorium. Center for Collective Intelligence working paper.

Beishui Liao and Huaxin Huang. 2013. Partial semantics of argumentation: basic properties and empirical results. Journal of Logic and Computation, 23(3):541–562.

Xiaoqing Frank Liu, Samir Raorane, and Ming C. Leu. 2007. A web-based intelligent collaborative system for engineering design. In Collaborative Product Design and Manufacturing Methodologies and Applications, pages 37–58. Springer.

Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Peter McBurney and Simon Parsons. 2009. Dialogue games for agent argumentation. In Argumentation in Artificial Intelligence, pages 261–280. Springer.

Raymond S. Nickerson. 1998. Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2):175.

Iyad Rahwan, Mohammed I. Madakkatel, Jean-François Bonnefon, Ruqiyabi N. Awan, and Sherief Abdallah. 2010. Behavioral experiments for assessing the abstract argumentation semantics of reinstatement. Cognitive Science, 34(8):1483–1502.

Chris Reed and Glenn Rowe. 2004. Araucaria: Software for argument analysis, diagramming and representation. International Journal on Artificial Intelligence Tools, 13(04):961–979.

Ariel Rosenfeld and Sarit Kraus. 2014. Providing arguments in discussions based on the prediction of human argumentative behavior. Unpublished manuscript.

Yuqing Tang, Elizabeth Sklar, and Simon Parsons. 2012. An argumentation engine: ArgTrust. In Ninth International Workshop on Argumentation in Multiagent Systems.

Douglas N. Walton. 2005. Argumentation Methods for Artificial Intelligence in Law. Springer.
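The Conflict-Free, Acceptable and Admissible checks of Dung's framework follow directly from their definitions. A minimal sketch, assuming arguments are encoded as strings and the attack relation R as a set of (attacker, target) pairs (this encoding and the names below are illustrative, not from the paper):

```python
def is_conflict_free(S, R):
    """S is conflict-free if no argument in S attacks another argument in S."""
    return not any((a, b) in R for a in S for b in S)

def is_acceptable(a, S, A, R):
    """a is acceptable w.r.t. S iff every attacker of a is attacked by some c in S."""
    attackers = [b for b in A if (b, a) in R]
    return all(any((c, b) in R for c in S) for b in attackers)

def is_admissible(S, A, R):
    """S is admissible iff it is conflict-free and defends each of its members."""
    return is_conflict_free(S, R) and all(is_acceptable(a, S, A, R) for a in S)

# Tiny framework: b attacks a, c attacks b.
A = {"a", "b", "c"}
R = {("b", "a"), ("c", "b")}

print(is_admissible({"a", "c"}, A, R))  # c defends a against b -> True
print(is_admissible({"a"}, A, R))       # a is left undefended -> False
```

The quadratic membership tests are fine at the scale of the frameworks discussed here; a production implementation would index attackers per argument.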
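The unique extension of the journalists' example can be recovered as the grounded extension, computed as the least fixed point of Dung's characteristic function F(S) = {a | a is acceptable w.r.t. S}. The short labels and the attack edges below are one plausible encoding of the example's framework (the paper gives only the textual arguments and Figure 1), chosen to match the stated outcome:

```python
# Hypothetical encoding of the journalists' example:
#   "Private"  (R2)       attacks "Important" (R1)
#   "Public"   (option A) attacks "Private"
#   "Resigned" (option B) attacks "Public"
#   "Refused"  (option C) attacks "Resigned"
#   "Exclusive" (option D) only supports "Important", so it attacks nothing.
A = {"Important", "Private", "Public", "Resigned", "Refused", "Exclusive"}
R = {("Private", "Important"), ("Public", "Private"),
     ("Resigned", "Public"), ("Refused", "Resigned")}

def acceptable(a, S):
    """a is acceptable w.r.t. S iff S attacks every attacker of a."""
    return all(any((c, b) in R for c in S) for b in A if (b, a) in R)

def grounded(A, R):
    """Iterate F(S) = {a | acceptable w.r.t. S} from the empty set to a fixed point."""
    S = set()
    while True:
        nxt = {a for a in A if acceptable(a, S)}
        if nxt == S:
            return S
        S = nxt

print(sorted(grounded(A, R)))
# ['Exclusive', 'Important', 'Public', 'Refused']
```

As in the text, every argument except "Resigned" and "Private" ends up justified; since this framework is well-founded, the grounded, stable and preferred semantics all agree on this set.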
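The restricted framework AF↓S used in the analysis keeps only the arguments in S = A1 ∪ A2 and the relations among them. A minimal sketch of that restriction under the same set-of-pairs encoding as above (the pair-of-sets return value is an illustrative choice):

```python
def restrict(A, R, S):
    """Return the induced framework <A ∩ S, R ∩ (S × S)>."""
    A_r = A & S
    R_r = {(a, b) for (a, b) in R if a in A_r and b in A_r}
    return A_r, R_r

# Chain a <- b <- c <- d (each pair is (attacker, target)).
A = {"a", "b", "c", "d"}
R = {("b", "a"), ("c", "b"), ("d", "c")}
A1, A2 = {"a", "b"}, {"b", "c"}

A_r, R_r = restrict(A, R, A1 | A2)
print(sorted(A_r), sorted(R_r))
# ['a', 'b', 'c'] [('b', 'a'), ('c', 'b')]
```

The same function serves for the support relation, and, with S = Rel(a), for the relevance-based restriction AF↓Rel(a) introduced in Section 5.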
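For a finite AF, the Well-Foundedness condition used in the scenario analysis (no infinite attack sequence) amounts to the attack graph being acyclic, which a depth-first search can verify. A hedged sketch of that check:

```python
def is_well_founded(A, R):
    """True iff the attack graph has no cycle (no infinite attack sequence)."""
    succ = {a: [b for (x, b) in R if x == a] for a in A}
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {a: WHITE for a in A}

    def dfs(a):
        color[a] = GREY
        for b in succ[a]:
            if color[b] == GREY or (color[b] == WHITE and dfs(b)):
                return True               # back edge: cycle found
        color[a] = BLACK
        return False

    return not any(color[a] == WHITE and dfs(a) for a in A)

print(is_well_founded({"a", "b", "c"}, {("a", "b"), ("b", "c")}))  # True
print(is_well_founded({"a", "b"}, {("a", "b"), ("b", "a")}))       # False
```

When this check succeeds, the grounded, stable and preferred semantics coincide on a single extension, which is why the scenarios admit a unique set of justified arguments.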
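Definitions 5 and 6 in Section 5 describe an edge-relevance function RelD(a) = {b | D(b, a) ≤ k}. A minimal sketch, assuming D is the shortest directed-path distance in the AF's graph, computed by breadth-first search over reversed edges (the paper leaves the choice of D open, so this particular D is an assumption):

```python
from collections import deque

def rel_d(a, A, R, k):
    """Rel_D(a) = {b | D(b, a) <= k} for the shortest directed-path distance D."""
    preds = {x: [s for (s, t) in R if t == x] for x in A}
    dist = {a: 0}
    queue = deque([a])
    while queue:
        x = queue.popleft()
        if dist[x] == k:          # depth limit reached: stop expanding
            continue
        for b in preds[x]:
            if b not in dist:
                dist[b] = dist[x] + 1
                queue.append(b)
    return set(dist)

# Chain of attacks d -> c -> b -> a.
A = {"a", "b", "c", "d"}
R = {("b", "a"), ("c", "b"), ("d", "c")}

print(sorted(rel_d("a", A, R, 2)))  # ['a', 'b', 'c']
print(sorted(rel_d("a", A, R, 0)))  # ['a']
```

With k = 0 the restriction contains only a itself, so a is trivially justified, matching the remark after Definition 6; k acts as the reasoner's search-depth limit, and a large enough k recovers all arguments with a directed path to a.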