=Paper=
{{Paper
|id=Vol-1341/paper7
|storemode=property
|title=Argumentation Theory in the Field: An Empirical Study of Fundamental Notions
|pdfUrl=https://ceur-ws.org/Vol-1341/paper7.pdf
|volume=Vol-1341
|dblpUrl=https://dblp.org/rec/conf/argnlp/RosenfeldK14
}}
==Argumentation Theory in the Field: An Empirical Study of Fundamental Notions==
Ariel Rosenfeld (Bar-Ilan University, Ramat-Gan, Israel) rosenfa5@cs.biu.ac.il
Sarit Kraus (Bar-Ilan University, Ramat-Gan, Israel) sarit@cs.biu.ac.il

Abstract

Argumentation Theory provides a very powerful set of principles, ideas and models. Yet, in this paper we will show that its fundamental principles unsatisfactorily explain real-world human argumentation and should be adapted. We present an extensive empirical study on the incompatibility of abstract argumentation and human argumentative behavior, followed by a practical expansion of existing models.

1 Introduction

Argumentation Theory has developed rapidly since Dung's seminal work (Dung, 1995). There has been extensive work extending Dung's framework and semantics; the Value Argumentation Framework (VAF) (Bench-Capon et al., 2002), the Bipolar Argumentation Framework (BAF) (Cayrol and Lagasquie-Schiex, 2005) and the Weighted Argumentation Framework (WAF) (Dunne et al., 2011), to name a few. All reasonable frameworks and semantics rely on the same fundamental notions from (Dung, 1995), Conflict Freedom, Acceptability and Extensions, and expand upon them in some way. One more notion, which was not addressed in (Dung, 1995), Support, has been increasingly gaining attention (Boella et al., 2010). Overall, the same principles and ideas have prevailed for many years.

All of these models and semantics try to provide a normative approach to argumentation, i.e., how argumentation should work from a logical standard. From a descriptive point of view, the study of (Rahwan et al., 2010), where the authors investigated the reinstatement principle in behavioral experiments, is the only experimental study, as far as we know, that tested argumentation in the field. Nevertheless, many argumentative tools have been developed over time: MIT's Deliberatorium (Klein, 2011), Araucaria (Reed and Rowe, 2004), ArgTrust (Tang et al., 2012) and the Web-Based Intelligent Collaborative System (Liu et al., 2007), all of which try to provide systems where people can handle argumentative situations in a coherent and valid way. We believe that these argumentative tools and others, as efficient and attractive as they might be, have a difficult time attracting users outside academia due to the gap between Argumentation Theory and human argumentative behavior, which, as previously stated, has not been addressed in the context of Argumentation Theory thus far.

In order to further develop argumentative applications and agents, we conducted a novel empirical study, with hundreds of human subjects, showing the incompatibility between some of the fundamental ideas stated above and human argumentation. For anyone attempting to mimic and understand the human argumentative process, these inconsistencies, which appear even in the weakest argumentative requirements such as conflict freedom, pose a large concern for theoreticians and practitioners alike. Our findings indicate that the fundamental notions are not good predictive features of people's actions. A possible solution is also presented, which provided better results in explaining people's arguments than the existing theory. This solution, which we call Relevance, captures a perceptual distance between arguments: how one argument affects another and how this effect is comprehended by a reasoner. Relevance also holds a predictive value, as shown in recent work (Rosenfeld and Kraus, 2014).

This article's main contribution is in showing, in an extensive human study, that Argumentation Theory has difficulties in explaining a big part of human argumentative behavior. Secondly, the proposed notion of relevance could in turn provide the argumentation community with an additional tool to investigate the existing theory and semantics.

2 Dung's Fundamental Notions

Argumentation is the process of supporting claims with grounds and defending them against attacks. Without explicitly specifying the underlying language (natural language, first order logic, ...), argument structure or attack/support relations, Dung designed an abstract argumentation framework (Dung, 1995). This framework, combined with proposed semantics (reasoning rules), enables a reasoner to cope and reach conclusions in an environment of arguments that may conflict, support and interact with each other. These arguments may vary in their grounds and validity.

Definition 1. A Dungian Argumentation Framework (AF) is a pair <A, R>, where A is a set of arguments and R is an attack relation over A × A.

Conflict-Free: A set of arguments S is conflict-free if there are no arguments a and b in S such that aRb holds.
Acceptable: An argument a ∈ A is considered acceptable w.r.t. a set of arguments S iff ∀b. bRa → ∃c ∈ S. cRb.
Admissible: A set S is considered admissible iff it is conflict-free and each argument in S is acceptable with respect to S.

Dung also defined several semantics by which, given an AF, one can derive the sets of arguments that should be considered Justified (to some extent). These sets are called Extensions. The different extensions capture different notions of justification, where some are more strict than others.

Definition 2. An extension S ⊆ A is a set of arguments that satisfies some rules of reasoning.

Complete Extension: E is a complete extension of A iff it is an admissible set and every acceptable argument with respect to E belongs to E.
Preferred Extension: E is a preferred extension in A iff it is a maximal (with respect to set inclusion) admissible set of arguments.
Stable Extension: E is a stable extension in A iff it is a conflict-free set that attacks every argument that does not belong to E. Formally, ∀a ∈ A\E, ∃b ∈ E such that bRa.
Grounded Extension: E is the (unique) grounded extension of A iff it is the smallest element (with respect to set inclusion) among the complete extensions of A.

Definition 3. Similar to the attack relation R, one can consider a separate relation S which indicates Support (Amgoud et al., 2008). A supporting argument can also be viewed as a part of another argument's internal structure. These two options only differ in the AF structure; the reasoning outcome is not influenced. The support relation was introduced in order to better represent realistic knowledge.

Let us consider the following example.

Example. During a discussion between reporters, R1 and R2, about the publication of information I concerning person X, the following arguments are presented:
R1: I is important information, thus we must publish it.
R2: I concerns the person X, where X is a private person and we cannot publish information about a private person without his consent.
If you were R1, what would you say next?
A. X is a minister, so X is a public person, not a private person.
B. X has resigned, so X is no longer a minister.
C. His resignation has been refused by the chief of the government.
D. This piece is exclusive to us; if we publish it we can attain a great deal of appreciation from our readers.

See Figure 1 for a graphical representation.

[Figure 1: An example of a Bipolar Argumentation Framework; nodes are arguments, arrows indicate attacks and arrows with diagonal lines indicate support.]

In this example, all mentioned semantics agree on a single (unique) extension which consists of all arguments except "Resigned" (option B) and "Private Person" (R2's argument). Thus, all arguments except "Resigned" and "Private Person" should be considered Justified, regardless of the choice of semantics.

Argumentation Theory consists of many more ideas and notions, yet the very fundamental ones stated above are the focus of this work.

3 Real Dialogs Experiment

To get a deeper understanding of the relations between people's behaviour in argumentation and the stated notions, we used real argumentative conversations of transcribed telephone calls from the Penn Treebank Corpus (Marcus et al., 1993) and a large number of chats collected toward this aim. The Penn Treebank Corpus consists of transcribed phone calls on various topics, among them some controversial topics such as "Should the death penalty be implemented?" and "Should a trial be decided by a judge or jury?", with which we chose to begin. We went through all 33 dialogs on "Capital Punishment" and 31 dialogs on "Trial by Jury" to identify the arguments used in them and cleared all irrelevant sentences (i.e., greetings, unrelated talk etc.). The shortest deliberation consisted of 3 arguments and the longest comprised 15 arguments (a mean of 7). To these dialogs we added another 157 online chats on "Would you get an influenza vaccination this winter?" collected from Israeli students, ages ranging from 19 to 32 (mean=24), using a chat interface we implemented. We constructed 3 BAFs, similar to the one in Figure 1, using the arguments extracted from 5 randomly selected conversations. Each conversation which was not selected for the BAF construction was then annotated using the arguments in the BAFs. All in all, we had 64 phone conversations and 157 online chats, totaling 221, all of which are of an argumentative nature.

Every conversation provided us with 2 argument sets, A1 and A2, both subsets of A. We tested every Ai (i = 1, 2) such that |Ai| ≥ 3 in order to avoid almost completely trivial sets.

Participants were not expected to be aware of all arguments in the BAF, as these were not presented to them. Thus, in testing the Admissibility of Ai and whether Ai is a part of some Extension, we examined both the original BAF and the restricted BAF induced by A1 ∪ A2. That is, the argumentation framework in which A = A1 ∪ A2 and the attack and support relations are defined over (A1 ∪ A2) × (A1 ∪ A2), denoted as AF↓A1∪A2.

3.1 Results

The first property we tested was Conflict-Freedom, which is probably the weakest requirement of a set of arguments. We had anticipated that all Ai would have this property, yet only 78% of the deliberants used a conflict-free set Ai. Namely, 22% of the deliberants used at least 2 conflicting arguments, i.e., one attacks the other. From a purely logical point of view, the use of conflicting arguments is very grating. Yet, we know that some people try to portray themselves as balanced and unbiased, and as such use contradictory arguments to show that they can consider both ends of the argument and can act as good arbitrators.

When we examined Acceptability, we tested whether every argument a ∈ Ai is acceptable w.r.t. Ai \ {a}. We found that 58% of the deliberants followed this rule.

Admissibility was tested according to both the original framework and the restricted framework. Merely 28% of the Ai's used are considered admissible w.r.t. the original framework, while more than 49% qualify when considering the restricted BAF. We can see that people usually do not make the extra effort to ensure that their argument set is admissible. A possible explanation can be values (norms and morals), as described in (Bench-Capon et al., 2002). Given a set of values, a reasoner may not recognize the attacking arguments as defeating arguments, as they advocate a weaker value. As such, the reasoner considers his set admissible. A similar explanation is provided in (Dunne et al., 2011), where a reasoner can assign a small weight to the attacking arguments and as such still consider his set admissible.

These explanations can also partially account for the disheartening results in the test of Extensions. When examining the original framework, less than 30% of the Ai's used were a part of some extension, with Preferred, Grounded and Stable performing very similarly (28%, 30%, 25%). When considering the restricted framework, 49%, 50% and 37% of the deliberants used Ai's that were part of some extension prescribed by Preferred, Grounded and Stable (respectively) under the restricted BAF.

As for Support, 27% of the arguments selected were supporting arguments, i.e., arguments which do not attack any other argument in the framework. Although they cannot change the reasoning outcomes, people naturally consider the supporting arguments, which traditionally are not considered "powerful".

To strengthen our findings we performed yet another experiment. We tested the notions in a controlled and structured environment, where the participant is aware of all arguments in the framework.

4 Structured Argumentative Scenarios

We collected 6 fictional scenarios, based on known argumentative examples from the literature (Walton, 2005; Liu et al., 2007; Cayrol and Lagasquie-Schiex, 2005; Amgoud et al., 2008; Tang et al., 2012).

Two groups of subjects took part in this study; the first consisted of 64 US citizens, all of whom are workers of Amazon Mechanical Turk, ages ranging from 19 to 69 (mean=38, s.d=13.7) with varying demographics. The second consisted of 78 computer science B.Sc. students from Bar-Ilan University (Israel), ages ranging from 18 to 37 (mean=25, s.d=3.7) with similar demographics.

Each subject was presented with the 6 scenarios. Each scenario was presented as a short textual dialog between 2 participants, similar to the journalists' example above. The subject was instructed to place himself in one of the deliberants' roles, given the partial conversation, and to choose the next argument he would use from the four available arguments. We instructed the subject to consider only the arguments in the dialog and the proposed ones, and to refrain from assuming any other information or possible arguments in the dialog's context.

The following example, based on (Liu et al., 2007), was presented to the subjects.

Example. A couple is discussing whether or not to buy an SUV.
Spouse number 1 (S1): "We should buy an SUV; it's the right choice for us".
Spouse number 2 (S2): "But we can't afford an SUV, it's too expensive".
The participant was then asked to put himself in S1's shoes and choose the next argument to use in the conversation. The options were: A. "Good car loan programs are available from a bank", B. "The interest rates on car loans will be high", C. "SUVs are very safe, safety is very important to us", D. "There are high taxes on SUVs".

See Figure 2 for a graphical representation of the aforementioned framework.

[Figure 2: SUV example of BAF]

The distribution of selections in the above example was as follows: A. 35%, B. 24%, C. 8%, D. 33%. There is only one (unique) extension in this scenario, which includes "High interest" and "High taxes". Especially when considering "Taking out a loan", it should be considered overruled (unjustified/invalid), or at least very weak, as it is attacked by an undisputed argument. As we can see, only slightly over half of the subjects chose an argument from the extension, i.e., a somewhat Justified argument.

4.1 Results

The distribution of selections, in all scenarios, suggests that there could be different factors in play, which differ from one subject to another. Thus, there is no decisive answer to what a person would say next. Unfortunately, testing Conflict Freedom and Admissibility is inapplicable here: none of the subjects was offered an argument that conflicts with his previous one, and a subject could not choose more than one argument to construct an admissible set.

When examining Extensions, all scenarios which were presented to the subjects are Well Founded (that is to say, there exists no infinite sequence a0, a1, ..., an, ... such that ∀i. (ai, ai+1) ∈ R). As such, all mentioned semantics coincide: only one extension is Grounded, Stable and Preferred. Of the 6 scenarios, 5 suggested 2 justified arguments and 2 overruled arguments (arguments which are not part of any extension) to the subject. In these 5 scenarios, 67.3% of the time a justified argument was selected (on average). This result is disappointing, since 50% is achieved by randomly selecting arguments.

As for Support, 49.4% of the arguments selected were supporting arguments, i.e., arguments which do not attack any other argument in the framework. Even more interesting is that 80% of the time people chose (directly or indirectly) an argument supporting their first argument. This phenomenon can be regarded as a Confirmation Bias, which is recorded in many fields (Nickerson, 1998). Confirmation bias is a phenomenon wherein people have been shown to actively seek and assign more weight to evidence that confirms their beliefs, and to ignore or underweigh evidence that could disconfirm their beliefs. Confirmation Bias can also explain the persistence of discredited beliefs, i.e., why people continue to consider an argument valid/invalid despite its logical argumentative status. Here it is extremely interesting, since the subjects only played a role and it was not really their original argument. There is a strong tension between the Confirmation Bias and Extensions: in some scenarios the subject is given a situation in which he "already used" an overruled argument, and therefore had a problem advocating it by using a supporting argument.

We had anticipated that in finite and simple argumentative frameworks people would naturally choose the "right" arguments, yet we again see that the argumentative principles unsatisfactorily explain people's argumentative selections. This is not a complete surprise, since we have many examples in the literature where people do not adhere to the optimal, monolithic strategies that can be derived analytically (Camerer, 2003). We have shown here, in two separate experiments, that a similar phenomenon occurs in the context of argumentation: people do not choose "ideal" arguments according to Argumentation Theory.

5 Relevance

It is well known that human cognition is limited, as seen in many examples in (Faust, 1984) and others. In chess, for example, it is common to think that a beginner can consider about 3 moves ahead and a master about 6. If we consider the argumentation process as a game (McBurney and Parsons, 2009), a player (an arguer) cannot fully comprehend all possible moves (arguments) and their utility (justification status) before selecting a move (an argument to use) when the game (framework) is complex. The depth and branching factor limitations of the search algorithms are of course personal. For example, we would expect an educated adult to be able to better consider her arguments than a small child.

Definition 4. Let a, b be arguments in some AF. Rel : A → P(A) is a personal relevance function which, given an argument a ∈ A (for evaluation), returns a set of arguments A′ ⊆ A which are, given the reasoner's cognitive limitations and knowledge, relevant to a.

Using Rel, we can distinguish between relevant and irrelevant arguments w.r.t. a given argument, yet we gain additional strength in incorporating the reasoner's limitations and biases. We denote the restriction of AF to arguments relevant to a as AF↓Rel(a) ≡ <A′, R′>, where A′ = Rel(a) and R′ = (A′ × A′) ∩ R. On AF↓Rel(a) one can deploy any semantics of choice.

The simplest way to instantiate Rel is Rel(·) = A, meaning that all arguments in the AF are relevant to the given argument. This instantiation is the way the classic frameworks address the reasoner's limitations, simply by saying that there are none. As shown in (Liao and Huang, 2013), it is not necessary to discover the status of all arguments in order to evaluate a specific argument or set of arguments. Thus, considering Rel(a) as the maximal set of affecting arguments (arguments whose status affects the status of a) is another natural way to consider relevance, yet without considering cognitive limitations.

We suggest the following instantiation, which we examined empirically.

Definition 5. Let D(a, b) be a distance function which, given arguments a, b, returns the directed distance from argument a to b in the AF's graph.

Given a distance measurement D we can define an edge-relevance function as follows:

Definition 6. RelD(a) = {b | D(b, a) ≤ k}, where k is a non-negative constant.

Naturally, when setting k to 0, every argument a is considered justified in AF↓RelD(a) (under any semantics). k can be thought of as a depth limitation for the search algorithm used by the reasoner. Of course, if k = ∞, AF↓RelD(a) is restricted to all arguments affecting a.

5.1 Empirical Testing

We used several D functions in our work on predicting arguments given a partial conversation (Rosenfeld and Kraus, 2014). When k = 0, as stated above, all arguments should be considered justified. Analyzing the free-form dialogs using Grounded semantics with k = 2 resulted in 72% of the arguments used being part of some extension, whereas without relevance a little less than 50% were part of some extension.

Relevance provides a way to rationally justify every argument within an AF to some extent. Unlike VAF (Bench-Capon et al., 2002) and WAF (Dunne et al., 2011), which rely on exogenous knowledge about values and weights from the reasoner, relevance can be instantiated without any prior knowledge of the reasoner and still offer a better explanatory analysis of the framework.

6 Conclusions

We presented an empirical study with over 400 human subjects and 250 annotated dialogs. Our results, based on both free-form human deliberations and structured experiments, show that the fundamental principles of Argumentation Theory cannot explain a large part of human argumentative behavior. Thus, Argumentation Theory, as it stands, should not be assumed to have descriptive or predictive qualities when it is implemented with people.

Our relevance notion provides a new way to rationalize arguments without prior knowledge about the reasoner. Relevance, as well as other psychological and social aspects, should be explored to better fit Argumentation Theory to human behavior. This required step is crucial to the integration of argumentation in different human domains.

References

Leila Amgoud, Claudette Cayrol, Marie-Christine Lagasquie-Schiex, and Pierre Livet. 2008. On bipolarity in argumentation frameworks. International Journal of Intelligent Systems, 23(10):1062–1093.

Trevor J. M. Bench-Capon, Sylvie Doutre, and Paul E. Dunne. 2002. Value-based argumentation frameworks. In Artificial Intelligence.

Guido Boella, Dov M. Gabbay, Leendert W. N. van der Torre, and Serena Villata. 2010. Support in abstract argumentation. In COMMA, pages 111–122.

Colin Camerer. 2003. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton University Press.

Claudette Cayrol and Marie-Christine Lagasquie-Schiex. 2005. On the acceptability of arguments in bipolar argumentation frameworks. In Symbolic and Quantitative Approaches to Reasoning with Uncertainty, pages 378–389. Springer.

Phan Minh Dung. 1995. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321–357.

Paul E. Dunne, Anthony Hunter, Peter McBurney, Simon Parsons, and Michael Wooldridge. 2011. Weighted argument systems: Basic definitions, algorithms, and complexity results. Artificial Intelligence, 175(2):457–486.

David Faust. 1984. The Limits of Scientific Reasoning. University of Minnesota Press.

Mark Klein. 2011. How to harvest collective wisdom on complex problems: An introduction to the MIT Deliberatorium. Center for Collective Intelligence working paper.

Beishui Liao and Huaxin Huang. 2013. Partial semantics of argumentation: basic properties and empirical results. Journal of Logic and Computation, 23(3):541–562.

Xiaoqing Frank Liu, Samir Raorane, and Ming C. Leu. 2007. A web-based intelligent collaborative system for engineering design. In Collaborative Product Design and Manufacturing Methodologies and Applications, pages 37–58. Springer.

Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Peter McBurney and Simon Parsons. 2009. Dialogue games for agent argumentation. In Argumentation in Artificial Intelligence, pages 261–280. Springer.

Raymond S. Nickerson. 1998. Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2):175.

Iyad Rahwan, Mohammed I. Madakkatel, Jean-François Bonnefon, Ruqiyabi N. Awan, and Sherief Abdallah. 2010. Behavioral experiments for assessing the abstract argumentation semantics of reinstatement. Cognitive Science, 34(8):1483–1502.

Chris Reed and Glenn Rowe. 2004. Araucaria: Software for argument analysis, diagramming and representation. International Journal on Artificial Intelligence Tools, 13(04):961–979.

Ariel Rosenfeld and Sarit Kraus. 2014. Providing arguments in discussions based on the prediction of human argumentative behavior. Unpublished manuscript.

Yuqing Tang, Elizabeth Sklar, and Simon Parsons. 2012. An argumentation engine: ArgTrust. In Ninth International Workshop on Argumentation in Multiagent Systems.

Douglas N. Walton. 2005. Argumentation Methods for Artificial Intelligence in Law. Springer.
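The Conflict-Free, Acceptable and Admissible checks of Dung's framework follow directly from their definitions. A minimal sketch, assuming arguments are encoded as strings and the attack relation R as a set of (attacker, target) pairs (this encoding and the names below are illustrative, not from the paper):

```python
def is_conflict_free(S, R):
    """S is conflict-free if no argument in S attacks another argument in S."""
    return not any((a, b) in R for a in S for b in S)

def is_acceptable(a, S, A, R):
    """a is acceptable w.r.t. S iff every attacker of a is attacked by some c in S."""
    attackers = [b for b in A if (b, a) in R]
    return all(any((c, b) in R for c in S) for b in attackers)

def is_admissible(S, A, R):
    """S is admissible iff it is conflict-free and defends each of its members."""
    return is_conflict_free(S, R) and all(is_acceptable(a, S, A, R) for a in S)

# Tiny framework: b attacks a, c attacks b.
A = {"a", "b", "c"}
R = {("b", "a"), ("c", "b")}

print(is_admissible({"a", "c"}, A, R))  # c defends a against b -> True
print(is_admissible({"a"}, A, R))       # a is left undefended -> False
```

The quadratic membership tests are fine at the scale of the frameworks discussed here; a production implementation would index attackers per argument.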
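The unique extension of the journalists' example can be recovered as the grounded extension, computed as the least fixed point of Dung's characteristic function F(S) = {a | a is acceptable w.r.t. S}. The short labels and the attack edges below are one plausible encoding of the example's framework (the paper gives only the textual arguments and Figure 1), chosen to match the stated outcome:

```python
# Hypothetical encoding of the journalists' example:
#   "Private"  (R2)       attacks "Important" (R1)
#   "Public"   (option A) attacks "Private"
#   "Resigned" (option B) attacks "Public"
#   "Refused"  (option C) attacks "Resigned"
#   "Exclusive" (option D) only supports "Important", so it attacks nothing.
A = {"Important", "Private", "Public", "Resigned", "Refused", "Exclusive"}
R = {("Private", "Important"), ("Public", "Private"),
     ("Resigned", "Public"), ("Refused", "Resigned")}

def acceptable(a, S):
    """a is acceptable w.r.t. S iff S attacks every attacker of a."""
    return all(any((c, b) in R for c in S) for b in A if (b, a) in R)

def grounded(A, R):
    """Iterate F(S) = {a | acceptable w.r.t. S} from the empty set to a fixed point."""
    S = set()
    while True:
        nxt = {a for a in A if acceptable(a, S)}
        if nxt == S:
            return S
        S = nxt

print(sorted(grounded(A, R)))
# ['Exclusive', 'Important', 'Public', 'Refused']
```

As in the text, every argument except "Resigned" and "Private" ends up justified; since this framework is well-founded, the grounded, stable and preferred semantics all agree on this set.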
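The restricted framework AF↓S used in the analysis keeps only the arguments in S = A1 ∪ A2 and the relations among them. A minimal sketch of that restriction under the same set-of-pairs encoding as above (the pair-of-sets return value is an illustrative choice):

```python
def restrict(A, R, S):
    """Return the induced framework <A ∩ S, R ∩ (S × S)>."""
    A_r = A & S
    R_r = {(a, b) for (a, b) in R if a in A_r and b in A_r}
    return A_r, R_r

# Chain a <- b <- c <- d (each pair is (attacker, target)).
A = {"a", "b", "c", "d"}
R = {("b", "a"), ("c", "b"), ("d", "c")}
A1, A2 = {"a", "b"}, {"b", "c"}

A_r, R_r = restrict(A, R, A1 | A2)
print(sorted(A_r), sorted(R_r))
# ['a', 'b', 'c'] [('b', 'a'), ('c', 'b')]
```

The same function serves for the support relation, and, with S = Rel(a), for the relevance-based restriction AF↓Rel(a) introduced in Section 5.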
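For a finite AF, the Well-Foundedness condition used in the scenario analysis (no infinite attack sequence) amounts to the attack graph being acyclic, which a depth-first search can verify. A hedged sketch of that check:

```python
def is_well_founded(A, R):
    """True iff the attack graph has no cycle (no infinite attack sequence)."""
    succ = {a: [b for (x, b) in R if x == a] for a in A}
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {a: WHITE for a in A}

    def dfs(a):
        color[a] = GREY
        for b in succ[a]:
            if color[b] == GREY or (color[b] == WHITE and dfs(b)):
                return True               # back edge: cycle found
        color[a] = BLACK
        return False

    return not any(color[a] == WHITE and dfs(a) for a in A)

print(is_well_founded({"a", "b", "c"}, {("a", "b"), ("b", "c")}))  # True
print(is_well_founded({"a", "b"}, {("a", "b"), ("b", "a")}))       # False
```

When this check succeeds, the grounded, stable and preferred semantics coincide on a single extension, which is why the scenarios admit a unique set of justified arguments.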
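Definitions 5 and 6 in Section 5 describe an edge-relevance function RelD(a) = {b | D(b, a) ≤ k}. A minimal sketch, assuming D is the shortest directed-path distance in the AF's graph, computed by breadth-first search over reversed edges (the paper leaves the choice of D open, so this particular D is an assumption):

```python
from collections import deque

def rel_d(a, A, R, k):
    """Rel_D(a) = {b | D(b, a) <= k} for the shortest directed-path distance D."""
    preds = {x: [s for (s, t) in R if t == x] for x in A}
    dist = {a: 0}
    queue = deque([a])
    while queue:
        x = queue.popleft()
        if dist[x] == k:          # depth limit reached: stop expanding
            continue
        for b in preds[x]:
            if b not in dist:
                dist[b] = dist[x] + 1
                queue.append(b)
    return set(dist)

# Chain of attacks d -> c -> b -> a.
A = {"a", "b", "c", "d"}
R = {("b", "a"), ("c", "b"), ("d", "c")}

print(sorted(rel_d("a", A, R, 2)))  # ['a', 'b', 'c']
print(sorted(rel_d("a", A, R, 0)))  # ['a']
```

With k = 0 the restriction contains only a itself, so a is trivially justified, matching the remark after Definition 6; k acts as the reasoner's search-depth limit, and a large enough k recovers all arguments with a directed path to a.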