Mining Fine-grained Argument Elements

Adam Wyner
Department of Computing Science, University of Aberdeen
Meston Building, Meston Walk, Aberdeen, AB24 3UK, Scotland
azwyner@abdn.ac.uk

Abstract

The paper discusses the architecture and development of an Argument Workbench, which supports an analyst in reconstructing arguments from across textual sources. The workbench takes a semi-automated, interactive approach, searching in a corpus for fine-grained argument elements, which are concepts and conceptual patterns in expressions that are associated with argumentation schemes. The expressions can then be extracted from a corpus and reconstituted into instantiated argumentation schemes for and against a given conclusion. Such arguments can then be input to an argument evaluation tool.

1 Introduction

We have large corpora of unstructured textual information, such as consumer websites (Amazon), newspapers (the BBC's "Have Your Say"), or policy responses to public consultations. The information is complex, high volume, fragmentary, and presented as a series of comments or statements either linearly (Amazon or the BBC) or alinearly (policy responses). Given the lack of structure of the corpora, the cumulative argumentative meaning of the texts is obscurely distributed across texts. In order to make coherent sense of the information, the content must be extracted, analysed, and restructured into a form suitable for further formal and automated reasoning (e.g. ASPARTIX (Egly et al., 2008), which is grounded in Argumentation Frameworks (Dung, 1995)). There remains a significant knowledge acquisition bottleneck (Forsythe and Buchanan, 1993) between the textual source and the formal representation.

Argumentation text is rich, multi-dimensional, and fine-grained, consisting of (among others): a range of (explicit and implicit) discourse relations between statements in the corpus, including indicators for conclusions and premises; speech acts and propositional attitudes; contrasting sentiment terminology; and domain terminology that is represented in the verbs, nouns, and modifiers of sentences. Moreover, linguistic expression is various, given alternative syntactic or lexical forms for related semantic meaning. It is difficult for people to reconstruct argument from text, and even more so for a computer.

Yet the presentation of argumentation in text is not a random or arbitrary combination of such elements, but is somewhat structured into reasoning patterns, e.g. defeasible argumentation schemes (Walton, 1996). Furthermore, the scope of linguistic variation is neither unlimited nor unconstrained: diathesis alternations (related syntactic forms) appear in systematic patterns (Levin, 1993); a thesaurus is a finite compendium of lexical semantic relationships (Fellbaum, 1998); discourse relations (Webber et al., 2011) and speech acts (Searle and Vanderveken, 1985) (by and large) signal systematic semantic relations between sentences or between sentences and contexts; and the expressivity of contrast and sentiment is scoped (Horn, 2001; Pang and Lee, 2008). A more open-ended aspect of argumentation in text is domain knowledge that appears as terminology. Yet here too, in a given corpus on a selected topic, discussants demonstrate a high degree of topical coherence, signalling that similar or related conceptual domain models are being deployed. Though argumentation text is complex and its coherence is obscured, taken together it is also underlyingly highly organised; after all, people do argue, which is meaningful only where there is some understanding of what is being argued about and how the meaning of the arguments is linguistically conveyed. Without such underlying organisation, we could not successfully reconstruct and evaluate arguments from source materials, which is contrary to what is accomplished in argument analysis.
The paper proposes that the elements and structures of the lexicon, syntax, discourse, argumentation, and domain terminology can be deployed to support the identification and extraction of relevant fine-grained textual passages from across complex, distributed texts. The passages can then be reconstituted into instantiated argumentation schemes. The paper discusses an argument workbench that takes a semi-automated, interactive approach, using a text mining development environment, to flexibly query for concepts (i.e. semantically annotated expressions) and patterns of concepts within sentences, where the concepts and patterns are associated with argumentation schemes. The concepts and patterns are based on the linguistic and domain information. The results of the queries are extracted from a corpus and interactively reconstituted into instantiated argumentation schemes for and against a given conclusion. Such arguments can then be input to an argument evaluation tool. From such an approach, a "grammar" for arguments can be developed and resources (e.g. gold corpora) provided.

The paper presents a sample use case, the elements and structures, the tool components, and the outputs of queries. Broadly, the approach builds on (Wyner et al., 2013; Wyner et al., 2014; Wyner et al., 2012). The approach is contrasted with statistical/machine learning approaches, high-level approaches that specify a grammar, and tasks that annotate single passages of argument.

2 Tool Development and Use

In this section, some of the main elements of the tool and how it is used are briefly outlined.

2.1 Use Case and Materials

The sample use case is based on Amazon consumer reviews about purchasing a camera. Consumer reviews can be construed as presenting arguments concerning a decision about what to buy based on various factors. Consumers argue in such reviews about what features a camera has, the relative advantages, experiences, and sources of misinformation. These are qualitative, linguistically expressed arguments.

2.2 Components of Analysis

The analysis has several subcomponents: a consumer argumentation scheme, discourse indicators, sentiment terminology, and a domain model. The consumer argumentation scheme (CAS) is derived from the value-based practical reasoning argumentation scheme (Atkinson and Bench-Capon, 2007); it represents the arguments for or against buying the consumer item relative to preferences and values. A range of explicit discourse indicators (Webber et al., 2011) are automatically annotated, such as those signalling a premise, e.g. because, a conclusion, e.g. therefore, or contrast and exception, e.g. not, except. Sentiment terminology (Nielsen, 2011) is signalled by lexical semantic contrast: The flash worked poorly is the semantic negation of The flash worked flawlessly, where poorly is a negative sentiment and flawlessly is a positive sentiment. Contrast indicators can similarly be used. Domain terminology specifies the objects and properties that are relevant to the users. To some extent the terminology can be automatically acquired (term frequency) or manually derived and structured into an ontology, e.g. from consumer report magazines or available ontologies. Given the modular nature of the analysis as well as the tool, auxiliary components can be added, such as speech act verbs, propositional attitude verbs, sentence conjunctions to split sentences, etc. Each such component adds a further dimension to the analysis of the corpus.
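To give a concrete sense of these components, the following is a minimal sketch of the kind of term lists (gazetteers) and lookup the analysis draws on. It is not the project's actual GATE gazetteer files, and the labels and entries are illustrative assumptions only.

    # Minimal sketch of gazetteer-style term lists for the analysis components.
    # Labels and entries are illustrative, not the project's actual lists.
    GAZETTEERS = {
        "PremiseIndicator":    ["because", "since", "given that"],
        "ConclusionIndicator": ["therefore", "consequently", "thus"],
        "ContrastIndicator":   ["but", "however", "except", "not"],
        "SentimentPositive":   ["flawlessly", "great", "sharp"],
        "SentimentNegative":   ["poorly", "blurry", "disappointing"],
        "CameraProperty":      ["flash", "exposure", "focus", "image quality"],
    }

    def annotate(sentence):
        """Return (label, term) pairs for every gazetteer term the sentence
        contains (naive substring matching; GATE's lookup is token-based)."""
        text = sentence.lower()
        return [(label, term)
                for label, terms in GAZETTEERS.items()
                for term in terms if term in text]

    print(annotate("I recommend it because the flash worked flawlessly."))
    # [('PremiseIndicator', 'because'), ('SentimentPositive', 'flawlessly'),
    #  ('CameraProperty', 'flash')]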
2.3 Components of the Tool

To recognise the textual elements of Section 2.2, we use the GATE framework (Cunningham et al., 2002) for language engineering applications. It is an open source desktop application written in Java that provides a user interface for professional linguists and text engineers to bring together a wide variety of natural language processing tools in a pipeline and apply them to a set of documents. Our approach to GATE tool development follows (Wyner and Peters, 2011). Once a GATE pipeline has been applied to a corpus, we can view the annotations of a text either in situ or extracted using GATE's ANNIC (ANNotations In Context) corpus indexing and querying tool.

In GATE, gazetteers associate textual passages in the corpus that match terms on the lists with an annotation. The annotations introduced by gazetteers are used by JAPE rules, creating annotations that are visible as highlighted text, can be reused to construct higher-level annotations, and are easily searchable in ANNIC. Querying for an annotation or a pattern of annotations, we retrieve all the terms bearing that annotation.

2.4 Output and Queries

The ANNIC tool indexes the annotated text and supports semantic querying. Searching the corpus for single or complex patterns of annotations returns all those strings that are annotated with the pattern, along with their context and source document. Complex queries can also be formed. A query and a sample result appear in Figure 1, where the query finds all sequences in which the first string is annotated with PremiseIndicator, followed by some tokens, then a string annotated with positive sentiment, some tokens, and finally a string annotated as CameraProperty. The search returned a range of candidate structures that can be further scrutinised; the query can be iteratively refined to zero in on other relevant passages. The example can be taken as part of a positive justification for buying the camera. The query language (the language of the annotations) facilitates complex searches for any of the annotations in the corpus, enabling exploration of the statements in the corpus.

[Figure 1: Query and Sample Result]
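To make the shape of such a query concrete, the following sketch approximates the Figure 1 pattern (a premise indicator, some tokens, a positive sentiment term, some tokens, a camera property) with a plain regular expression over words. The actual ANNIC query operates over annotations rather than raw text; the indicator, sentiment, and property terms below are illustrative assumptions.

    # Sketch of the Figure 1-style pattern: PremiseIndicator ... positive
    # sentiment ... CameraProperty. A regular expression over words stands in
    # for ANNIC's annotation-based query language.
    import re

    PATTERN = re.compile(
        r"\b(because|since)\b"                   # premise indicator
        r".*?\b(flawlessly|great|sharp)\b"       # positive sentiment term
        r".*?\b(flash|exposure|focus|lens)\b",   # camera property
        re.IGNORECASE)

    reviews = [
        "I bought it because the pictures are great and the flash never fails.",
        "The battery died after a week.",
    ]
    for sentence in reviews:
        match = PATTERN.search(sentence)
        if match:
            print(match.groups())   # ('because', 'great', 'flash')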
2.5 Analysis of Arguments and their Evaluation

The objective of the tool is to find specific patterns of terminology in the text that can be used to instantiate the CAS argumentation scheme both for and against the purchase of a particular model of camera. We iteratively search the corpus for properties, instantiate the argumentation scheme, and identify attacks. Once we have instantiated arguments in attack relations, we may evaluate the argumentation framework. Our focus in this paper is the identification of arguments and attacks from the source material rather than evaluation. It is important to emphasise that we provide an analyst's support tool, so some degree of judgement is required.

From the results of queries on the corpus, we have identified the following premises bearing on image quality, where we paraphrase the source and infer the values from context. Agents are also left implicit, assuming that a single agent does not make contradictory statements. The premises instantiate the CAS in a positive form, where A1 is an argument for buying the camera; similarly, we can identify statements and instantiated argumentation schemes against buying the camera.

A1.
  P1: The pictures are perfectly exposed.
  P2: The pictures are well-focused.
  V1: These properties promote image quality.
  C1: Therefore, you (the reader) should buy the Canon SX220.

Searching in the corpus, we can find statements contrary to the premises in A1, constituting an attack on A1. To defeat these attacks and maintain A1, we would have to search further in the corpus for contraries to the attacks. Searching for such statements and counterstatements is facilitated by the query tool.
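Although evaluation is not the focus of this paper, the following sketch shows the downstream step the extracted arguments feed into: instantiated arguments and their attacks form a Dung-style argumentation framework that can be evaluated, here naively under grounded semantics. This is a stand-in for a dedicated solver such as ASPARTIX; the argument names are illustrative.

    # Minimal sketch of argumentation framework evaluation (grounded semantics).
    # A1: buy the camera; A2: a contrary statement attacking A1;
    # A3: a counterstatement attacking A2, which reinstates A1.
    def grounded_extension(arguments, attacks):
        """Iteratively accept arguments all of whose attackers are defeated."""
        accepted, defeated = set(), set()
        changed = True
        while changed:
            changed = False
            for arg in arguments:
                attackers = {a for (a, b) in attacks if b == arg}
                if arg not in accepted and attackers <= defeated:
                    accepted.add(arg)
                    changed = True
                if arg not in defeated and attackers & accepted:
                    defeated.add(arg)
                    changed = True
        return accepted

    args = {"A1", "A2", "A3"}
    attacks = {("A2", "A1"), ("A3", "A2")}
    print(sorted(grounded_extension(args, attacks)))   # ['A1', 'A3']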
3 Discussion

The paper presents an outline of an implemented, semi-automatic, interactive, rule-based text analytic tool to support analysts in identifying fine-grained, relevant textual passages that can be reconstructed into argumentation schemes and attacks. As such, it is not evaluated with respect to recall and precision (Mitkov, 2003) against a gold standard, but in terms of user facilitation (i.e. analysts' qualitative evaluation of using the tool or not), work that remains to be done. The tool is an advance over graphically-based argument extraction tools that rely on the analysts' unstructured, implicit, non-operationalised knowledge of discourse indicators and content (van Gelder, 2007; Rowe and Reed, 2008; Liddo and Shum, 2010; Bex et al., 2014). There are logic programming approaches that automatically annotate argumentative texts: (Pallotta and Delmonte, 2011) classify statements according to rhetorical roles using full sentence parsing and semantic translation; (Saint-Dizier, 2012) provides a rule-oriented approach to process specific, highly structured argumentative texts; (Moens et al., 2007) manually annotate legal texts and then construct a grammar that is tailored to automatically annotate the passages. Such rule-oriented approaches do not use argumentation schemes or domain models; they do not straightforwardly provide for complex annotation querying; and they are stand-alone tools that are not integrated with other NLP tools.

The interactive, incremental, semi-automatic approach taken here is in contrast to statistical/machine learning approaches. Such approaches rely on the prior creation of gold standard corpora that are annotated manually and adjudicated (considering interannotator agreement). The gold standard corpora are then used to induce a model that (if successful) annotates corpora comparably well to the human annotation. For example, where sentences in a corpus are annotated as premise or conclusion, the model ought also to annotate the sentences similarly; in effect, what a person uses to classify a sentence as premise or conclusion can be acquired by the computer. Statistical approaches yield a probability that some element is classified one way or the other; the justification for the classification, such as is found in a rule-based system, cannot be given. Moreover, refinement of results in statistical approaches relies on enlarging the training data. Importantly, the rule-based approach outlined here could be used to support the creation of gold standard corpora on which statistical models can be trained. Finally, we are not aware of statistical models that support the extraction of the fine-grained information that appears to be required for extracting argument elements.

We should emphasise an important aspect of this tool in relation to its intended use on corpora. The tool is designed to reconstruct or construct arguments that are identified in complex, high-volume, fragmentary, and alinearly presented comments or statements. This is in contrast to many approaches that, by and large, follow the structure of arguments within a particular (large and complex) document, e.g. the BBC's Moral Maze (Bex et al., 2014), manuals (Saint-Dizier, 2012), and legal texts (Moens et al., 2007). In addition, the main focus of our tool is not just the premise-claim relationship, but the rich conceptual patterns that indicate the content of expressions and are essential in instantiating argumentation schemes.

The development of the tool can proceed modularly: adding argumentation schemes, developing more articulated domain models, disambiguating discourse indicators (Webber et al., 2011), and introducing auxiliary linguistic indicators such as other verb classes and other parts of speech that distinguish sentence components. The tool will be applied to more extensive corpora and will have output that is associated with argument graphing tools. More elaborate query patterns could be executed to derive more specific results. In general, the openness and flexibility of the tool provide a platform for future, detailed solutions to a range of argumentation-related issues.
References

[Atkinson and Bench-Capon2007] Katie Atkinson and Trevor J. M. Bench-Capon. 2007. Practical reasoning as presumptive argumentation using action based alternating transition systems. Artificial Intelligence, 171(10-15):855-874.

[Bex et al.2014] Floris Bex, Mark Snaith, John Lawrence, and Chris Reed. 2014. ArguBlogging: An application for the argument web. J. Web Sem., 25:9-15.

[Cunningham et al.2002] Hamish Cunningham, Diana Maynard, Kalina Bontcheva, and Valentin Tablan. 2002. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02), pages 168-175.

[Dung1995] Phan Minh Dung. 1995. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321-358.

[Egly et al.2008] Uwe Egly, Sarah Alice Gaggl, and Stefan Woltran. 2008. Answer-set programming encodings for argumentation frameworks. Argument and Computation, 1(2):147-177.

[Fellbaum1998] Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.

[Forsythe and Buchanan1993] Diana E. Forsythe and Bruce G. Buchanan. 1993. Knowledge acquisition for expert systems: some pitfalls and suggestions. In Readings in Knowledge Acquisition and Learning: Automating the Construction and Improvement of Expert Systems, pages 117-124. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[Horn2001] Laurence Horn. 2001. A Natural History of Negation. CSLI Publications.

[Levin1993] Beth Levin. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press.

[Liddo and Shum2010] Anna De Liddo and Simon Buckingham Shum. 2010. Cohere: A prototype for contested collective intelligence. In ACM Computer Supported Cooperative Work (CSCW 2010) - Workshop: Collective Intelligence in Organizations - Toward a Research Agenda, Savannah, Georgia, USA, February.

[Mitkov2003] Ruslan Mitkov, editor. 2003. The Oxford Handbook of Computational Linguistics. Oxford University Press.

[Moens et al.2007] Marie-Francine Moens, Erik Boiy, Raquel Mochales-Palau, and Chris Reed. 2007. Automatic detection of arguments in legal texts. In Proceedings of the 11th International Conference on Artificial Intelligence and Law (ICAIL '07), pages 225-230, New York, NY, USA. ACM Press.

[Nielsen2011] Finn Årup Nielsen. 2011. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. CoRR, abs/1103.2903.

[Pallotta and Delmonte2011] Vincenzo Pallotta and Rodolfo Delmonte. 2011. Automatic argumentative analysis for interaction mining. Argument and Computation, 2(2-3):77-106.

[Pang and Lee2008] Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135, January.

[Rowe and Reed2008] Glenn Rowe and Chris Reed. 2008. Argument diagramming: The Araucaria Project. In Alexandra Okada, Simon Buckingham Shum, and Tony Sherborne, editors, Knowledge Cartography: Software Tools and Mapping Techniques, pages 163-181. Springer.

[Saint-Dizier2012] Patrick Saint-Dizier. 2012. Processing natural language arguments with the <TextCoop> platform. Argument & Computation, 3(1):49-82.

[Searle and Vanderveken1985] John Searle and Daniel Vanderveken. 1985. Foundations of Illocutionary Logic. Cambridge University Press.

[van Gelder2007] Tim van Gelder. 2007. The rationale for Rationale. Law, Probability and Risk, 6(1-4):23-42.

[Walton1996] Douglas Walton. 1996. Argumentation Schemes for Presumptive Reasoning. Erlbaum, Mahwah, N.J.

[Webber et al.2011] Bonnie Webber, Markus Egg, and Valia Kordoni. 2011. Discourse structure and language technology. Natural Language Engineering, December. Online first.

[Wyner and Peters2011] Adam Wyner and Wim Peters. 2011. On rule extraction from regulations. In Katie Atkinson, editor, Legal Knowledge and Information Systems - JURIX 2011: The Twenty-Fourth Annual Conference, pages 113-122. IOS Press.

[Wyner et al.2012] Adam Wyner, Jodi Schneider, Katie Atkinson, and Trevor Bench-Capon. 2012. Semi-automated argumentative analysis of online product reviews. In Proceedings of the 4th International Conference on Computational Models of Argument (COMMA 2012), pages 43-50. IOS Press.

[Wyner et al.2013] Adam Wyner, Tom van Engers, and Anthony Hunter. 2013. Working on the argument pipeline: Through flow issues between natural language argument, instantiated arguments, and argumentation frameworks. In ??, editor, Proceedings of the Workshop on Computational Models of Natural Argument, volume LNCS, pages ??-??. Springer. To appear.

[Wyner et al.2014] Adam Wyner, Katie Atkinson, and Trevor Bench-Capon. 2014. A functional perspective on argumentation schemes. In Peter McBurney, Simon Parsons, and Iyad Rahwan, editors, Post-Proceedings of the 9th International Workshop on Argumentation in Multi-Agent Systems (ArgMAS 2013), pages ??-??. To appear.