Mining Fine-grained Argument Elements

Adam Wyner
Department of Computing Science, University of Aberdeen
Meston Building, Meston Walk, Aberdeen, AB24 3UK, Scotland
azwyner@abdn.ac.uk

Abstract

The paper discusses the architecture and development of an Argument Workbench, which supports an analyst in reconstructing arguments from across textual sources. The workbench takes a semi-automated, interactive approach, searching in a corpus for fine-grained argument elements, which are concepts and conceptual patterns in expressions that are associated with argumentation schemes. The expressions can then be extracted from a corpus and reconstituted into instantiated argumentation schemes for and against a given conclusion. Such arguments can then be input to an argument evaluation tool.

1 Introduction

We have large corpora of unstructured textual information, such as consumer websites (Amazon), newspapers (the BBC's "Have Your Say"), or policy responses to public consultations. The information is complex, high volume, fragmentary, and presented as a series of comments or statements either linearly (Amazon or the BBC) or alinearly (policy responses). Given the lack of structure of the corpora, the cumulative argumentative meaning of the texts is obscurely distributed across texts. In order to make coherent sense of the information, the content must be extracted, analysed, and restructured into a form suitable for further formal and automated reasoning (e.g. ASPARTIX (Egly et al., 2008), which is grounded in Argumentation Frameworks (Dung, 1995)). There remains a significant knowledge acquisition bottleneck (Forsythe and Buchanan, 1993) between the textual source and the formal representation.

Argumentation text is rich, multi-dimensional, and fine-grained, consisting of (among others): a range of (explicit and implicit) discourse relations between statements in the corpus, including indicators for conclusions and premises; speech acts and propositional attitudes; contrasting sentiment terminology; and domain terminology that is represented in the verbs, nouns, and modifiers of sentences. Moreover, linguistic expression is various, given alternative syntactic or lexical forms for related semantic meaning. It is difficult for people to reconstruct argument from text, and even more so for a computer.

Yet the presentation of argumentation in text is not a random or arbitrary combination of such elements, but is somewhat structured into reasoning patterns, e.g. defeasible argumentation schemes (Walton, 1996). Furthermore, the scope of linguistic variation is neither unlimited nor unconstrained: diathesis alternations (related syntactic forms) appear in systematic patterns (Levin, 1993); a thesaurus is a finite compendium of lexical semantic relationships (Fellbaum, 1998); discourse relations (Webber et al., 2011) and speech acts (Searle and Vanderveken, 1985) (by and large) signal systematic semantic relations between sentences or between sentences and contexts; and the expressivity of contrast and sentiment is scoped (Horn, 2001; Pang and Lee, 2008). A more open-ended aspect of argumentation in text is domain knowledge that appears as terminology. Yet here too, in a given corpus on a selected topic, discussants demonstrate a high degree of topical coherence, signalling that similar or related conceptual domain models are being deployed. Though argumentation text is complex and its coherence is obscured, taken together it is also underlyingly highly organised; after all, people do argue, which is meaningful only where there is some understanding of what is being argued about and how the meaning of the arguments is linguistically conveyed. Without such underlying organisation, we could not successfully reconstruct and evaluate arguments from source materials, which is contrary to what is accomplished in argument analysis.
The paper proposes that the elements and structures of the lexicon, syntax, discourse, argumentation, and domain terminology can be deployed to support the identification and extraction of relevant fine-grained textual passages from across complex, distributed texts. The passages can then be reconstituted into instantiated argumentation schemes. The paper discusses an argument workbench that takes a semi-automated, interactive approach, using a text mining development environment, to flexibly query for concepts (i.e. semantically annotated expressions) and patterns of concepts within sentences, where the concepts and patterns are associated with argumentation schemes. The concepts and patterns are based on the linguistic and domain information. The results of the queries are extracted from a corpus and interactively reconstituted into instantiated argumentation schemes for and against a given conclusion. Such arguments can then be input to an argument evaluation tool. From such an approach, a "grammar" for arguments can be developed and resources (e.g. gold corpora) provided.

The paper presents a sample use case, the elements and structures, the tool components, and the outputs of queries. Broadly, the approach builds on (Wyner et al., 2013; Wyner et al., 2014; Wyner et al., 2012). The approach is contrasted with statistical/machine learning approaches, high-level approaches that specify a grammar, and tasks that annotate single passages of argument.

2 Tool Development and Use

In this section, some of the main elements of the tool and how it is used are briefly outlined.

2.1 Use Case and Materials

The sample use case is based on Amazon consumer reviews about purchasing a camera. Consumer reviews can be construed as presenting arguments concerning a decision about what to buy based on various factors. Consumers argue in such reviews about what features a camera has, the relative advantages, experiences, and sources of misinformation. These are qualitative, linguistically expressed arguments.

2.2 Components of Analysis

The analysis has several subcomponents: a consumer argumentation scheme, discourse indicators, sentiment terminology, and a domain model. The consumer argumentation scheme (CAS) is derived from the value-based practical reasoning argumentation scheme (Atkinson and Bench-Capon, 2007); it represents the arguments for or against buying the consumer item relative to preferences and values. A range of explicit discourse indicators (Webber et al., 2011) are automatically annotated, such as those signalling a premise, e.g. because, a conclusion, e.g. therefore, or contrast and exception, e.g. not, except. Sentiment terminology (Nielsen, 2011) is signalled by lexical semantic contrast: The flash worked poorly is the semantic negation of The flash worked flawlessly, where poorly is a negative sentiment and flawlessly is a positive sentiment. Contrast indicators can similarly be used. Domain terminology specifies the objects and properties that are relevant to the users. To some extent the terminology can be automatically acquired (term frequency) or manually derived and structured into an ontology, e.g. from consumer report magazines or available ontologies. Given the modular nature of the analysis as well as the tool, auxiliary components can be added, such as speech act verbs, propositional attitude verbs, sentence conjunctions to split sentences, etc. Each such component adds a further dimension to the analysis of the corpus.
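To give a concrete sense of these components, the following is a minimal sketch of the kind of term lists (gazetteers) and lookup the analysis draws on. It is not the project's actual GATE gazetteer files, and the labels and entries are illustrative assumptions only.

    # Minimal sketch of gazetteer-style term lists for the analysis components.
    # Labels and entries are illustrative, not the project's actual lists.
    GAZETTEERS = {
        "PremiseIndicator":    ["because", "since", "given that"],
        "ConclusionIndicator": ["therefore", "consequently", "thus"],
        "ContrastIndicator":   ["but", "however", "except", "not"],
        "SentimentPositive":   ["flawlessly", "great", "sharp"],
        "SentimentNegative":   ["poorly", "blurry", "disappointing"],
        "CameraProperty":      ["flash", "exposure", "focus", "image quality"],
    }

    def annotate(sentence):
        """Return (label, term) pairs for every gazetteer term the sentence
        contains (naive substring matching; GATE's lookup is token-based)."""
        text = sentence.lower()
        return [(label, term)
                for label, terms in GAZETTEERS.items()
                for term in terms if term in text]

    print(annotate("I recommend it because the flash worked flawlessly."))
    # [('PremiseIndicator', 'because'), ('SentimentPositive', 'flawlessly'),
    #  ('CameraProperty', 'flash')]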
2.3 Components of the Tool

To recognise the textual elements of Section 2.2, we use the GATE framework (Cunningham et al., 2002) for language engineering applications. It is an open source desktop application written in Java that provides a user interface for professional linguists and text engineers to bring together a wide variety of natural language processing tools in a pipeline and apply them to a set of documents. Our approach to GATE tool development follows (Wyner and Peters, 2011). Once a GATE pipeline has been applied to a corpus, we can view the annotations of a text either in situ or extracted using GATE's ANNIC (ANNotations In Context) corpus indexing and querying tool.

In GATE, gazetteers associate textual passages in the corpus that match terms on the lists with an annotation. The annotations introduced by gazetteers are used by JAPE rules, creating annotations that are visible as highlighted text, can be reused to construct higher-level annotations, and are easily searchable in ANNIC. Querying for an annotation or a pattern of annotations, we retrieve all the terms bearing that annotation.

2.4 Output and Queries

The ANNIC tool indexes the annotated text and supports semantic querying. Searching the corpus for single or complex patterns of annotations returns all those strings that are annotated with the pattern, along with their context and source document. Complex queries can also be formed. A query and a sample result appear in Figure 1, where the query finds all sequences in which the first string is annotated with PremiseIndicator, followed by some tokens, then a string annotated with positive sentiment, some tokens, and finally a string annotated as CameraProperty. The search returned a range of candidate structures that can be further scrutinised; the query can be iteratively refined to zero in on other relevant passages. The example can be taken as part of a positive justification for buying the camera. The query language (the language of the annotations) facilitates complex searches for any of the annotations in the corpus, enabling exploration of the statements in the corpus.

[Figure 1: Query and Sample Result]
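To make the shape of such a query concrete, the following sketch approximates the Figure 1 pattern (a premise indicator, some tokens, a positive sentiment term, some tokens, a camera property) with a plain regular expression over words. The actual ANNIC query operates over annotations rather than raw text; the indicator, sentiment, and property terms below are illustrative assumptions.

    # Sketch of the Figure 1-style pattern: PremiseIndicator ... positive
    # sentiment ... CameraProperty. A regular expression over words stands in
    # for ANNIC's annotation-based query language.
    import re

    PATTERN = re.compile(
        r"\b(because|since)\b"                   # premise indicator
        r".*?\b(flawlessly|great|sharp)\b"       # positive sentiment term
        r".*?\b(flash|exposure|focus|lens)\b",   # camera property
        re.IGNORECASE)

    reviews = [
        "I bought it because the pictures are great and the flash never fails.",
        "The battery died after a week.",
    ]
    for sentence in reviews:
        match = PATTERN.search(sentence)
        if match:
            print(match.groups())   # ('because', 'great', 'flash')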
2.5 Analysis of Arguments and their Evaluation

The objective of the tool is to find specific patterns of terminology in the text that can be used to instantiate the CAS argumentation scheme both for and against the purchase of a particular model of camera. We iteratively search the corpus for properties, instantiate the argumentation scheme, and identify attacks. Once we have instantiated arguments in attack relations, we may evaluate the argumentation framework. Our focus in this paper is the identification of arguments and attacks from the source material rather than evaluation. It is important to emphasise that we provide an analyst's support tool, so some degree of judgement is required.

From the results of queries on the corpus, we have identified the following premises bearing on image quality, where we paraphrase the source and infer the values from context. Agents are also left implicit, assuming that a single agent does not make contradictory statements. The premises instantiate the CAS in a positive form, where A1 is an argument for buying the camera; similarly, we can identify statements and instantiated argumentation schemes against buying the camera.

A1.
  P1: The pictures are perfectly exposed.
  P2: The pictures are well-focused.
  V1: These properties promote image quality.
  C1: Therefore, you (the reader) should buy the Canon SX220.

Searching in the corpus, we can find statements contrary to the premises in A1, constituting an attack on A1. To defeat these attacks and maintain A1, we would have to search further in the corpus for contraries to the attacks. Searching for such statements and counterstatements is facilitated by the query tool.
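Although evaluation is not the focus of this paper, the following sketch shows the downstream step the extracted arguments feed into: instantiated arguments and their attacks form a Dung-style argumentation framework that can be evaluated, here naively under grounded semantics. This is a stand-in for a dedicated solver such as ASPARTIX; the argument names are illustrative.

    # Minimal sketch of argumentation framework evaluation (grounded semantics).
    # A1: buy the camera; A2: a contrary statement attacking A1;
    # A3: a counterstatement attacking A2, which reinstates A1.
    def grounded_extension(arguments, attacks):
        """Iteratively accept arguments all of whose attackers are defeated."""
        accepted, defeated = set(), set()
        changed = True
        while changed:
            changed = False
            for arg in arguments:
                attackers = {a for (a, b) in attacks if b == arg}
                if arg not in accepted and attackers <= defeated:
                    accepted.add(arg)
                    changed = True
                if arg not in defeated and attackers & accepted:
                    defeated.add(arg)
                    changed = True
        return accepted

    args = {"A1", "A2", "A3"}
    attacks = {("A2", "A1"), ("A3", "A2")}
    print(sorted(grounded_extension(args, attacks)))   # ['A1', 'A3']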
3 Discussion

The paper presents an outline of an implemented, semi-automatic, interactive, rule-based text analytic tool to support analysts in identifying fine-grained, relevant textual passages that can be reconstructed into argumentation schemes and attacks. As such, it is not evaluated with respect to recall and precision (Mitkov, 2003) against a gold standard, but in terms of user facilitation (i.e. analysts' qualitative evaluation of using the tool or not), work that remains to be done. The tool is an advance over graphically-based argument extraction tools that rely on the analysts' unstructured, implicit, non-operationalised knowledge of discourse indicators and content (van Gelder, 2007; Rowe and Reed, 2008; Liddo and Shum, 2010; Bex et al., 2014). There are logic programming approaches that automatically annotate argumentative texts: (Pallotta and Delmonte, 2011) classify statements according to rhetorical roles using full sentence parsing and semantic translation; (Saint-Dizier, 2012) provides a rule-oriented approach to process specific, highly structured argumentative texts; (Moens et al., 2007) manually annotate legal texts and then construct a grammar that is tailored to automatically annotate the passages. Such rule-oriented approaches do not use argumentation schemes or domain models; they do not straightforwardly provide for complex annotation querying; and they are stand-alone tools that are not integrated with other NLP tools.

The interactive, incremental, semi-automatic approach taken here is in contrast to statistical/machine learning approaches. Such approaches rely on the prior creation of gold standard corpora that are annotated manually and adjudicated (considering interannotator agreement). The gold standard corpora are then used to induce a model that (if successful) annotates corpora comparably well to the human annotation. For example, where sentences in a corpus are annotated as premise or conclusion, the model ought also to annotate the sentences similarly; in effect, what a person uses to classify a sentence as premise or conclusion can be acquired by the computer. Statistical approaches yield a probability that some element is classified one way or the other; the justification for the classification, such as is found in a rule-based system, cannot be given. Moreover, refinement of results in statistical approaches relies on enlarging the training data. Importantly, the rule-based approach outlined here could be used to support the creation of gold standard corpora on which statistical models can be trained. Finally, we are not aware of statistical models that support the extraction of the fine-grained information that appears to be required for extracting argument elements.

We should emphasise an important aspect of this tool in relation to its intended use on corpora. The tool is designed to reconstruct or construct arguments that are identified in complex, high-volume, fragmentary, and alinearly presented comments or statements. This is in contrast to many approaches that, by and large, follow the structure of arguments within a particular (large and complex) document, e.g. the BBC's Moral Maze (Bex et al., 2014), manuals (Saint-Dizier, 2012), and legal texts (Moens et al., 2007). In addition, the main focus of our tool is not just the premise-claim relationship, but the rich conceptual patterns that indicate the content of expressions and are essential in instantiating argumentation schemes.

The development of the tool can proceed modularly: adding argumentation schemes, developing more articulated domain models, disambiguating discourse indicators (Webber et al., 2011), and introducing auxiliary linguistic indicators such as other verb classes and other parts of speech that distinguish sentence components. The tool will be applied to more extensive corpora and will have output that is associated with argument graphing tools. More elaborate query patterns could be executed to derive more specific results. In general, the openness and flexibility of the tool provide a platform for future, detailed solutions to a range of argumentation-related issues.
References

[Atkinson and Bench-Capon2007] Katie Atkinson and Trevor J. M. Bench-Capon. 2007. Practical reasoning as presumptive argumentation using action based alternating transition systems. Artificial Intelligence, 171(10-15):855-874.

[Bex et al.2014] Floris Bex, Mark Snaith, John Lawrence, and Chris Reed. 2014. ArguBlogging: An application for the argument web. J. Web Sem., 25:9-15.

[Cunningham et al.2002] Hamish Cunningham, Diana Maynard, Kalina Bontcheva, and Valentin Tablan. 2002. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02), pages 168-175.

[Dung1995] Phan Minh Dung. 1995. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321-358.

[Egly et al.2008] Uwe Egly, Sarah Alice Gaggl, and Stefan Woltran. 2008. Answer-set programming encodings for argumentation frameworks. Argument and Computation, 1(2):147-177.

[Fellbaum1998] Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.

[Forsythe and Buchanan1993] Diana E. Forsythe and Bruce G. Buchanan. 1993. Knowledge acquisition for expert systems: some pitfalls and suggestions. In Readings in Knowledge Acquisition and Learning: Automating the Construction and Improvement of Expert Systems, pages 117-124. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[Horn2001] Laurence Horn. 2001. A Natural History of Negation. CSLI Publications.

[Levin1993] Beth Levin. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press.

[Liddo and Shum2010] Anna De Liddo and Simon Buckingham Shum. 2010. Cohere: A prototype for contested collective intelligence. In ACM Computer Supported Cooperative Work (CSCW 2010) - Workshop: Collective Intelligence in Organizations - Toward a Research Agenda, Savannah, Georgia, USA, February.

[Mitkov2003] Ruslan Mitkov, editor. 2003. The Oxford Handbook of Computational Linguistics. Oxford University Press.

[Moens et al.2007] Marie-Francine Moens, Erik Boiy, Raquel Mochales-Palau, and Chris Reed. 2007. Automatic detection of arguments in legal texts. In Proceedings of the 11th International Conference on Artificial Intelligence and Law (ICAIL '07), pages 225-230, New York, NY, USA. ACM Press.

[Nielsen2011] Finn Årup Nielsen. 2011. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. CoRR, abs/1103.2903.

[Pallotta and Delmonte2011] Vincenzo Pallotta and Rodolfo Delmonte. 2011. Automatic argumentative analysis for interaction mining. Argument and Computation, 2(2-3):77-106.

[Pang and Lee2008] Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135, January.

[Rowe and Reed2008] Glenn Rowe and Chris Reed. 2008. Argument diagramming: The Araucaria Project. In Alexandra Okada, Simon Buckingham Shum, and Tony Sherborne, editors, Knowledge Cartography: Software Tools and Mapping Techniques, pages 163-181. Springer.

[Saint-Dizier2012] Patrick Saint-Dizier. 2012. Processing natural language arguments with the <TextCoop> platform. Argument & Computation, 3(1):49-82.

[Searle and Vanderveken1985] John Searle and Daniel Vanderveken. 1985. Foundations of Illocutionary Logic. Cambridge University Press.

[van Gelder2007] Tim van Gelder. 2007. The rationale for Rationale. Law, Probability and Risk, 6(1-4):23-42.

[Walton1996] Douglas Walton. 1996. Argumentation Schemes for Presumptive Reasoning. Erlbaum, Mahwah, N.J.

[Webber et al.2011] Bonnie Webber, Markus Egg, and Valia Kordoni. 2011. Discourse structure and language technology. Natural Language Engineering, December. Online first.

[Wyner and Peters2011] Adam Wyner and Wim Peters. 2011. On rule extraction from regulations. In Katie Atkinson, editor, Legal Knowledge and Information Systems - JURIX 2011: The Twenty-Fourth Annual Conference, pages 113-122. IOS Press.

[Wyner et al.2012] Adam Wyner, Jodi Schneider, Katie Atkinson, and Trevor Bench-Capon. 2012. Semi-automated argumentative analysis of online product reviews. In Proceedings of the 4th International Conference on Computational Models of Argument (COMMA 2012), pages 43-50. IOS Press.

[Wyner et al.2013] Adam Wyner, Tom van Engers, and Anthony Hunter. 2013. Working on the argument pipeline: Through flow issues between natural language argument, instantiated arguments, and argumentation frameworks. In ??, editor, Proceedings of the Workshop on Computational Models of Natural Argument, volume LNCS, pages ??-??. Springer. To appear.

[Wyner et al.2014] Adam Wyner, Katie Atkinson, and Trevor Bench-Capon. 2014. A functional perspective on argumentation schemes. In Peter McBurney, Simon Parsons, and Iyad Rahwan, editors, Post-Proceedings of the 9th International Workshop on Argumentation in Multi-Agent Systems (ArgMAS 2013), pages ??-??. To appear.