=Paper= {{Paper |id=Vol-1876/paper06 |storemode=property |title=Rhetorical Figure Annotation with XML |pdfUrl=https://ceur-ws.org/Vol-1876/paper06.pdf |volume=Vol-1876 |authors=Sebastian Ruan,Chrysanne Di Marco,Randy Allen Harris |dblpUrl=https://dblp.org/rec/conf/ijcai/RuanMH16 }} ==Rhetorical Figure Annotation with XML== https://ceur-ws.org/Vol-1876/paper06.pdf
    Proceedings of CMNA 2016 - Floris Bex, Floriana Grasso, Nancy Green (eds)




                                 Rhetorical Figure Annotation with XML
                   Sebastian Ruan,* Chrysanne Di Marco,* and Randy Allen Harris**
                                 *Cheriton School of Computer Science
                             **Department of English Language and Literature
                                University of Waterloo, Waterloo Ontario
                     saruan@uwaterloo.ca, cdimarco@uwaterloo.ca, raha@uwaterloo.ca


                         Abstract                                   memorable impression. Why? Two reasons. Firstly, the
                                                                    formal structure and the functional structure are virtually
    There is a driving need to interrogate large bodies
                                                                    isomorphic: Kennedy (and speechwriter Ted Sorensen) ex-
    of text for pragmatic meaning, e.g., to detect sen-
                                                                    pressed the rejection of one civic attitude and its replace-
    timent, diagnose genre, plot chains of reasoning,
                                                                    ment by the opposite one, in the iconicity of reversing the
    and so forth. But this type of meaning is often im-
                                                                    terms of reference. Secondly, that very snug form/function
    plicit, 'hidden' meaning, evoked by linguistic cues,
                                                                    coupling inhabits a material structure that is, on its own,
    stylistic arrangement, or argumentation structure—
                                                                    cognitively very sticky. The Kennedy-Sorensen phrase has
    features that have hitherto been difficult for Natural
                                                                    become so widely known, that is, so easily shared, so fre-
    Language Processing (NLP) systems to recognize
                                                                    quently invoked and quoted and recited because of (1) the
    and use. Pragmatic concerns were historically the
                                                                    schematic congruence with which the form matches the Re-
    province of rhetorical studies, and we have turned
                                                                    jection-Replacement function its arrangement serves, and
    to rhetoric in order to find new solutions to compu-
                                                                    (2) the cognitive affinities humans have for its structural
    tational pragmatics. This paper highlights a form of
                                                                    properties (opposition, repetition, and symmetry).
    rhetorical device that encodes deep levels of prag-
    matic meaning and yet lends itself to automated de-                  The cognitive affinities explain its mnemonic and aes-
    tection. These devices are the linguistic configura-            thetic effects, but the interest of Computational Argumenta-
    tions known as rhetorical figures, which have been              tion scholars focuses on its tight form-function
    poorly understood and vastly underutilized in                   correlation, in an approach known as figural logic. The form
    Computational Linguistics and Computational Ar-                 makes it tractable for automated detection, while the func-
    gumentation. We present an annotation scheme us-                tion gives us its rhetorical purpose. In terms of argument
    ing XML for rhetorical figures to make figuration               mining, an application that accessed this correlation could
    more tractable for NLP, enhancing applications for              epitomize Kennedy's inaugural address (which argued for
    argument mining, along with a range of other tasks.             the rejection of an ethos of entitlement and its replacement
    We also discuss the intellectual and technical chal-            by an ethos of duty) virtually on the basis of this expression
    lenges involved in figure annotation and the impli-             alone.
    cations for Machine Learning.                                         We are developing an approach to computational prag-
                                                                    matics that combines the insights for argumentation that
1   Introduction                                                    rhetorical figures provide, together with argument mining,
Rhetorical figures are cognitively governed linguistic devic-       corpus linguistics, and machine learning, with payoffs for
es that serve functional, mnemonic, and aesthetic purposes.         both computer science and rhetoric. There has to this
Take the famous maxim from Kennedy's inaugural address:             point been success at detecting some rhetorical figures, but
                                                                    little sense of what to do with them once they have been
      1. Ask not what your country can do for you. Ask what         detected.
        you can do for your country. [Kennedy (and
        Sorensen) 1961]                                                 There has been a growing interest in the convergence of
                                                                    rhetoric, argumentation, and NLP, sparked by such works as
This expression quickly became proverbial in the American           Teufel, Carletta and Moens [1999], Crosswhite [2000],
consciousness for the way it captures the spirit of a particu-      Grasso [2002a, 2002b], Reed and Norman [2003], Green
lar historical moment, the ethos of a particular administra-        [2010, 2015], and Teufel [2010], largely under the presiding
tion, and the aspirations of a particular generation. Count-
less more prosaic formulations, by Kennedy and others,
expressed that confluence too, but they left a distinctly less








genius of Toulmin [2003/1958].1 But aside from passing                   very few important modern exceptions like Perelman and
mentions here and there, rhetorical figures have been almost             Olbrechts-Tyteca [1969], it was largely forgotten as figures
wholly neglected. Our work addresses this surprising omis-               came to be associated with style; style, with aesthetics and
sion.                                                                    superficiality.2
     Our approach is a more sophisticated use of rhetorical                   But figures are not without their challenges for Natural
figures than has been attempted, operating at layers of for-             Language Processing. Metaphor remains elusive, for in-
mal and functional abstraction. It depends fundamentally on              stance, despite all the attention it has attracted in cognitive
an annotation format for rhetorical figures.                             science, AI, and linguistics, including Computational Lin-
     In this paper we argue for the importance of rhetorical             guistics, in the last two decades. Metaphor is a type of figure
figures for NLP generally and argument mining specifically;              known as a trope, which depends on semantic deviation. We
we identify the challenges and opportunities of integrating a            are not yet successful enough with straight-laced semantics
knowledge of figures into NLP; and, most specifically, we                to support forays into semantic distortions. Some tropes
offer an XML annotation scheme for rhetorical figures that               (such as oxymoron, which is a juxtaposition of antonymic
meets some of these challenges and therefore opens up new                terms, such as square circle or deafening silence) can be
opportunities for NLP.                                                   reliably detected [Gawryjolek 2009]. We believe antithesis
                                                                         (juxtaposed opposite predications, as in Sentence 2, a dou-
                                                                         ble antithesis) has a similar potential for reliable detection.
2       Opportunities and Challenges                                     (We adopt the convention of identifying the defining figura-
Computationally, figures are important for four central rea-             tive elements parenthetically.)
sons. First, they are endemic to human language. This is
                                                                              2. The young would choose an exciting life; the old a
very well established for a few tropes, such as metaphor,
                                                                                happy death. (young, old; life, death) [Alexis
which is the central focus of Cognitive Linguistics and
                                                                                2015:155]
deeply entrenched in ontologies like FrameNet and Word-
Net. But it is equally true of literally (a word we don't use            But most semantic distortions—tropes—are far from tracta-
lightly) hundreds of other figures. If we want language-                 ble computationally. Nor do many of them provide the tight
perceptive algorithms, they must have knowledge of figure                form/function coupling that has such a promising payoff for
structure. Secondly, figures epitomize argument structure,               Computational Argumentation.
increasingly a prime concern for NLP. Again, this is well                     Another type of figure, schemes, are formal deviations,
understood for metaphor (and simile, though it gets much                 shifts of expected structure, as in Sentence 1, an antime-
less overt attention), which epitomize analogic argumenta-               tabole (reverse lexical repetition; in this case you and your
tion. Thirdly, many figures (especially the ones called                  country). The computational detection of figures, including
schemes) work in terms of formal patterns that algorithms                antimetabole, is finding success [Gawryjolek 2009; Gawr-
can detect through surface analysis; our Sentence 1 illus-               yjolek, Harris, and DiMarco 2009; Hromada 2011; O'Reilly
trates this aspect clearly. Fourthly, they correlate with rhe-           2010; O'Reilly and Paurobally 2010; Dubremetz and Nivre
torical functions (pragmatic and argumentative meaning).                 2015].
We will illustrate this shortly. For now, the rejection-                      The work of these researchers is sometimes only loose-
replacement function of Sentence 1 will have to stand.                   ly connected to the rhetorical traditions. Many of them, too,
     The contemporary scholar most responsible for the po-               were concerned only with detection, an essential first step but one
sition that rhetorical figures are constructions with especial-          that doesn't get us very close to argument mining. They did
ly tight couplings of form and function is Jeanne                        not attempt to find meaning in the figures they detected.
Fahnestock, whose figural logic is brilliantly articulated in            Gawryjolek [2009], Hromada [2011], Dubremetz and Nivre
Rhetorical Figures in Scientific Argumentation [1999; see
also Tindale 2000:69-85; Harris 2013]. Fahnestock charts
rhetorical figures not only for their pragmatic contributions
to everyday language but for the way they epitomize lines of                 2
                                                                               As Rubinelli [2006] points out, topoi are various. Aristotle
argument. As she cogently shows, this position goes back at              distinguished principally between common topoi, such as argument
least to Aristotle, who links specific figures directly to spe-          from opposites, argument from correlatives, and argument from
cific lines of argument (that is, topoi). But, aside from a              definition, which can be applied to arguments in any domain, and
                                                                         particular topoi, which can be applied in particular argument fields.
                                                                         In this paper we are concerned with common topoi, which align
    1
      We do not put Mann and Thompson's [1988] Rhetorical                with rhetorical figures, but see Gladkova, DiMarco, and Harris
Structure Theory (RST) in this category because, while it has made       [2011, 2016] for our approach to particular epistemic topoi in oph-
some valuable insights into text linguistics, it is simply incorrectly   thalmic clinical research. It differs both from Rubinelli's approach
named, by scholars who appear to know little or nothing about            and, more generally, from the types of schemes being used in
rhetoric. RST has really to do with text coherence rather than with      Computation Argumentation analysis by associating "constella-
rhetoric as traditionally understood, as the study of suasive lan-       tions" of features, i.e., features that are linguistically, syntagmati-
guage.                                                                   cally, and semantically related, with specific schemes (here, topoi).








[2015], for instance, appear to have been unfamiliar with the        has priority. Order doesn't matter to addition (multiplication,
rhetorical functions antimetabole serves.                            union, etc.).
    Antimetabole has a small set of rhetorical functions,                 We have built a curated list of over 400 antimetaboles
keyed to the iconicity of its formal structure (which evokes         illustrating these functions, but only have space for a few
balance and opposition, as well as sequence or priority). We         more representative examples:
have very limited space in this paper to demonstrate these               Reciprocal Force
rhetorical functions, so a few examples will have to suffice.
                                                                         7. A corollary of PHC [the Principle of Hierarchical
     One function of antimetabole is to convey Reciprocal                  Coincidence] is that resources flow toward political
Force, illustrated by Sentence 3, Newton's third law of mo-                power, and political power flows toward resources;
tion.                                                                      or, the power of state and of capital typically appear
                                                                           in conjunction and are mutually reinforcing. (re-
      3. If you press a stone with your finger, the finger is al-          sources / political power) [Sartwell 2014]
        so pressed by the stone. (stone / finger) [Newton                8. Women are changing the universities and the uni-
        1803.1 [1687]:15]                                                  versities are changing women. (women / universities)
Newton's third law is often expressed as "for every action,                [Greer 1988: 629]
there is an equal and opposite reaction," but Newton's own               Reciprocal Specification
argument favored the antimetabole, whose very structure                   9. The negation of a conjunction is the disjunction of
suggests "equal and opposite." (We give the example in Eng-                    the negations and the negation of a disjunction is
lish, but Newton's original Latin is also antimetabolic.)                      the conjunction of the negations. (negation of a
     A very similar rhetorical function of antimetabole is to                  conjunction / disjunction of the negations) [De
convey Reciprocal Specification, a kind of mutual defini-                      Morgan's law; traditional]
tion, illustrated by Sentence 4:                                         10. Anger and depression, the pop-psych books tell us,
     4. Gay rights are human rights, and human rights are                     are two sides of the same coin: depression is anger
        gay rights. (human rights / gay rights) [Clinton 2013:                suppressed, anger is depression liberated. (depres-
        0:08-0:12]                                                            sion / anger) [Hertzberg 2008]
In this phrase the notions of human rights and gay rights are            Comprehensiveness
reciprocally identified with each other. You can't have one               11. I meant what I said and I said what I meant.
unless you have the other.                                                     (meant / said) [Seuss 1940]
    Another rhetorical function of the antimetabole is to                 12. Whether we bring our enemies to justice or bring
convey Comprehensiveness, illustrated by the ordinary-                         justice to our enemies, justice will be done. (our
language example, Sentence 5:                                                  enemies / justice) [Bush [and Frum] 2001]
                                                                         Irrelevance of Order
      5. A place for everything, and everything in its place.
        (place / everything) [Traditional]                                  13. With a similar qualification, in the Cambridge
The reverse repetition in Sentence 5 shifts from reciprocal                      Grammar of the English Language, a head 'plays
force to a reciprocal coverage, largely because it has prepo-                    the primary role' in 'determining the distribution of
sitional predication rather than the transitive predication of                   the phrase' (introductory chapter signed by Pullum
Newton's Sentence 3. We call this function comprehensive-                        and Huddleston, in Huddleston and Pullum
ness because the sequential iconicity means a back-and-                          2002:24) (Pullum / Huddleston) [Matthews
forth, alpha-to-omega, omega-to-alpha coverage of some                           2007:24]
domain, in this case the domain of tidiness. All things                     14. "Spanglish," [is] the combination of Spanish and
have assigned places; all places have their assigned things.                     English (or English and Spanish) (Spanish / Eng-
                                                                                 lish) [Unknown, "Western Spanglish Language"]
    A fourth rhetorical function of the antimetabole is to
                                                                     It is these functions, coupled with the relative ease of rhetor-
convey Irrelevance-Of-Order, well known from algebra and
                                                                     ical-scheme detection, that make rhetorical figures so prom-
predicate calculus:
                                                                     ising for computational tasks in which comprehension is
      6. m + n = n + m (m / n) [Traditional; commutative             central, like argument mining and text summarization.
          principle]
                                                                         Again, however, there are challenges. They are not as
There are other ways to express the principle of commuta-            thorny as the challenges of most tropes because they con-
tion, but none as natural and iconic as formulae like 6. Op-         cern surface analysis, not semantic plumbing. But they exist.
posite sequences of the same variables, on either side of the        In particular, figures rarely come in isolation. The Kennedy-
same operator, pivoted by a predication of identity, equiva-         Sorensen maxim, for instance (Sentence 1), is an antime-
lence, or equality inescapably mean that neither sequence            tabole (you / your country). But it is also an antithesis (ask
                                                                     not X / ask X). It is, thirdly, a mesodiplosis (clause-medial








repetition; here, can do occurs in the middle of both claus-        early stages, but we believe it holds considerable promise,
es).                                                                and we believe machine-learning corpus studies can be ex-
     We call this phenomenon, when figures co-occur and             tremely helpful, especially for the challenges and opportuni-
mutually reinforce each other, stacking. It presents both a         ties of stacking.
challenge and an opportunity. It is a challenge because ra-               Figural stacking, as we come to understand the func-
ther than detecting a single figure or multiple independent         tional combinatorics better, is perhaps the greatest promise
figures, we need to detect overlapping figures. It is an op-        of rhetorical figures for computational understanding of
portunity because the functions are enhanced and stabilized         natural language. Our paradigm example, which stacks the
under stacking. When two or more figures coincide in the            schemes antimetabole, mesodiplosis (both entailing ploche),
same utterance, the functions they convey are highly con-           and the trope antithesis, provides a pitch-perfect example of
sistent. Formal stacking breeds a functional conspiracy.            the rhetorical function, Reject-Replace. A computational
     For instance, when antimetabole stacks with antithesis         analysis of Kennedy's inaugural address tuned to the work-
(conjoined or highly proximal opposite predications), the           ings of rhetorical figures could tell us what the address was
joint function is primarily to reject the negated predication       about—namely, the rejection of an ethos of entitlement and
utterly and replace it with the positive predication. Again,        its replacement with an ethos of responsibility—virtually on
Sentence 1 is our paradigm, but here are two more:                  the basis of this particular stacking (along with, of course,
                                                                    the lexical semantics of you, your country, and so on).
Reject-Replace
                                                                         We can, and should, rely on rhetoricians to tell us what
     15. We don't build services to make money; we make             the functions of certain figures and certain figure-stacks are,
          money to build better services. (services / money)        at least in these early stages. But the rhetorical tradition is
          [Mark Zuckerberg, qtd in Magid 2012]                      haphazard, and sometimes conflicting. The terminology
     16. Plain statement must be defined in terms of meta-          alone is forbidding. As much as computational argument
          phor, not metaphor in terms of plain statement.           studies can benefit from a better understanding of rhetorical
          (plain statement / metaphor) [Buck 1899: 69]              figures, rhetorical figures can benefit from computational
The stacking of antithesis with the Reciprocal Specification        studies of form and meaning. (And, yes, that sentence was
function of antimetabole, however, generates a very specific        an antimetabole, stacked with mesodiplosis; the rhetorical
Subcategorization function, as in Sentences 17 and 18, which        function is Reciprocal Force, modulated by the possibility
say, respectively, that ultrabooks are a class of laptop, and       modality of can.)
compounds are a class of molecules:
                                                                         The path forward is to bootstrap rhetoricians'
Subcategorization                                                   knowledge by way of annotation, marked-up text corpora,
      17. Ultrabooks are laptops after all, but not all laptops     and machine learning, so that computationally mined data
           are ultrabooks. (ultrabooks / laptops) [Unknown          can start to tell them what functions figures have, through
           2013, "Ultrabooks vs Laptops"]                           confirmation, through refinement, and through new discov-
      18. All compounds are molecules (since compounds              eries, all of which we have good reason to anticipate.
           consist of two or more atoms), but not all mole-              We can discover the proportionality of certain stackings
           cules are compounds (since some molecules con-           (anecdotally, both antithesis and mesodiplosis strongly co-
           tain only atoms of the same element). (compounds         occur with antimetabole) and the correlation of the stackings
           / molecules) [Volpe 1975:7]                              with the rhetorical functions (as specified above, on the ba-
Some instances of stacking are so common and so predicta-           sis of limited and anecdotal research). At its best, this work
ble as to be entailments. Ploche, for instance, is simple lexi-     can revolutionize Computational Argumentation studies and
cal repetition, so it always stacks with antimetabole (reverse      rhetoric in the way corpus linguistics revolutionized lexi-
lexical repetition). If you find the latter, you always find the    cography and established ontologies like WordNet and
former. Rhetorically, ploche conveys the pragmatic func-            FrameNet. But even at its least productive, we are very con-
tion, Identity-Of-Reference, which is always embedded in            fident of finding important form/function correlations that
the functions of antimetabole (if you have reciprocal force         can importantly inform Computational Argumentation and
or reciprocal specification, for instance, you have identical       discourse studies, in novel ways.
entities in a reciprocal relationship). Further, mesodiplosis
(clause-medial lexical repetition) also entails ploche,             3    Figure Detection
conveying an identical force when the mesodiplosis is a
transitive verb (e.g., Sentences 3, 7, and 8), identical speci-     There have been limited successes in figure detection over
fication when it is a copula verb (e.g., Sentences 4, 9, and        the past several years due to strict figure mappings and
10).                                                                some unreported data [Gawryjolek 2009; Gawryjolek, Har-
                                                                    ris, and DiMarco 2009; Hromada 2011; Strommer 2011;
     We do not pretend to have a full and complete mapping          Alliheedi 2012; Alliheedi and DiMarco 2012; Dubremetz
of form to function, however. This work is still in the very








and Nivre 2015]. But this work has been restricted both in method            tingly, because of the way he defined antimetabole. He was
and in scope and has been unconcerned with function.                         unaware he was doing so and does not report his results.
     Hromada's [2011] work, for instance, was very success-                  Dubremetz and Nivre [2015] found some antitheses, be-
ful at the detection of antimetabole, but he defined antime-                 cause they were using negation as a correlative of antime-
tabole in an overdetermined way. Using the Waterloo Figure                   tabole (which markedly improved their success), but they
Representation Notation [Harris and DiMarco 2009]3                           were not looking for them and did not report their results.
(where W stands for Word, the subscripts indicate identity,                  Only Gawryjolek [2009] looked for stacked figures, but that
and "…" represents other linguistic matter, extraneous to the                was not his focus. He did not interpret the stacking at all,
figure, possibly null), Hromada defines antimetabole as                      nor report on the statistics. He was merely looking for mul-
[W]A [W]B [W]C [W]C [W]B [W]A, whereas a more accurate defi-                 tiple figures in the same corpus, many of which overlapped.
nition (as in Harris and DiMarco [2009]) is simply [W]a …                         And, of course, detecting rhetorical figures is the be-
[W]b … [W]b … [W]a. That is, Hromada searched only for                       ginning of the story. We know, from millennia of human-
antimetaboles when they stacked with mesodiplosis (clause-                   istic research, that linguistic forms correlate with rhetorical
medial repetition), when there was no additional linguistic                  functions—that figures do communicative work beyond
matter.                                                                      'mere aesthetics'—and we can thank Fahnestock for collat-
     Most of these researchers did not look for stacked fig-                 ing and expanding this research so clearly in the contempo-
ures, except accidentally. Hromada [2011] looked for other                   rary era. On the basis of this research, we can use the de-
figures (anadiplosis, epanaphora, and epiphora), but only in                 tected figures to help chart meanings—sometimes very fun-
isolation.4 Conversely, he 'searched' for mesodiplosis unwit-                damental meanings, like the Reject-Replace antithetical
                                                                             antimetabole of Sentence 1, which diagnoses the exact tenor
                                                                             of Kennedy's inaugural address.
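
The scheme pattern discussed in this section, [W]a … [W]b … [W]b … [W]a, can be illustrated with a short sketch. This is not the detector used by any of the cited researchers; it is a minimal Python illustration under stated assumptions: figures are matched on single lowercase word tokens, function words are filtered with a small hand-made stopword list (STOPWORDS is hypothetical), and the names tokenize and find_antimetaboles are introduced here for illustration only.

```python
import re
from itertools import permutations

# Illustrative function-word list (an assumption, not taken from the paper).
STOPWORDS = frozenset({
    "a", "an", "and", "are", "but", "for", "in", "is",
    "not", "of", "or", "the", "to", "what",
})


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_antimetaboles(text, stopwords=STOPWORDS):
    """Return sorted (a, b) pairs whose occurrences follow a ... b ... b ... a."""
    positions = {}
    for index, token in enumerate(tokenize(text)):
        if token not in stopwords:
            positions.setdefault(token, []).append(index)

    # Only words that occur at least twice can fill either slot.
    repeated = {w: idx for w, idx in positions.items() if len(idx) >= 2}

    hits = []
    for a, b in permutations(repeated, 2):
        first_a, last_a = repeated[a][0], repeated[a][-1]
        # Two occurrences of b strictly between the outermost a's
        # realise the order a ... b ... b ... a.
        inner_b = [i for i in repeated[b] if first_a < i < last_a]
        if len(inner_b) >= 2:
            hits.append((a, b))
    return sorted(hits)
```

Applied to Sentence 8, the sketch returns the intended pair (women, universities) but also (women, changing) and (changing, universities). The bare surface pattern overgenerates, and it says nothing about stacking or rhetorical function, which is the gap this section describes.
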
3 Hromada [2011] calls this notation Rhetoric Figure Representation Formalism or RFRF, which he adapts from Harris and DiMarco [2009]. Harris and DiMarco did not label their formalism in their paper, but we use their term for it here. The WFRN is a formalism for the general structure of rhetorical schemes, but it does not represent functions at all. For this we need a richer system, which may be provided by Construction Grammar (e.g., Hoffmann and Trousdale 2013). For an argument to this effect, see [Turner 1997:55-60]. Certainly, there are idiomatic deployments of these patterns that fit the Construction Grammar mandate fairly well. For instance, the well-known antimetabolic Easier-to-take-the-A-out-of-B-than-the-B-out-of-A catchphrase is the sort of expression that preoccupies Construction Grammarians:
     i. [I]t was easier to take the girl out of the brothel than to take the brothel out of the girl. [Walker 2011: 72]
    ii. It was much easier to take Kuhn out of Harvard than Harvard out of Kuhn. [Fuller 2001: 387]
   iii. It was found easier to take the evacuee out of the slum than to take the slum out of the evacuee. [Waller 1940: 30]
    iv. After twenty-five years in the field. I've traded the front seat of a 4 x 4 for a swivel chair and a desk. The change did not come easily for me. As the old saying goes — it's a lot easier to take the man out of the field than to take the field out of the man. [Unknown 1995, Oklahoma DWC 1995: 61]
     v. I could take Tarzan out of the jungle. Could I take the jungle out of Tarzan? [Maxwell 2012: 254]

4 Anadiplosis is clause-final-clause-initial lexical repetition (< … Wx >< Wx … >). Epanaphora is clause-initial lexical repetition (< Wx … >< Wx … >). Epiphora is clause-final lexical repetition (< … Wx >< … Wx >). Note that these researchers use somewhat different terminology. Hromada uses anaphora for our epanaphora, while Dubremetz and Nivre also use chiasmus for our antimetabole. In the first case, we avoid anaphora (a synonym in the rhetorical tradition for epanaphora) because of its more prominent designation in Computational Linguistics, for pronouns. In the second, we prefer the more specialized terms. It is worth noting that the terminology of rhetorical figures, resulting from over two millennia of research, is highly inconsistent, with different labels for the same linguistic configurations, with multiple linguistic configurations corresponding to the same label, and with some linguistic activity that really isn't figurative labeled as figures. The taxonomy of figures is, in short, a mess. We have developed a much more rigorous, consistent, and principled taxonomy of figures at Waterloo. See Chien and Harris [2010]; Harris [2013:571-575].

But how well do the form-function couplings that humanists have found stand up beyond the small sampling of discourse that humanists have been able to explore: in the conversations, news stories, opinion pieces, blogs, review articles, short stories, tweets, scientific arguments, and so on, that populate the vast sea of everyday and specialist human discourse? We don't know, but corpus studies should tell us. Do Reciprocal Force antimetaboles collate with transitive verbs, for instance? Do Reciprocal Specification and Subcategorization antimetaboles collate with copulas? Do Irrelevance-of-Order antimetaboles collate with conjunctions and disjunctions? How frequently does mesodiplosis collate with antimetabole? What other stackings are there, with what functional implications? We have intuitions, and much particularized research (that is, specific works of rhetorical criticism), but intuitions and particularized research need to be tested on corpora.

How do figures cluster in terms of genres? Do individual authors have identifiable figure proclivities? Is sentiment a trigger for certain figures? Do certain argument types favour certain figures? Are there author-genre figural effects? Argument-sentiment figural effects? Author-sentiment? Again, intuitions and particularized research suggest answers; again, these need to be tested.

When multiple figures co-occur, as they almost always do, which functions stack, which remain independent, which








ones take precedence in the possibility of a conflict? Are there functional differences between "accidental" figures and "designed" figures? If figures are form-function couplings, does it even make sense to speak of 'accidental' figures (we don't speak of accidental predications or passive clauses; they just are)?

This work can undoubtedly be strengthened by machine learning. We have developed a format for annotating rhetorical figures, in parallel to the annotation formalisms developed for part-of-speech tagging, speech-act annotation, and so on. Corpora annotated with rhetorical figures can be used to train systems on new and more sophisticated detection tasks, especially for stackings and functional correlations.

4 Challenges and Solutions

We want to come at the detection problem for rhetorical figures from the other end. There is a "serious bottleneck … [from] the lack of annotated data" [Dubremetz and Nivre 2015]. We believe that texts curated by rhetoricians, marked up for all occurrences of certain rhetorical figures, will provide rich data for machine learning, and we have developed an annotation scheme to structure the data. The labels in our figure annotation scheme are in effect features pertaining to figure identification and classification. Algorithms trained on such data will, in turn, be more fully equipped for auto- […]

The Extensible Markup Language (XML) is widely […] figures. The main challenge of using such an annotation scheme lies in the intricacies that figure-rich texts present. These intricacies include stacking figures and interpenetrating figures. The annotation methods developed in this paper address these two issues. The desire is to develop an annotation scheme that will highlight the structure of rhetorical figures, allowing them to be more easily understood by computational learning-based algorithms while keeping figures intact. Now, using XML, we analyze the development process of a suitable markup.

We have used HTML in the past for annotating figures: specifically, JANTOR (Java ANnotation Tool Of Rhetoric) allowed for "manual and automated annotation of files in HTML format" [Gawryjolek 2009; Gawryjolek, Harris, and DiMarco 2009]. But XML presents such obvious advantages that we have adopted it in our recent work. It is especially valuable for the flexibility it provides in creating one's own tags and attributes.

Our original markup focused on the names of tags and did not include attributes. This is adequate, using a general markup template like the one in 19, for simple instances of isolated ploche, such as 20a (annotated as 20b):

19. […] ...text... […] ...text... […] ...text... […] ...text... […] ...text... […]

20. a. He hated white oppression and white domination, not white people themselves. (white)
    b. […] He hated […] white […] oppression and […] white […] domination, not […] white […] people themselves. […]

The container tag […] marks off the beginning of the text while the […] tag reveals the beginning position of the figure. The vital tags of this markup are the […] tags, which encompass the defining features of a figure. In Example 20b they are […]. Figure 1 illustrates the hierarchical nature of the markup for 20b. These markers provide information about elements such as letter groups (A-Z are the same across tags if the content of the tag has the same word or group of letters) and relative positioning (1 to 3). Issues with this markup arise quickly, but the main idea of marking defining elements still has its uses.

There are syntactic and semantic issues that form when applying the markup to more figure-rich texts. By analyzing an example (1, repeated here for convenience), we demonstrate the problems. (A fully formatted example is given in Figure 2.)

21. a. Ask not what your country can do for you. Ask what you can do for your country. (your country / you; ask not x / ask y) [Kennedy (and Sorensen) 1961]
    b. […] ... […] ...... […] ...... […] ...
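The syntactic side of the problem with Example 21 can be checked mechanically: an XML parser accepts figure tags that nest, and rejects figure tags that interpenetrate. The sketch below uses Python's standard xml.etree.ElementTree parser. The tag names (text, antithesis, antimetabole) and the spans they cover are assumptions chosen for illustration; they are not the tag set or the spans of 21b.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one figure fully contained in the other.
NESTED = (
    "<text><antithesis><antimetabole>"
    "Ask not what your country can do for you. "
    "Ask what you can do for your country."
    "</antimetabole></antithesis></text>"
)

# Hypothetical markup: the antithesis closes before the antimetabole
# that opened inside it, so the two figures interpenetrate.
OVERLAPPING = (
    "<text><antithesis>Ask not what "
    "<antimetabole>your country can do for you. Ask what</antithesis> "
    "you can do for your country.</antimetabole></text>"
)


def is_well_formed(xml_string):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_string)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("nested:", is_well_formed(NESTED))
    print("overlapping:", is_well_formed(OVERLAPPING))
```

The parser fails on the second string at the first closing tag that does not match the most recently opened element, which is the situation that arises whenever one figure must end before another that began inside it.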
Figure 2: The full hierarchical structure of Example 21b. Bolding indicates syntax errors.

A syntax issue arises in Example 21b where multiple figure tags close in the incorrect parent tag. For example, we have […]. Figure 3 below shows the other figures that also fall to this error.

The syntax of XML does not allow the interpenetration of tags. When considering this problem, it becomes apparent that the tags marking off the beginning and endings of figures are causing the most trouble. Further analysis reveals that these tags are unnecessary. The key components of a figure are their defining elements such as repeating or contrasting elements (words, sounds).

The semantic complication has to do with nesting XML tags. Arbitrary hierarchies can form when some figures happen to appear inside others. Rhetorical figures may, however, contain other rhetorical figures which do observe hierarchical properties. Thus we require a method that is more explicit about creating hierarchies. This is achievable with the introduction of attributes and thus the creation of a new annotation scheme.

Figure 3: Problems arise from ending the antithesis tag before ending the antimetabole tag.

Figure 3 displays the complexity of this version of the annotation scheme. The dashed arrows represent the consequences of tagging when you need to mark the end of the antithesis before the end of the antimetabole; there is no hierarchy, or perhaps only a partial and fragmentary hierarchy, but it creates havoc. The nesting, if we can even call it that, is incomplete, falling outside XML's basic capacities. Hierarchy problems also become apparent as […] <element/number> tags are sub-tags of […].

The improved annotation scheme recognizes the above problems and attempts to resolve them. It focuses on highlighting the defining elements of figures. A general markup is shown in number 22 (a fully formatted example for this markup is provided in Figure 5, given between the Conclusion and the Acknowledgements for purposes of layout):

22. […] ... […] text... […]

If one wanted to create a hierarchy, say in the instance that figure1 always accompanies figure2, meaning figure1 is a subpart of figure2, this is still possible. The XML from the example would look like:

… […] … […] ……

[…] hierarchy.

As Figure 4 reveals, the improved markup focuses on tagging parts of strings and providing them with more information. The figure focuses on antithesis, antimetabole, and ploche, where ploche refers to ploche1. Notice how we are able to combine the antimetabole and ploche tags into one attribute and avoid a hierarchy. Using attributes also helps to separate information about a tag, providing algorithms with easier access. The lettergroup attribute grants information on which tags surround the same word or, as the name suggests, groupings of letters. If the letters inside the tag are the same as inside another tag, the attribute will end in the same character. The position attribute clarifies the location of the letter group in the figure. For example, antimetabole has two A's in its ABBA structure. To differentiate between them we write the position attribute of the first A as Antimetabole-1 and the second as Antimetabole-2.

Figure 4: Improved annotation scheme, tagging parts of strings and providing more information.

Using these tags and attributes to annotate rhetorical figures in text would create the required computational structure for figure analysis.

5 Conclusion

The computational uses of rhetorical figures are indisputable. We can clearly see their ability to enhance fields such as author and genre detection, NLP systems, and argumentation mining. We also know how intricate they can become. Stacking and intersecting with one another, many figures can be overlooked, as observed in the previous works mentioned here. To exploit their uses, yet overcome their intricacy, a rhetorical figure markup becomes imperative and should be thought of as such.

Our annotation scheme represents the first move in what we hope will be a line of research that others will find profitable to join. The outline of the annotation scheme has been developed, and now the flexibility of XML allows others to improve and customize the mechanism for their own uses. The eventual goal is to develop a markup scheme that provides computationally accessible information for all rhetorical figures.

Figure 5: The full hierarchical structure of Sentence 1 (repeated as 21a), in accord with the tagging specified in 22.

Acknowledgements

We would like to thank Cliff O'Reilly for valuable XML advice, as well as our colleagues at the University of Waterloo, including Elena Afros, Adam Bradley, Ashley Kelly, Isabel Li, Ricky Rong, and Terry Stewart; our international colleagues, including Cliff (again), Marie Dubremetz, Jelena Mitrovic, Chris Reed, and James Wynn; and the Social Sciences and Humanities Research Council of Canada for financial assistance. We also thank three anonymous reviewers for CMNA, for their helpful queries and suggestions. Our figure-annotation research is part of an overall project of Computational Rhetoric at the University of Waterloo, organized around a comprehensive OWL-based ontology of rhetorical figures.

References

Alexis, Andre. 2015. Fifteen dogs. Toronto: Coach House Books.

Alliheedi, Mohammed. 2012. Multi-document summarization system using rhetorical information. Master of Mathematics thesis, Cheriton School of Computing, University of Waterloo. [Supervised by Chrysanne DiMarco; Randy Allen Harris, Second Reader.]

Alliheedi, Mohammed, and Chrysanne DiMarco. 2014. Rhetorical figuration as a metric in text summarization. Proceedings, 2014 Canadian Artificial Intelligence Conference, Montreal, QC, May 6-9.

Buck, Gertrude. 1899. The metaphor: A study in the psychology of rhetoric. In Contributions to Rhetorical Theory. Ed. Fred Newton Scott. Ann Arbor: Inland.

Bush, George W [and David Frum]. 2001. Address before a joint session of the congress on the United States response to the terrorist attacks of September 11. The American Presidency Project. Gerhard Peters and John T. Woolley. http://www.presidency.ucsb.edu/ws/?pid=64731

Chien, Lynn, and Randy Allen Harris. 2010. Scheme trope chroma chengyu: Figuration in Chinese four-character idioms. Cognitive Semiotics 10.6:155-178.

Clinton, Hillary. 2013. Statement for the Americans for Marriage Equality campaign. Human Rights Campaign. (18 March.) http://www.hrc.org/videos/videos-hillary-clinton-supports-marriage-equality#.UXAbPys4Xvl

Crosswhite, James. 2000. Rhetoric and computation. Symposium on Argument and Computation. Bonskeid House, Perthshire, Scotland. June 27.

Dubremetz, Marie, and Joakim Nivre. 2015. Rhetorical figure detection: the case of chiasmus. Proceedings of NAACL-HLT Fourth Workshop on Computational Linguistics for Literature, Denver, CO, June 4.

Fahnestock, Jeanne. 1999. Rhetorical figures in scientific argumentation. New York: Oxford University Press.

Fuller, Steve. 2001. Thomas Kuhn: A philosophical history for our times. Chicago: Chicago University Press.

Gawryjolek, Jakub J. 2009. Automated annotation and visualization of rhetorical figures. Master of Mathematics thesis, Cheriton School of Computing, University of Waterloo. [Supervised by Chrysanne DiMarco; Randy Allen Harris, Second Reader.]

Gawryjolek, Jakub J., Randy Allen Harris, and Chrysanne DiMarco. 2009. An annotation tool for automatically detecting rhetorical figures. Proceedings, CMNA IX (Computational Models of Natural Argument), held with IJCAI-09, Pasadena, CA, July 13.

Gladkova, Olga, Chrysanne DiMarco and Randy Harris. 2016. Argumentative meanings and their stylistic configurations in clinical research publications. Argument & Computation 6.3: 310-346.

Gladkova, Olga, Randy Allen Harris and Chrysanne DiMarco. 2011. Schematic organization of clinical decision-making: Findings from qualitative corpus analysis. Proceedings, CMNA XI (Computational Models of Natural Argument), 7 August 11, San Francisco, CA.

Grasso, Floriana. 2002a. Towards a framework for rhetorical argumentation. EDILOG 2002 - Proceedings of the 6th Workshop on the Semantics and Pragmatics of Dialogue, J. Bos, M.E. Foster and C. Matheson (eds), Edinburgh, UK, 4-6 September 2002, p. 53-60.

Grasso, Floriana. 2002b. Towards computational rhetoric. Informal Logic 29.3: 195-229.

Green, Nancy. 2010. Representation of argumentation in text with Rhetorical Structure Theory. Argumentation 24.2:181-196.

Green, Nancy. 2015. Identifying argumentation schemes in genetics research articles. In Proceedings of the Second Workshop on Argumentation mining, North American Conference of the Association for Computational Linguistics (NAACL), 12-21, Denver, CO, 2015.

Greer, Germaine. 1988. The proper study of womankind. Times Literary Supplement (3-9 June).

Harris, Randy Allen. 2013. Figural logic in Mendel's Experiments on plant hybrids. Philosophy and Rhetoric 46.4: 570-602.

Harris, Randy Allen, and Chrysanne DiMarco. 2009. Constructing a rhetorical figuration ontology. Symposium on Persuasive Technology and Digital Behaviour Intervention, Convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB), Edinburgh, Scotland, April.

Hertzberg, Hendrik. 2008. The spat. New Yorker (February 11).

Hoffmann, Thomas, and Graeme Trousdale, eds. 2013. The Oxford handbook of Construction Grammar. New York: Oxford University Press.

Hromada, Daniel Devatman. 2011. Initial experiments with multilingual extraction of rhetoric figures by means of PERL-compatible regular expressions. Proceedings of the Second Student Research Workshop associated with RANLP 2011, Hissar, Bulgaria.

Kanoksilapatham, Budsaba. 2003. A corpus-based investigation of scientific research articles: Linking move analysis with multidimensional analysis. PhD dissertation, Department of Linguistics, Georgetown University.

Kanoksilapatham, Budsaba. 2005. Rhetorical structure of biochemistry research articles. English for Specific Purposes 24.3:269-292.

Kennedy, John F. [and Theodore Sorensen]. 1961. Inaugural Address. http://www.presidency.ucsb.edu/ws/index.php?pid=8032

Magid, Larry. 2012. Zuckerberg claims we don't build services to make money. Forbes (1 February). http://www.forbes.com/sites/larrymagid/2012/02/01/zuckerberg-claims-we-dont-build-services-to-make-money/

Mann, William C., and Sandra A. Thompson. 1988. Rhetorical Structure Theory: Toward a functional theory of text organization. Text 8.3:243-281.

Matthews, P. H. 2007. Syntactic relations: A critical survey. Cambridge Studies in Linguistics 114. Cambridge: Cambridge University Press.

Maxwell, Robin. 2012. Jane: The woman who loved Tarzan. New York: Macmillan.

Newton, Sir Isaac. 1803 [1687]. The mathematical principles of natural philosophy. Three volumes. Trans. by Andrew Motte. London: H.D. Symonds.

O'Reilly, Cliff. 2010. Lassoing rhetoric with OWL and SWRL. Unpublished MSc dissertation. Available: http://computationalrhetoricworkshop.uwaterloo.ca/wp-content/uploads/2016/06/LassoingRhetoricWithOWLAndSWRL.pdf

O'Reilly, Cliff, and Shamima Paurobally. 2010. Lassoing rhetoric with OWL and SWRL. Unpublished. Available: http://www.academia.edu/2095469/Lassoing_Rhetoric_with_OWL_and_SWRL

Perelman, Chaïm, and Lucie Olbrechts-Tyteca. 1969. The new rhetoric: A treatise on argumentation. Translated by John Wilkinson. Notre Dame: Notre Dame University Press.

Reed, Chris, and G.W.A. Rowe. 2004. Araucaria: Software for argument analysis, diagramming and representation. International Journal of AI Tools 13.4:961-980.

Reed, Chris, and Timothy J. Norman, editors. 2003. Argumentation machines: New frontiers in argument and computation. Dordrecht, The Netherlands: Kluwer.

Rubinelli, Sara. 2006. The ancient argumentative game: Topoi and loci in action. Argumentation 20.3:253-272.

Sartwell, Crispin. 2014. The left-right political spectrum is bogus. The Atlantic (June 20). http://www.theatlantic.com/politics/archive/2014/06/the-left-right-political-spectrum-is-bogus/373139/

Seuss, Dr. [Theodore S. Geisel.] 1940. Horton hatches the egg. New York: Random House.

Strommer, Claus. 2011. Using rhetorical figures and shallow attributes as a metric of intent in text. Doctoral Dissertation, Cheriton School of Computing, University of Waterloo. [Supervised by Chrysanne DiMarco; Randy Allen Harris, Committee Member.]

Tindale, Christopher W. 2000. Acts of arguing: A rhetorical model of argument. Albany, NY: State University of New York Press.

Teufel, Simone, J. Carletta and M. Moens. 1999. An annotation scheme for discourse-level argumentation in research articles. In Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics, Stroudsburg, PA, 110-117.

Teufel, Simone. 2010. The structure of scientific articles: Applications to citation indexing and summarization. San Francisco: CSLI Publications.

Turner, Mark. 1997. Figure. In Figurative language and thought, Cristina Cacciari, Ray Gibbs, Jr., Albert Katz, and Mark Turner, eds. New York: Oxford University Press, 44-87.

Unknown author. Unknown date. Western Spanglish language: The United States unofficial language. Western women in leadership and innovation: Discovering the wellsprings of metaphorical voices. http://westernwomenleadershipinnovation.net/western-spanglish-language.html

Unknown. 1995. Oklahoma Department Of Wildlife Conservation. Outdoor Oklahoma. Volumes 51-52.

Unknown. 2013. Ultrabooks vs Laptops. 2013. Java (January 26). http://java-maheshyadav.blogspot.ca/2013/01/ultrabooks-vs-laptops.html

Volpe, Peter E. 1975. Man, nature, and society: An introduction to biology. Dubuque IA: W. C. Brown Company.

Walker, Daniel. 2011. God in a brothel: An undercover journey into sex trafficking and rescue. Downers Grove, IL: InterVarsity Press.

Waller, Willard. 1940. War and the family. Hinsdale, IL: The Dryden Press.