Proceedings of CMNA 2016 - Floris Bex, Floriana Grasso, Nancy Green (eds) Rhetorical Figure Annotation with XML Sebastian Ruan,* Chrysanne Di Marco,* and Randy Allen Harris** *Cheriton School of Computer Science **Department of English Language and Literature University of Waterloo, Waterloo Ontario saruan@uwaterloo.ca, cdimarco@uwaterloo.ca, raha@uwaterloo.ca Abstract memorable impression. Why? Two reasons. Firstly, the formal structure and the functional structure are virtually There is a driving need to interrogate large bodies isomorphic: Kennedy (and speechwriter Ted Sorensen) ex- of text for pragmatic meaning, e.g., to detect sen- pressed the rejection of one civic attitude and its replace- timent, diagnose genre, plot chains of reasoning, ment by the opposite one, in the iconicity of reversing the and so forth. But this type of meaning is often im- terms of reference. Secondly, that very snug form/function plicit, 'hidden' meaning, evoked by linguistic cues, coupling inhabits a material structure that is, on its own, stylistic arrangement, or argumentation structure— cognitively very sticky. The Kennedy-Sorensen phrase has features that have hitherto been difficult for Natural become so widely known, that is, so easily shared, so fre- Language Processing (NLP) systems to recognize quently invoked and quoted and recited because of (1) the and use. Pragmatic concerns were historically the schematic congruence with which the form matches the Re- province of rhetorical studies, and we have turned jection-Replacement function its arrangement serves, and to rhetoric in order to find new solutions to compu- (2) the cognitive affinities humans have for its structural tational pragmatics. This paper highlights a form of properties (opposition, repetition, and symmetry). rhetorical device that encodes deep levels of prag- matic meaning and yet lends itself to automated de- The cognitive affinities explain its mnemonic and aes- tection. 
These devices are the linguistic configura- thetic effects, but, an interest in Computation Argumenta- tions known as rhetorical figures, which have been tion scholars focuses attention on its tight form-functional poorly understood and vastly underutilized in correlation, in an approach known as figural logic. The form Computational Linguistics and Computational Ar- makes it tractable for automated detection, while the func- gumentation. We present an annotation scheme us- tion gives us its rhetorical purpose. In terms of argument ing XML for rhetorical figures to make figuration mining, an application that accessed this correlation could more tractable for NLP, enhancing applications for epitomize Kennedy's inaugural address (which argued for argument mining, along with a range of other tasks. the rejection of an ethos of entitlement and its replacement We also discuss the intellectual and technical chal- by an ethos of duty) virtually on the basis of this expression lenges involved in figure annotation and the impli- alone. cations for Machine Learning. We are developing an approach to computational prag- matics that combines the insights for argumentation that 1 Introduction rhetorical figures provide, together with argument mining, Rhetorical figures are cognitively governed linguistic devic- corpus linguistics, and machine learning, with payoffs for es that serve functional, mnemonic, and aesthetic purposes. both computer science and for rhetoric. There has to this Take the famous maxim from Kennedy's inaugural address: point been success at detecting some rhetorical figures, but little sense of what to do with them once they have been 1.! Ask not what your country can do for you. Ask what detected. you can do for your country. 
[Kennedy (and Sorensen) 1961] There has been a growing interest in the convergence of rhetoric, argumentation, and NLP, sparked by such works as This expression quickly became proverbial in the American Teufel, Carletta and Moens [1999] Crosswhite [2000], consciousness for the way it captures the spirit of a particu- Grasso [2002a, 2002b], Reed and Norman [2003], Green lar historical moment, the ethos of a particular administra- [2010, 2015], and Teufel [2010], largely under the presiding tion, and the aspirations of a particular generation. Count- less more prosaic formulations, by Kennedy and others, expressed that confluence too, but they left a distinctly less 24 Proceedings of CMNA 2016 - Floris Bex, Floriana Grasso, Nancy Green (eds) genius of Toulmin [2003/1958].1 But aside from passing very few important modern exceptions like Perleman and mentions here and there, rhetorical figures have been almost Olbrecht-Tyeca [1969], it was largely forgotten as figures wholly neglected. Our work addresses this surprising omis- came to be associated with style; style, with aesthetics and sion. superficiality.2 Our approach is a more sophisticated use of rhetorical But figures are not without their challenges for Natural figures than has been attempted, operating at layers of for- Language Processing. Metaphor remains elusive, for in- mal and functional abstraction. It depends fundamentally on stance, despite all the attention it has attracted in cognitive an annotation format for rhetorical figures. science, AI, and linguistics, including Computational Lin- In this paper we argue for the importance of rhetorical guistics, in the last two decades. Metaphor is a type of figure figures for NLP generally and argument mining specifically; known as a trope, which depends on semantic deviation. 
We we identify the challenges and opportunities of integrating a are not yet successful enough with straight-laced semantics knowledge of figures into NLP; and, most specifically, we to support forays into semantic distortions. Some tropes offer an XML annotation scheme for rhetorical figures that (such as oxymoron, which is a juxtaposition of antonymic meets some of these challenges and therefore opens up new terms, such as square circle or deafening silence) can be opportunities for NLP. reliably detected [Gawryjolek 2009]. We believe antithesis (juxtaposed opposite predications, as in Sentence 2, a dou- ble antithesis) has a similar potential for reliable detection. 2 Opportunities and Challenges (We adopt the convention of identifying the defining figura- Computationally, figures are important for four central rea- tive elements parenthetically.) sons. First, they are endemic to human language. This is 2.! The young would choose an exciting life; the old a very well established for a few tropes, such as metaphor, happy death. (young, old; life, death) [Alexis which is the central focus of Cognitive Linguistics and 2015:155] deeply entrenched in ontologies like FrameNet and Word- Net. But it is equally true of literally (a word we don't use But most semantic distortions—tropes—are far from tracta- lightly) hundreds of other figures. If we want language- ble computationally. Nor do many of them provide the tight perceptive algorithms, they must have knowledge of figure form/function coupling that has such a promising payoff for structure. Secondly, figures epitomize argument structure, Computational Argumentation. increasingly a prime concern for NLP. 
Again, this is well Another type of figure, schemes, are formal deviations, understood for metaphor (and simile, though it gets much shifts of expected structure, as in Sentence 1, an antime- less overt attention), which epitomize analogic argumenta- tabole (reverse lexical repetition; in this case you and your tion. Thirdly, many figures (especially the ones called country). The computational detection of figures, including schemes) work in terms of formal patterns that algorithms antimetabole, is finding success [Gawryjolek 2009; Gawr- can detect through surface analysis; our Sentence 1 illus- yjolek, Harris, and DiMarco 2009; Hromada 2011; O'Reilly trates this aspect clearly. Fourthly, they correlate with rhe- 2010; O'Reilly and Paurobally 2010; Dubremetz and Nivre torical functions (pragmatic and argumentative meaning). 2015]. We will illustrate this shortly. For now, the rejection- The work of these researchers is sometimes only loose- replacement function of Sentence 1 will have to stand. ly connected to the rhetorical traditions. Many of them, too, The contemporary scholar most responsible for the po- only concerned detection—an essential first step but one sition that rhetorical figures are constructions with especial- that doesn't get us very close to argument mining. They did ly tight couplings of form and function is Jeanne not attempt to find meaning in the figures they detected. Fahnestock, whose figural logic is brilliantly articluated in Gawryjolek [2009], Hromada [2011], Dubremetz and Nivre Rhetorical Figures in Scientific Argumentation [1999; see also Tindale 2000:69-85; Harris 2013]. Fahnestock charts rhetorical figures not only for their pragmatic contributions to everyday language but for the way they epitomize lines of 2 As Rubinelli (2006) points out, topoi are various. Aristotle argument. 
As she cogently shows, this position goes back at distinguished principally between common topoi, such as argument least to Aristotle, who links specific figures directly to spe- from opposites, argument from correlatives, and argument from cific lines of argument (that is, topoi). But, aside from a definition, which can be applied to arguments in any domain, and particular topoi, which can be applied in particular argument fields. In this paper we are concerned with common topoi, which align 1 We do not put Mann and Thompson's [1988] Rhetorical with rhetorical figures, but see Gladkova, DiMarco, and Harris Structure Theory (RST) in this category because, while it has made [2011, 2016] for our approach to particular epistemic topoi in oph- some valuable insights into text linguistics, it is simply incorrectly thalmic clinical research. It differs both from Rubinelli's approach named, by scholars who appear to know little or nothing about and, more generally, from the types of schemes being used in rhetoric. RST has really to do with text coherence rather than with Computation Argumentation analysis by associating "constella- rhetoric as traditionally understood, as the study of suasive lan- tions" of features, i.e., features that are linguistically, syntagmati- guage. cally, and semantically related, with specific schemes (here, topoi). 25 Proceedings of CMNA 2016 - Floris Bex, Floriana Grasso, Nancy Green (eds) [2015], for instance, appear to have been unfamiliar with the has priority. Order doesn't matter to addition (multiplication, rhetorical functions antimetabole serves. union, etc.). Antimetabole has a small set of rhetorical functions, We have built a curated list of over 400 antimetaboles keyed to the iconicity of its formal structure (which evokes illustrating these functions, but only have space for a few balance and opposition, as well as sequence or priority). 
We more representative examples: have very limited space in this paper to demonstrate these Reciprocal Force rhetorical functions, so a few examples will have to suffice. 7.! A corollary of PHC [the Principle of Hierarchical One function of antimetabole is to convey Reciprocal Coincidence] is that resources flow toward political Force, illustrated by Sentence 3, Newton's third law of mo- power, and political power flows toward resources; tion. (We adopt the convention of identifying the defining or, the power of state and of capital typically appear figurative elements parenthetically.) in conjunction and are mutually reinforcing. (re- 3.! If you press a stone with your finger, the finger is al- sources / political power) [Sartwell 2014] so pressed by the stone. (stone / finger) [Newton 8.! Women are changing the universities and the uni- 1803.1 [1687]:15] versities are changing women. (women / universities) Newton's third law is often expressed as "for every action, [Greer 1988: 629] there is an equal and opposite reaction," but Newton's own Reciprocal Specification argument favored the antimetabole, whose very structure 9.!The negation of a conjunction is the disjunction of suggests "equal and opposite" (We give the example in Eng- the negations and the negation of a disjunction is lish, but Newton's original Latin is also antimetabolic.) the conjunction of the negations. (negation of a A very similar rhetorical function of antimetabole is to conjunction / disjunction of the negations) [De convey Reciprocal Specification, a kind of mutual defini- Morgan's law; traditional] tion, illustrated by Sentence 4: 10.! Anger and depression, the pop-psych books tell us, 4.! Gay rights are human rights, and human rights are are two sides of the same coin: depression is anger gay rights. (human rights / gay rights) [Clinton 2013: suppressed, anger is depression liberated. 
(depres- 0:08-0:12] sion / anger) [Hertzberg 2008] In this phrase the notions of human rights and gay rights are Comprehensiveness reciprocally identified with each other. You can't have one 11.! I meant what I said and I said what I meant. unless you have the other. (meant / said) [Seuss 1940] Another rhetorical function of the antimetabole is to 12.! Whether we bring our enemies to justice or bring convey Comprehensiveness, illustrated by the ordinary- justice to our enemies, justice will be done. (our language example, Sentence 5: enemies / justice) [Bush [and Frum] 2001] Irrelevance of Order 5.! A place for everything, and everything in its place. (place / everything) [Traditional] 13.! With a similar qualification, in the Cambridge The reverse repetition in Sentence 5 shifts from reciprocal Grammar of the English Language, a head 'plays force to a reciprocal coverage, largely because it has prepo- the primary role' in 'determining the distribution of sitional predication rather than the transitive predication of the phrase' (introductory chapter signed by Pullum Newton's Sentence 3. We call this function comprehensive- and Huddleston, in Huddleston and Pullum ness because the sequential iconicity means a back-and- 2002:24) (Pullum / Huddleston) [Matthews forth, alpha-to-omega, omega-to-alpha coverage of some 2007:24] domain—in this case, the domain of tidiness. All things 14.! "Spanglish," [is] the combination of Spanish and have assigned places; all places have their assigned things. 
English (or English and Spanish) (Spanish / Eng- lish) [Unknown, "Western Spanglish Language"] A fourth rhetorical function of the antimetabole is to It is these functions, coupled with the relative ease of rhetor- convey Irrelevance-Of-Order, well known from algebra and ical-scheme detection, that make rhetorical figures so prom- predicate calculus: ising for computational tasks in which comprehension is 6.!m + n = n + m (m / n) [Traditional; commutative central, like argument mining and text summarization. principle] Again, however, there are challenges. They are not as There are other ways to express the principle of commuta- thorny as the challenges of most tropes because they con- tion, but none as natural and iconic as formulae like 6. Op- cern surface analysis, not semantic plumbing. But they exist. posite sequences of the same variables, on either side of the In particular, figures rarely come in isolation. The Kennedy- same operator, pivoted by a predication of identity, equiva- Sorenson maxim, for instance (Sentence 1), is an antime- lence, or equality inescapably means that neither sequence tabole (you / your country). But it is also an antithesis (ask not X / ask X). It is, thirdly, a mesodiplosis (clause-medial 26 Proceedings of CMNA 2016 - Floris Bex, Floriana Grasso, Nancy Green (eds) repetition; here, can do occurs in the middle of both claus- early stages, but we believe it holds considerable promise, es). and we believe machine-learning corpus studies can be ex- We call this phenomenon, when figures co-occur and tremely helpful, especially for the challenges and opportuni- mutually reinforce each other, stacking. It presents both a ties of stacking. challenge and an opportunity. It is a challenge because ra- Figural stacking, as we come to understand the func- ther than detecting a single figure or multiple independent tional combinatorics better, is perhaps the greatest promise figures, we need to detect overlapping figures. 
It is an op- of rhetorical figures for computational understanding of portunity because the functions are enhanced and stabilized natural language. Our paradigm example, which stacks the under stacking. When two or more figures coincide in the schemes antimetabole, mesodiplosis (both entailing ploche), same utterance, the functions they convey are highly con- and the trope antithesis provides a pitch-perfect example of sistent. Formal stacking breeds a functional conspiracy. the rhetorical function, Reject-Replace. A computational For instance, when antimetabole stacks with antithesis analysis of Kennedy's inaugural address tuned to the work- (conjoined or highly proximal opposite predications), the ings of rhetorical figures could tell us what the address was joint function is primarily to reject the negated predication about—namely, the rejection of an ethos of entitlement and utterly and replace it with the positive predication. Again, its replacement with an ethos of responsibility—virtually on Sentence 1 is our paradigm, but here are two more: the basis of this particular stacking (along with, of course, the lexical semantics of you, your country, and so on) Reject-Replace We can, and should, rely on rhetoricians to tell us what 15.! We don't build services to make money; we make the functions of certain figures and certain figure-stacks are, money to build better services. (services / money) at least in these early stages. But the rhetorical tradition is [Mark Zuckerberg, qtd in Magid 2012] haphazard, and sometimes conflicting. The terminology 16.! Plain statement must be defined in terms of meta- alone is forbidding. As much as computational argument phor, not metaphor in terms of plain statement. studies can benefit from a better understanding of rhetorical (plain statement / metaphor) [Buck 1899: 69] figures, rhetorical figures can benefit from computational The stacking of antithesis with the Reciprocal Specification studies of form and meaning. 
(And, yes, that sentence was function of antimetabole, however, generates a very specific an antimetabole, stacked with mesodiplosis; the rhetorical Subclassification function, as in Sentences 17 and18, which function is Reciprocal Force, modulated by the possibility say, respectively, that ultrabooks are a class of laptop, and modality of can.) compounds are a class of molecules: The path forward is to bootstrap rhetoricians' Subcatetorization knowledge by way of annotation, marked-up text corpora, 17.! Ultrabooks are laptops after all, but not all laptops and machine learning, so that computationally mined data are ultrabooks. (ultrabooks / laptops) [Unknown can start to tell them what functions figures have, through 2013, "Ultrabooks vs Laptops"] confirmation, through refinement, and through new discov- 18.! All compounds are molecules (since compounds eries, all of which we have good reason to anticipate. consist of two or more atoms), but not all mole- We can discover the proportionality of certain stackings cules are compounds (since some molecules con- (anecdotally, both antithesis and mesodiplosis strongly co- tain only atoms of the same element). (compounds occur with antimetabole), the correlation of the stackings / molecules) [Volpe 1975:7] with the rhetorical functions (as specified above, on the ba- Some instances of stacking are so common and so predicta- sis of limited and anecdotal research). At its best, this work ble as to be entailments. Ploche, for instance, is simple lexi- can revolutionize Computation Argumentation studies and cal repetition, so it always stacks with antimetabole (reverse rhetoric in the way corpus linguistics revolutionized lexi- lexical repetition). If you find the latter, you always find the cography and established ontologies like WordNet and former. Rhetorically, ploche conveys the pragmatic func- Framenet. 
But even at its least productive, we are very con- tion, Identity-Of-Reference, which is always embedded in fident of finding important form/function correlations that the functions of antimetabole (if you have reciprocal force can importantly inform Computation Argumentation and or reciprocal specification, for instance, you have identical discourse studies, in novel ways. entities in a reciprocal relationship). Further, mesodiplosis clause-medial lexical repetition) also entails ploche as well, 3 Figure Detection conveying an identical force when the mesodiplosis is a transitive verb (e.g., Sentences 3, 7, and 8), identical speci- There have been limited successes in figure detection over fication when it is a copula verb (e.g., Sentences 4, 9, and the past several years due to strict figure mappings and 10). some unreported data [Gawryjolek 2009; Gawryjolek, Har- ris, and DiMarco 2009; Hromada 2011; Strommer 2011; We do not pretend to have a full and complete mapping Alliheedi 2012; Alliheedi and DiMarco 2012; Dubremetz of form to function, however. This work is still in the very 27 Proceedings of CMNA 2016 - Floris Bex, Floriana Grasso, Nancy Green (eds) and Nivre 2015]. But it has been restricted both in method tingly, because of the way he defined antimetabole. He was and in scope and has been unconcerned with function. unaware he was doing so and does not report his results. Hromada's [2011] work, for instance, was very success- Dubremetz and Nivre [2015] found some antitheses, be- ful at the detection of antimetabole, but he defined antime- cause they were using negation as a correlative of antime- tabole in an overdetermined way. Using the Waterloo Figure tabole (which markedly improved their success), but they Representation Notation [Harris and DiMarco 2009]3 were not looking for them and did not report their results. 
(where W stands for Word, the subscripts indicate identity, Only Gawryjolek [2009] looked for stacked figures, but that and "…" represents other linguistic matter, extraneous to the was not his focus. He did not interpret the stacking at all, figure, possibly null), Hromada defines antimetabole as nor report on the statistics. He was merely looking for mul- , whereas a more accurate defi- A B C C B A tiple figures in the same corpus, many of which overlapped. nition (as in Harris and DiMarco [2009]) is simply [W]a … . And, of course, detecting rhetorical figures is the be- [W]b … [W]b … [W]a. That is, Hromada searched only for ginning of the story. We know, from millennia of human- antimetaboles when they stacked with mesodiplosis (clause istic research, that linguistic forms correlate with rhetorical medial repetition), when there was no additional linguistic functions—that figures do communicative work beyond matter. 'mere aesthetics'—and we can thank Fahnestock for collat- Most of these researchers did not look for stacked fig- ing and expanding this research so clearly in the contempo- ures, except accidentally. Hromada [2011] looked for other rary era. On the basis of this research, we can use the de- figures (anadiplosis, epanaphora, and epiphora), but only in tected figures to help chart meanings—sometimes very fun- isolation.4 Conversely, he 'searched' for mesodiplosis unwit- damental meanings, like the Reject-Replace antithetical antimetabole of Example 1, which diagnoses the exact ten- ure of Kennedy's inaugural address. 3 Hromada [2011] calls this notation, Rhetoric Figure Repre- sentation Formalism or RFRF, which he adapts from Harris and But how well do the form-function couplings that hu- DiMarco [2009]. Harris and DiMarco did not label their formalism manists have found stand up beyond the small sampling of in their paper, but we use their term for it here. 
The WFRN is a discourse that humanists have been able to explore—in the formalism for the general structure of rhetorical schemes, but it conversations, news stories, opinion pieces, blogs, review does not represent functions at all. For this we need a richer sys- articles, short stories, tweets, scientific arguments, and so tem, which may be provided by Construction Grammar (e.g., on, that populate the vast sea of everyday and specialist hu- Hoffmann and Trousdale 2013). For an argument to this effect, see man discourse? We don't know, but corpus studies should [Turner 1997:55-60]. Certainly, there are idiomatic deployments of tell us. Do Reciprocal Force antimetaboles collate with tran- these patterns that fit the Construction Grammar mandate fairly sitive verbs, for instance? Do Reciprocal Specification and well. For instance, the well-known antimetabolic Easier-to-take- the-A-out-of-B-than-the-B-out-of-A catchphrase is the sort of ex- Subcategorization antimetaboles collate with copulas? Do pression that preoccupies Construction Grammarians: Irrelevance-of-Order antimetaboles collate with conjunc- i.![I]t was easier to take the girl out of the brothel than to take tions and disjunctions? How frequently does mesodiplosis the brothel out of the girl. [Walker 2011: 72] collate with antimetabole? What other stackings are there, ii.!It was much easier to take Kuhn out of Harvard than Harvard with what functional implications? We have intuitions, and out of Kuhn. [Fuller 2001: 387] much particularized research (that is, specific works of rhe- iii.!It was found easier to take the evacuee out of the slum than to torical criticism), but intuitions and particularized research take the slum out of the evacuee. [Waller 1940: 30] need to be tested on copora. iv.!After twenty-five years in the field. I've traded the front seat of a 4 x 4 for a swivel chair and a desk. The change did not How do figures cluster in terms of genres? Do individ- come easily for me. 
As the old saying goes — it's a lot eas- ual authors have identifiable figure proclivities? Is sentiment ier to take the man out of the field than to take the field out a trigger for certain figures? Do certain argument types fa- of the man. [Unknown 1995, Oklahoma DWC 1995: 61] vour certain figures? Are there author-genre figural effects? v.!I could take Tarzan out of the jungle. Could I take the jungle Argument-sentiment figural effects? Author-sentiment? out of Tarzan? [Maxwell 2012: 254] Again, intuitions and particularized research suggest an- 4 Anadiplosis is clause-final-clause-initial lexical repetition swers; again, these need to be tested. (< … Wx >< Wx … >). Epanaphora is clause-initial lexical repeti- tion (< Wx … >< W x … >). Epiphora is clause-final lexical repeti- When multiple figures co-occur, as they almost always tion (< … Wx >< … Wx >). Note that these researchers use some- do, which functions stack, which remain independent, which what different terminology. Hromada uses anaphora for our epa- naphora, while Dubremetz and Nivre also use chiasmus for our antimetabole. In the first case, we avoid anaphora (a synonym in configurations corresponding to the same label, and with some the rhetorical tradition for epanaphora) because of its more promi- linguistic activity that really isn't figurative labeled as figures. The nent designation in Computational Linguistics, for pronouns. In the taxonomy of figures is, in short, a mess. We have developed a second, we prefer the more specialized terms. It is worth noting much more rigorous, consistent, and principled taxonomy of fig- that the terminology of rhetorical figures, resulting from over two ures at Waterloo. See Chien and Harris [2010]; Harris [2013:571- millennia of research, is highly inconsistent, with different labels 575]. 
for the same linguistic configurations, with multiple linguistic 28 Proceedings of CMNA 2016 - Floris Bex, Floriana Grasso, Nancy Green (eds) ones take precedence in the possibility of a conflict? Are advantages that we have adopted it in our recent work. It is there functional differences between "accidental" figures especially valuable for the flexibility it provides in creating and "designed" figures. If figures are form-function cou- one's own tags and attributes. plings, does it even make sense to speak of 'accidental' fig- Our original markup focused on the names of tags and ures (we don't speak of accidental predications or passive did not include attributes. This is adequate, using a general clauses; they just are)? markup template like the one in 19, for simple instances of This work can undoubtedly be strengthened by machine isolated ploche, such as 20a (annotated as 20b): learning. We have developed a format for annotating rhetor- 19.! ! ! ical figures, in parallel to the annotation formalisms devel- ...text... oped for part-of-speech tagging, speech-act annotation, and ! !
! so on. Corpora annotated with rhetorical figures can be used ...text... to train systems on new and more sophisticated detection ! ! !
! tasks, especially for stackings and functional correlations. ...text... ! ! !
!! 4 Challenges and Solutions ...text... ! !
! We want to come at the detection problem for rhetorical figures from the other end. There is a "serious bottleneck … ...text... !
! [from] the lack of annotated data" [Dubremetz and Nivre 2015]. We believe that texts curated by rhetoricians, marked 20.! a. He hated white oppression and white domina- up for all occurrences of certain rhetorical figures, will pro- tion, not white people themselves. (white) vide rich data for machine learning, and we have developed b. ! He hated ! ! ! ! ! ! white! ! ! ! oppression and ! ! ! white!! ! ! ! domination, not ! ! ! white! ! ! ! ! ! people themselves. ! an annotation scheme to structure the data. The labels in our figure annotation scheme are in effect features pertaining to The container tag marks off the beginning of the figure identification and classification. Algorithms trained text while the
tag reveals the beginning posi- on such data will, in turn, be more fully equipped for auto- tion of the figure. The vital tags of this markup are the tags which encompass the defining The Extensible Markup Language (XML) is widely features of a figure. In Example 20b they are . Figure 1 illustrates the hierarchical nature of the markup ures. The main challenges of using such an annotation for 20b. These markers provide information about elements scheme is in the intricacies that figure-rich texts present. such as letter groups (A-Z are the same across tags if the These intricacies include stacking figures and interpenetrat- content of the tag has the same word or group of letters) and ing figures. The annotation methods developed in this paper relative positioning (1 to 3). Issues with this markup arise address these two issues. The desire is to develop an annota- quickly, but the main idea of marking defining elements still tion scheme that will highlight the structure of rhetorical has its uses. figures allowing them to be more easily understood by There are syntactic and semantic issues that form when computational learning-based algorithms while keeping applying the markup to more figure-rich texts. By analyzing figures intact. Now, using XML we analyze the develop- an example (1, repeated here for convenience), we demon- ment process of a suitable markup. strate the problems. (A fully formatted example is given in We have used HTML in the past for annotating fig- Figure 2.) ures—specifically, JANTOR (Java ANnotation Tool Of 21.! a. Ask not what your country can do for you. Ask Rhetoric) allowed for "manual and automated annotation of what you can do for your country. (your country / files in HTML format" [Gawryjolek 2009; Gawryjolek, Har- you; ask not x / ask y) [Kennedy (and Sorensen) ris, and DiMarco 2009]—but XML presents such obvious 1961] 29 Proceedings of CMNA 2016 - Floris Bex, Floriana Grasso, Nancy Green (eds) b. ...! ......! ......! ... 
Figure 2: The full hierarchical structure of Example 21b. Bolding indicates syntax errors.

A syntax issue arises in Example 21b, where multiple figure tags close inside the wrong parent tag. For example, we have [...]. Figure 3 below shows the other figures that suffer from the same error.

XML syntax does not allow tags to interpenetrate. Considering this problem, it becomes apparent that the tags marking the beginnings and ends of figures cause the most trouble. Further analysis reveals that these tags are unnecessary. The key components of a figure are its defining elements, such as repeated or contrasting elements (words, sounds).

The semantic complication concerns the nesting of XML tags. Arbitrary hierarchies can form when some figures happen to appear inside others. Rhetorical figures may, however, contain other rhetorical figures that do observe hierarchical relations. We therefore require a method that is explicit about when a hierarchy is created. This is achievable by introducing attributes, and thus a new annotation scheme.

Figure 3: Problems arise from ending the antithesis tag before ending the antimetabole tag.

Figure 3 displays the complexity of this version of the annotation scheme. The dashed arrows represent the consequences of tagging when the end of the antithesis must be marked before the end of the antimetabole: there is no hierarchy, or perhaps only a partial and fragmentary one, and it creates havoc. The nesting, if we can even call it that, is incomplete, falling outside XML's basic capacities. Hierarchy problems also become apparent as you [...] element/number> tags are sub-tags of [...].

The improved annotation scheme recognizes the above problems and attempts to resolve them. It focuses on highlighting the defining elements of figures. A general markup is shown in 22 (a fully formatted example of this markup is provided in Figure 5, placed between the Conclusion and the Acknowledgements for layout reasons):

22. [...] text [...]

If one wanted to create a hierarchy, say where figure1 always accompanies figure2, meaning that figure1 is a subpart of figure2, this is still possible. The XML from the example would look like: [...] hierarchy.

As Figure 4 reveals, the improved markup focuses on tagging parts of strings and providing them with more information. The figure focuses on antithesis, antimetabole, and ploche, where ploche refers to ploche1. Notice how we are able to combine the antimetabole and ploche tags into one attribute and avoid a hierarchy. Using attributes also separates the information about a tag, giving algorithms easier access to it. The lettergroup attribute indicates which tags surround the same word or, as the name suggests, the same grouping of letters: if the letters inside one tag are the same as those inside another tag, their lettergroup attributes end in the same character. The position attribute clarifies the location of the letter group in the figure. For example, antimetabole has two A's in its ABBA structure. To differentiate between them, we write the position attribute of the first A as Antimetabole-1 and of the second as Antimetabole-2. Using these tags and attributes to annotate rhetorical figures in text would create the required computational structure for figure analysis.

Figure 4: Improved annotation scheme, tagging parts of strings and providing more information.

5 Conclusion

The computational uses of rhetorical figures are indisputable. We can clearly see their ability to enhance fields such as author and genre detection, NLP systems, and argumentation mining. We also know how intricate they can become. Because they stack and intersect with one another, many figures can be overlooked, as observed in the previous works mentioned here. To exploit their uses yet overcome their intricacy, a rhetorical figure markup becomes imperative and should be treated as such.

Our annotation scheme represents the first move in what we hope will be a line of research that others will find profitable to join. The outline of the annotation scheme has been developed, and the flexibility of XML now allows others to improve and customize the mechanism for their own uses. The eventual goal is to develop a markup scheme that provides computationally accessible information for all rhetorical figures.

Figure 5: The full hierarchical structure of Sentence 1 (repeated as 21a), in accord with the tagging specified in 22.

Acknowledgements

We would like to thank Cliff O'Reilly for valuable XML advice, as well as our colleagues at the University of Waterloo, including Elena Afros, Adam Bradley, Ashley Kelly, Isabel Li, Ricky Rong, and Terry Stewart; our international colleagues, including Cliff (again), Marie Dubremetz, Jelena Mitrovic, Chris Reed, and James Wynn; and the Social Sciences and Humanities Research Council of Canada for financial assistance. We also thank three anonymous reviewers for CMNA for their helpful queries and suggestions. Our figure-annotation research is part of an overall project of Computational Rhetoric at the University of Waterloo, organized around a comprehensive OWL-based ontology of rhetorical figures.

References

Alexis, Andre. 2015. Fifteen dogs. Toronto: Coach House Books.
Alliheedi, Mohammed. 2012. Multi-document summarization system using rhetorical information. Master of Mathematics thesis, Cheriton School of Computer Science, University of Waterloo. [Supervised by Chrysanne DiMarco; Randy Allen Harris, Second Reader.]
Alliheedi, Mohammed, and Chrysanne DiMarco. 2014. Rhetorical figuration as a metric in text summarization. Proceedings, 2014 Canadian Artificial Intelligence Conference, Montreal, QC, May 6-9.
Buck, Gertrude. 1899. The metaphor: A study in the psychology of rhetoric. In Contributions to Rhetorical Theory, ed. Fred Newton Scott. Ann Arbor: Inland.
Bush, George W. [and David Frum]. 2001. Address before a joint session of the Congress on the United States response to the terrorist attacks of September 11. The American Presidency Project, ed. Gerhard Peters and John T. Woolley. http://www.presidency.ucsb.edu/ws/?pid=64731
Chien, Lynn, and Randy Allen Harris. 2010. Scheme trope chroma chengyu: Figuration in Chinese four-character idioms. Cognitive Semiotics 10.6: 155-178.
Clinton, Hillary. 2013. Statement for the Americans for Marriage Equality campaign. Human Rights Campaign (18 March). http://www.hrc.org/videos/videos-hillary-clinton-supports-marriage-equality#.UXAbPys4Xvl
Crosswhite, James. 2000. Rhetoric and computation. Symposium on Argument and Computation. Bonskeid House, Perthshire, Scotland, June 27.
Dubremetz, Marie, and Joakim Nivre. 2015. Rhetorical figure detection: The case of chiasmus. Proceedings of the NAACL-HLT Fourth Workshop on Computational Linguistics for Literature, Denver, CO, June 4.
Fahnestock, Jeanne. 1999. Rhetorical figures in scientific argumentation. New York: Oxford University Press.
Fuller, Steve. 2001. Thomas Kuhn: A philosophical history for our times. Chicago: University of Chicago Press.
Gawryjolek, Jakub J. 2009. Automated annotation and visualization of rhetorical figures. Master of Mathematics thesis, Cheriton School of Computer Science, University of Waterloo. [Supervised by Chrysanne DiMarco; Randy Allen Harris, Second Reader.]
Gawryjolek, Jakub J., Randy Allen Harris, and Chrysanne DiMarco. 2009. An annotation tool for automatically detecting rhetorical figures. Proceedings, CMNA IX (Computational Models of Natural Argument), held with IJCAI-09, Pasadena, CA, July 13.
Gladkova, Olga, Chrysanne DiMarco, and Randy Harris. 2016. Argumentative meanings and their stylistic configurations in clinical research publications. Argument & Computation 6.3: 310-346.
Gladkova, Olga, Randy Allen Harris, and Chrysanne DiMarco. 2011. Schematic organization of clinical decision-making: Findings from qualitative corpus analysis. Proceedings, CMNA XI (Computational Models of Natural Argument), 7 August 11, San Francisco, CA.
Grasso, Floriana. 2002a. Towards a framework for rhetorical argumentation. EDILOG 2002 - Proceedings of the 6th Workshop on the Semantics and Pragmatics of Dialogue, J. Bos, M.E. Foster, and C. Matheson (eds), Edinburgh, UK, 4-6 September 2002, 53-60.
Grasso, Floriana. 2002b. Towards computational rhetoric. Informal Logic 29.3: 195-229.
Green, Nancy. 2010. Representation of argumentation in text with Rhetorical Structure Theory. Argumentation 24.2: 181-196.
Green, Nancy. 2015. Identifying argumentation schemes in genetics research articles. In Proceedings of the Second Workshop on Argumentation Mining, North American Chapter of the Association for Computational Linguistics (NAACL), 12-21, Denver, CO.
Greer, Germaine. 1988. The proper study of womankind. Times Literary Supplement (3-9 June).
Harris, Randy Allen. 2013. Figural logic in Mendel's Experiments on plant hybrids. Philosophy and Rhetoric 46.4: 570-602.
Harris, Randy Allen, and Chrysanne DiMarco. 2009. Constructing a rhetorical figuration ontology. Symposium on Persuasive Technology and Digital Behaviour Intervention, Convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB), Edinburgh, Scotland, April.
Hertzberg, Hendrik. 2008. The spat. New Yorker (February 11).
Hoffmann, Thomas, and Graeme Trousdale, eds. 2013. The Oxford handbook of Construction Grammar. New York: Oxford University Press.
Hromada, Daniel Devatman. 2011. Initial experiments with multilingual extraction of rhetoric figures by means of PERL-compatible regular expressions. Proceedings of the Second Student Research Workshop associated with RANLP 2011, Hissar, Bulgaria.
Kanoksilapatham, Budsaba. 2003. A corpus-based investigation of scientific research articles: Linking move analysis with multidimensional analysis. PhD dissertation, Department of Linguistics, Georgetown University.
Kanoksilapatham, Budsaba. 2005. Rhetorical structure of biochemistry research articles. English for Specific Purposes 24.3: 269-292.
Kennedy, John F. [and Theodore Sorensen]. 1961. Inaugural Address. http://www.presidency.ucsb.edu/ws/index.php?pid=8032
Magid, Larry. 2012. Zuckerberg claims we don't build services to make money. Forbes (1 February). http://www.forbes.com/sites/larrymagid/2012/02/01/zuckerberg-claims-we-dont-build-services-to-make-money/
Mann, William C., and Sandra A. Thompson. 1988. Rhetorical Structure Theory: Toward a functional theory of text organization. Text 8.3: 243-281.
Matthews, P. H. 2007. Syntactic relations: A critical survey. Cambridge Studies in Linguistics 114. Cambridge: Cambridge University Press.
Maxwell, Robin. 2012. Jane: The woman who loved Tarzan. New York: Macmillan.
Newton, Sir Isaac. 1803 [1687]. The mathematical principles of natural philosophy. Three volumes. Trans. Andrew Motte. London: H.D. Symonds.
O'Reilly, Cliff. 2010. Lassoing rhetoric with OWL and SWRL. Unpublished MSc dissertation. Available: http://computationalrhetoricworkshop.uwaterloo.ca/wp-content/uploads/2016/06/LassoingRhetoricWithOWLAndSWRL.pdf
O'Reilly, Cliff, and Shamima Paurobally. 2010. Lassoing rhetoric with OWL and SWRL. Unpublished. Available: http://www.academia.edu/2095469/Lassoing_Rhetoric_with_OWL_and_SWRL
Perelman, Chaïm, and Lucie Olbrechts-Tyteca. 1969. The new rhetoric: A treatise on argumentation. Trans. John Wilkinson. Notre Dame: University of Notre Dame Press.
Reed, Chris, and G.W.A. Rowe. 2004. Araucaria: Software for argument analysis, diagramming and representation. International Journal of AI Tools 13.4: 961-980.
Reed, Chris, and Timothy J. Norman, eds. 2003. Argumentation machines: New frontiers in argument and computation. Dordrecht, The Netherlands: Kluwer.
Rubinelli, Sara. 2006. The ancient argumentative game: Topoi and loci in action. Argumentation 20.3: 253-272.
Sartwell, Crispin. 2014. The left-right political spectrum is bogus. The Atlantic (June 20). http://www.theatlantic.com/politics/archive/2014/06/the-left-right-political-spectrum-is-bogus/373139/
Seuss, Dr. [Theodore S. Geisel]. 1940. Horton hatches the egg. New York: Random House.
Strommer, Claus. 2011. Using rhetorical figures and shallow attributes as a metric of intent in text. Doctoral dissertation, Cheriton School of Computer Science, University of Waterloo. [Supervised by Chrysanne DiMarco; Randy Allen Harris, Committee Member.]
Teufel, Simone, J. Carletta, and M. Moens. 1999. An annotation scheme for discourse-level argumentation in research articles. In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics, Stroudsburg, PA, 110-117.
Teufel, Simone. 2010. The structure of scientific articles: Applications to citation indexing and summarization. San Francisco: CSLI Publications.
Tindale, Christopher W. 2000. Acts of arguing: A rhetorical model of argument. Albany, NY: State University of New York Press.
Turner, Mark. 1997. Figure. In Figurative language and thought, Cristina Cacciari, Ray Gibbs, Jr., Albert Katz, and Mark Turner, eds. New York: Oxford University Press, 44-87.
Unknown author. Unknown date. Western Spanglish language: The United States unofficial language. Western women in leadership and innovation: Discovering the wellsprings of metaphorical voices. http://westernwomenleadershipinnovation.net/western-spanglish-language.html
Unknown. 1995. Oklahoma Department of Wildlife Conservation. Outdoor Oklahoma. Volumes 51-52.
Unknown. 2013. Ultrabooks vs Laptops. Java (January 26). http://java-maheshyadav.blogspot.ca/2013/01/ultrabooks-vs-laptops.html
Volpe, Peter E. 1975. Man, nature, and society: An introduction to biology. Dubuque, IA: W. C. Brown Company.
Walker, Daniel. 2011. God in a brothel: An undercover journey into sex trafficking and rescue. Downers Grove, IL: InterVarsity Press.
Waller, Willard. 1940. War and the family. Hinsdale, IL: The Dryden Press.