=Paper=
{{Paper
|id=Vol-1419/paper0034
|storemode=property
|title=Sentence Trimming in Service of Verb Phrase Ellipsis Resolution
|pdfUrl=https://ceur-ws.org/Vol-1419/paper0034.pdf
|volume=Vol-1419
|dblpUrl=https://dblp.org/rec/conf/eapcogsci/McShaneNB15
}}
==Sentence Trimming in Service of Verb Phrase Ellipsis Resolution==
Marjorie McShane (margemc34@gmail.com), Sergei Nirenburg (zavedomo@gmail.com), Petr Babkin (petr.a.babkin@gmail.com)

Cognitive Science Department, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, USA

===Abstract===
We describe two methods of improving the coverage of a system that automatically detects and resolves verb phrase ellipsis. Both methods involve recognizing non-core sentence constituents, thereby making the core constituents more easily manipulated by the ellipsis detection and resolution functions. A system evaluation shows increases both in the number of sentences in which ellipsis is detected, and in the percentage of elliptical sentences that can be treated by the system's methods.

'''Keywords:''' ellipsis; VP ellipsis; natural language processing; sentence trimming; syntactic pruning

===Introduction===
Ellipsis is defined as the non-expression of linguistic material that can be reconstructed by the interlocutor. The work reported here focuses on detecting and resolving verb phrase (VP) ellipsis that is licensed by a modal or auxiliary verb. For example, in (1) the modal verb ''can'' licenses ellipsis of the meaning of its scope, ''get bragging rights''. (Elided categories are indicated by [e]; their sponsors – typically, antecedents – are indicated in italics. All cited examples except for (4), (22a) and (23a) – which were invented – are from the Gigaword corpus (Graff and Cieri 2003), which was used for system evaluation; both the Gigaword corpus and the COCA corpus (Davies 2008-) were used for linguistic analysis.)

: (1) And you try to ''get bragging rights'' if you can [e].

McShane and Babkin (2015) report a VP ellipsis resolution system that is novel in three ways. First, NLP (natural language processing) systems tend not to treat many kinds of ellipsis, since ellipsis is resistant to the currently dominant method of supervised machine learning, which relies on annotations of visible (not elided) text strings. Second, our development methodology is knowledge-based, leveraging human-oriented linguistic insights as heuristic evidence. In essence, we are trying to teach the machine to do what people do by modeling (to some degree) how people seem to do it. This places the work squarely in the paradigm of AI-NLP (artificial-intelligence-inspired NLP). Third, since both detecting and resolving ellipsis are difficult problems, the system is configured to independently select which examples it believes it can treat with reasonably high precision, and to treat only those.

This partial-coverage approach has potential benefits for two communities. For mainstream NLP, treating at least some elided VPs is preferable to not treating any. For the intelligent agent community, we believe it is essential for agents to be able to judge their own confidence in all aspects of language processing, then use those confidence estimates to guide their next move. So, in cases of high confidence in language analysis, the system can boldly proceed to decision-making and action, whereas in cases of low confidence, it should seek clarification from its human collaborator.

Although the initial evaluation of our system (McShane and Babkin 2015) was promising, one area for improvement was low coverage of examples, both with respect to detecting ellipsis and with respect to selecting which examples to resolve. Both of these are improved upon in the enhanced system reported here. However, to understand the nature of the improvements, one must first understand the basics of the original system.

Detection of VP ellipsis was carried out very simply: any modal or auxiliary verb directly preceding a hard discourse break – defined as a period, semi-colon or colon – was considered an ellipsis licensor (cf. (1)). The reason for orienting around hard discourse breaks was practical: for our initial system development, we sought a cheap, fast method of finding elliptical examples in a large corpus without too many false positives. Although this method did offer high precision, it had less than optimal recall.

In the new version of the system, we expand the detection heuristics to also include modal and auxiliary verbs occurring before a soft discourse break, defined as a comma, dash, or open parenthesis. However, this detection heuristic is more error-prone because "[modal] + [soft discourse break]" does not always signal ellipsis: the modal's complement can actually occur later in the sentence. E.g., in (2) the scope of ''tried to'' is ''check with other several sources''.

: (2) "I've always tried to, when we get intelligence, check with other several sources, ..."

To weed out false positives, we implemented parenthetical detection functions that attempt to determine the role of each soft discourse break that follows a modal or auxiliary verb. The punctuation mark could either (a) introduce a parenthetical that is then followed by the scope of the modal/auxiliary (i.e., there is no VP ellipsis) or (b) not introduce a parenthetical, in which case the structure is likely elliptical. To summarize, the first advancement reported here is the use of parenthetical detection strategies that permit the system to detect ellipsis before soft discourse breaks; this increases system coverage at the stage of ellipsis detection.
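To make the detection heuristic concrete, the following is a minimal sketch over a flat token list. It is an illustration under our own assumptions: the verb inventory, the break definitions as literal tokens, and all names are ours, not the authors' code.

<syntaxhighlight lang="python">
# A minimal sketch of the detection heuristic described above, operating on a
# flat token list. The verb inventory and all names are illustrative
# assumptions, not the authors' implementation.

MODALS_AND_AUXILIARIES = {
    "can", "could", "will", "would", "shall", "should", "may", "might",
    "must", "do", "does", "did", "to", "have", "has", "had",
}
HARD_BREAKS = {".", ";", ":"}          # period, semi-colon, colon
SOFT_BREAKS = {",", "-", "--", "("}    # comma, dash, open parenthesis

def candidate_licensors(tokens):
    """Yield (index, break_type) for each modal/auxiliary that directly
    precedes a discourse break, i.e., each candidate ellipsis licensor."""
    for i in range(len(tokens) - 1):
        if tokens[i].lower() in MODALS_AND_AUXILIARIES:
            if tokens[i + 1] in HARD_BREAKS:
                yield i, "hard"
            elif tokens[i + 1] in SOFT_BREAKS:
                yield i, "soft"  # still subject to parenthetical filtering

# Example (1): "can" directly precedes a period, a hard discourse break.
tokens = "And you try to get bragging rights if you can .".split()
print(list(candidate_licensors(tokens)))  # [(9, 'hard')]
</syntaxhighlight>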
As concerns resolution, the system attempts to resolve only those cases of ellipsis that it believes it can treat with reasonable confidence. Below we briefly describe two of its resolution strategies.

'''Pattern matching.''' We have recorded nine broadly-defined phrasal patterns (which divide into many more subpatterns) that include VP ellipsis, along with their ellipsis resolution strategies. For example, (3) matched the pattern ''what NP *can'' (the asterisk indicates any inflectional form of this verb or select related verbs) and the system correctly indicated that the sponsor was ''say''.

: (3) Vincent Schmid, the vicar of the cathedral, said prayer and music would ''say'' what words could not [e].

We will not detail the pattern-matching strategy here, since we have no enhancements to report; however, it is important to understand that pattern matching is the first ellipsis resolution strategy to fire, and it takes care of many cases of VP ellipsis.

'''The Simple Parallel Configuration.''' Another strategy for treating VP ellipsis is to identify contexts that we call Simple Parallel Configurations, which are structurally simple enough to be treated without the need for deep reasoning or world knowledge. We operationalized the notion of Simple Parallel Configuration in terms of Stanford CoreNLP (Manning et al. 2014) dependency parses. Configurations are deemed Simple Parallel if they contain:

* exactly one instance of a "whitelisted" dependency – i.e., a conj, advcl or parataxis dependency that links the modal/auxiliary element licensing the ellipsis with an element from the sponsor clause (conj dependencies that take non-verbal arguments are ignored, since they can reflect, e.g., nominal conjunction structures such as ''Lulu and Fido''; definitions of the dependencies can be found in the Stanford CoreNLP dependencies manual: http://nlp.stanford.edu/software/dependencies_manual.pdf);
* no instances of a "blacklisted" dependency – i.e., a ccomp, rcmod, dep or complm dependency, all of which indicate various types of embedded verbal structures that complicate matters by offering competing candidate sponsors;
* one or more instances of a "gray-listed" dependency, defined as an xcomp or aux dependency that takes as its arguments matrix and/or main verbs from the sponsor clause.

For example, the parse for (4) includes one whitelisted dependency, conj(wanted-2, did-10), and three gray-listed dependencies – xcomp(wanted-2, try-4), xcomp(try-4, start-6), xcomp(start-6, juggle-8).

: (4) John wanted to try to start to juggle and did [e].

Once the system detects a Simple Parallel Configuration, it still needs to resolve the ellipsis. Here, the decision space can be complex. Although the whitelisted dependency indicates which clause contains the sponsor, the system still must determine which elements from that clause should participate in the resolution: e.g., are modal verbs and adverbs part of the sponsor or not? (For example, in (4) the leftmost member of the sponsor might be interpreted as ''try'' or ''start''.) In the reported evaluation, the system is responsible for selecting only the correct verbal head of the sponsoring VP. So, whereas it is responsible for decisions about including/excluding modal verbs like ''want to'', ''try to'', and ''start to'' in (4), it is not responsible for decisions about other non-head elements, such as adverbs.
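The three conditions translate almost directly into a test over dependency triples. The sketch below is a simplified rendering under our own assumptions: for brevity it omits the verbal-argument filter on conj and the requirement that the whitelisted dependency actually link the licensing modal/auxiliary to the sponsor clause, both of which the text names as part of the full test.

<syntaxhighlight lang="python">
# A minimal rendering of the Simple Parallel test over (relation, governor,
# dependent) triples from a Stanford CoreNLP parse. Simplified: it omits the
# verbal-argument check on conj and the licensor-to-sponsor linking check.

WHITELIST = {"conj", "advcl", "parataxis"}
BLACKLIST = {"ccomp", "rcmod", "dep", "complm"}
GRAYLIST = {"xcomp", "aux"}

def is_simple_parallel(deps):
    """deps: iterable of (relation, governor, dependent) triples."""
    relations = [rel for rel, _, _ in deps]
    return (sum(rel in WHITELIST for rel in relations) == 1
            and not any(rel in BLACKLIST for rel in relations)
            and any(rel in GRAYLIST for rel in relations))

# The dependencies cited for example (4):
deps_4 = [
    ("conj", "wanted-2", "did-10"),     # whitelisted
    ("xcomp", "wanted-2", "try-4"),     # gray-listed
    ("xcomp", "try-4", "start-6"),      # gray-listed
    ("xcomp", "start-6", "juggle-8"),   # gray-listed
]
print(is_simple_parallel(deps_4))  # True
</syntaxhighlight>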
Orienting around Simple Parallel Configurations captures the intuition that some elliptical contexts are quite simple and straightforward, whereas others are not. It makes sense to prepare agents to resolve the simpler cases in the near term as we work toward conquering the more difficult cases over time.

'''Making more contexts look Simple Parallel.''' Some elliptical sentences that are not Simple Parallel are truly difficult. For example, (5) offers several competing candidate sponsors and requires both world knowledge and close attention by a human to resolve the ellipsis.

: (5) The former Massachusetts governor called on United Nations Secretary General Ban Ki-moon to revoke Ahmadinejad's invitation to the assembly and warned Washington should reconsider support for the world body if he did not [e].

Our system does not currently attempt to treat contexts like these.

But other non-Simple Parallel examples look very much like Simple Parallel Configurations if only some parts were omitted. For example, the boldface portion of (6) would be very straightforward for ellipsis resolution if only the portion formatted using strikethrough were to disappear (the portion after the quoted speech is irrelevant for the process of ellipsis resolution).

: (6) '''"We're celebrating the fact that we're living in a time where, when we want to be in the kitchen, we can [e],"''' <s>says Tamara Cohen, Ma'yan program director.</s>

This leads us to the second advancement reported here, which is the use of sentence trimming strategies that permit the system to transform complex sentences into simpler ones that can be treated as Simple Parallel Configurations. Sentence trimming follows the psychologically motivated hypothesis that some sentence constituents are more salient to the meaning of the utterance than others. Focusing on the core ones can have useful side-effects for the difficult task of automatic ellipsis resolution.

Of course, parenthetical detection can be framed as a subclass of sentence trimming, since one way to trim a sentence is to detect and remove parenthetical information. However, since parenthetical detection and overall sentence trimming are exploited at different points and to different ends in the system, we treat them separately in the narrative below.
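Before turning to the two components, it may help to see how the pieces described so far fit together: phrasal patterns fire first, then the Simple Parallel engine, and trimming creates retry opportunities; if nothing succeeds, the system declines to treat the example. The schematic below reflects our reading of that control flow; the helpers are stubs, and none of the names reflect the authors' actual API.

<syntaxhighlight lang="python">
# A schematic of the overall control flow as we read it from the text. The
# helpers are stubs standing in for components described in the paper; the
# names and signatures are ours, not the authors' API.

def match_phrasal_patterns(sentence):
    return None   # stub: the nine broadly-defined phrasal patterns

def parse_dependencies(sentence):
    return []     # stub: Stanford CoreNLP dependency triples

def is_simple_parallel(deps):
    return False  # stub: the whitelist/blacklist/gray-list test above

def select_sponsor_head(sentence):
    return None   # stub: choose the verbal head of the sponsoring VP

def trimmed_variants(sentence):
    return []     # stub: the seven trimming procedures, alone or combined

def resolve_vp_ellipsis(sentence):
    """Patterns first; then Simple Parallel; then trim and retry; otherwise
    decline to treat the example (a low-confidence outcome)."""
    resolution = match_phrasal_patterns(sentence)
    if resolution is not None:
        return resolution
    if is_simple_parallel(parse_dependencies(sentence)):
        return select_sponsor_head(sentence)
    for variant in trimmed_variants(sentence):
        if is_simple_parallel(parse_dependencies(variant)):
            return select_sponsor_head(variant)
    return None  # left untreated
</syntaxhighlight>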
===Parenthetical Detection===
To reiterate, when we expanded our ellipsis detection method to include the detection of elided VPs before soft discourse breaks (in addition to hard discourse breaks), we had to introduce a parenthetical detection strategy to avoid false positives. This strategy operates on the output of Stanford CoreNLP parsing and comprises 12 functions, presented below with examples. Note that one cannot rely on paired punctuation marks to delineate parentheticals, since one or both punctuation marks can be omitted.

# The prn dependency in the Stanford CoreNLP parse detects some cases of parentheticals directly: '', they wondered,''
# Conjunction + (NP<sub>SUBJECT</sub>) + modal verb: ''and did'', ''and need not'', ''or wishes to'', ''and one should not''
# Prepositional phrase: ''among others'', ''at any price''
# Adverb: ''however'', ''therefore'', ''potentially''
# Adverbial phrase: ''absolutely not'', ''more than ever''
# Conjunction + clause: ''as he put it'', ''as you know''
# (Quasi-)idiom: ''as is the case/situation with''
# Conjunction + subjectless past-participial clause: ''if untreated'', ''as previously announced'', ''if given in sufficient doses'', ''if needed'', ''as put so eloquently''
# Conjunction + adjective: ''if possible''
# Clause without object or complement: ''it seems'', ''you know'', ''NP<sub>SUBJ</sub> feel''
# Gerund phrase: ''gritting our teeth'', ''following a review''
# Two modals "share" a scope, both appearing elliptical at the surface but having a textual postcedent, as shown in (7).

: (7) "The possibility for events to spiral rapidly out of control in circumstances of darkness, high emotions, low trust and official uncertainty cannot, and should not, be underestimated," DeGolyer said in a report published last July.

When the system detects "[modal/aux.] + [soft discourse break] + [parenthetical]", it considers the context to be non-elliptical, since the scope of the modal/aux. generally follows the parenthetical. In all other cases, the soft discourse break is treated as if it were a hard discourse break: an elided VP is posited after the modal and the post-punctuation portion of the sentence is disregarded for subsequent processing.
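The real functions inspect the CoreNLP parse. As a rough, purely surface-based illustration of a few of the twelve cases, a filter might look like the following; the regular expressions and names are our own toy approximations, not the authors' implementation.

<syntaxhighlight lang="python">
# A toy, surface-based stand-in for a few of the twelve parenthetical
# detection cases. The real functions operate over the CoreNLP parse; the
# regular expressions here are illustrative assumptions only.

import re

PARENTHETICAL_OPENERS = [
    r"(and|or|but)\s+(one\s+)?(did|need|wishes|should)\b",  # case 2
    r"(among others|at any price)\b",                       # case 3
    r"(however|therefore|potentially)\b",                   # case 4
    r"(absolutely not|more than ever)\b",                   # case 5
    r"(as|when|if)\s+\w+",                                  # cases 6, 8, 9
    r"\w+ing\b",                                            # case 11
]

def introduces_parenthetical(after_break):
    """True if the material following '[modal/aux] + [soft break]' looks like
    a parenthetical, i.e., the modal's scope probably follows it."""
    text = after_break.strip().lower()
    return any(re.match(pattern, text) for pattern in PARENTHETICAL_OPENERS)

# In (2), "when we get intelligence" follows "tried to,"; classifying it as
# a parenthetical correctly blocks the ellipsis reading.
print(introduces_parenthetical("when we get intelligence"))  # True
</syntaxhighlight>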
===Sentence Trimming===
To simplify complex sentences into, ideally, Simple Parallel Configurations, we implemented 7 sentence trimming procedures, which rely on the output of Stanford CoreNLP parsing. The procedures can, individually or in combination, transform a complex context into one that can be treated as a Simple Parallel Configuration. We briefly describe each trimming strategy in turn. Illustrative examples indicate the trimmed part using strikethrough.

'''1. Strip sentence adverbs.''' We created a list of over 500 sentence adverbs, based on a combination of introspection and searches using the online version of the COCA corpus (Davies 2008-). (For example, we searched for frequent single words, and 2- and 3-word collocations, occurring between a period and a comma.)

: (8) <s>Even after that</s> I was thinking about sprinting and being in front, but I could not [e].

'''2. Strip pre-punctuation clause.''' The system walks backwards through the text. If it encounters a comma, dash, semi-colon or colon, it strips it off along with the preceding context. If the remaining portion is a Simple Parallel configuration, it resolves the ellipsis. If not, it continues walking back through the text to the next punctuation mark. (A code sketch of this walk-backwards skeleton appears at the end of this section.)

: (9) <s>I was OK,</s> I tried to find my game but I couldn't [e].

'''3. Strip speech/thought verb and preceding context.''' The system walks backwards through the text. If it encounters one of a listed inventory of speech/thought verbs, it removes that verb and all preceding content and evaluates whether the remaining structure is Simple Parallel. If it is, the system resolves the ellipsis.

: (10) <s>Barak told Israel TV that the agents asked</s> if he could [e].

'''4. Strip pre-conjunction material.''' The system walks backwards through the text to the first encountered conjunction. If it is among our listed 28 conjunctions, and if the associated dependency takes verbal arguments, then the system determines whether the latter conjunct is a Simple Parallel configuration. If yes, the system resolves the ellipsis. If not, it continues to walk back through the text to determine whether adding another conjunct will result in a Simple Parallel Configuration. For example, when encountering ''and'' in (11) the system evaluates whether ''I couldn't'' is Simple Parallel: it is not. So the system continues walking back to the next conjunction, ''because'', and prunes off the text prior to it. Since what remains is a Simple Parallel Configuration, the system resolves the ellipsis. (The fact that the resolution requires sloppy identity of the object – i.e., bend MY knees – will not be treated in this paper.)

: (11) <s>My legs make the serve</s> because you need to bend your knees and I couldn't [e].

'''5. Strip sentence-initial PPs and adverbs.''' These are detected from the parse tree.

: (12) <s>In the swimming test,</s> inosine-treated rats by week eight were able to properly control their forepaws, while the untreated rats could not [e].

'''6. Strip parentheticals.''' The approach to stripping parentheticals is essentially the same as described earlier; however, in this case, the parenthetical need not be preceded by "[modal/aux. verb] + [soft discourse break]".

: (13) By winning a second term, Bush has accomplished what his father <s>– defeated in 1992 by Democrat Bill Clinton –</s> could not [e].

'''7. Strip non-quotative "NP said"/"was told," etc.''' The collocations ''NP said'', ''NP was told'' and paraphrases thereof are often inserted into propositions that are not direct quotes, as in (14).

: (14) <s>Belu said</s> he wanted to protest, but <s>was told</s> he could not [e].
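Several of the procedures (2, 3 and 4) share the same walk-backwards skeleton. Below is a minimal sketch of procedure 2 under our own assumptions about tokenization, with a toy stand-in for the Simple Parallel test; it is an illustration, not the authors' code.

<syntaxhighlight lang="python">
# A minimal sketch of the walk-backwards skeleton shared by procedures 2-4,
# instantiated for procedure 2 ("strip pre-punctuation clause"). Tokenization
# and the is_simple_parallel callback are our assumptions.

PUNCTUATION = {",", "-", "--", ";", ":"}  # comma, dash, semi-colon, colon

def strip_pre_punctuation_clause(tokens, is_simple_parallel):
    """Walk backwards through the sentence; at each punctuation mark, drop it
    and all preceding context, and test whether the remainder is a Simple
    Parallel configuration."""
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in PUNCTUATION:
            remainder = tokens[i + 1:]
            if is_simple_parallel(remainder):
                return remainder
    return None  # no trim yielded a treatable configuration

# Example (9): stripping "I was OK," leaves a treatable coordinate structure.
# The lambda below is a toy stand-in for the real Simple Parallel test.
tokens = "I was OK , I tried to find my game but I couldn't".split()
print(strip_pre_punctuation_clause(tokens, lambda toks: "but" in toks))
# ['I', 'tried', 'to', 'find', 'my', 'game', 'but', 'I', "couldn't"]
</syntaxhighlight>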
===Evaluation===
This evaluation measured the changes in the coverage of elliptical examples due to the enhancements described above, and also measured the precision of resolution for all experimental runs. Evaluation was carried out on a random sample of the Gigaword corpus (Graff and Cieri 2003). It must be noted that samples of this same corpus were used for linguistic investigation of component phenomena and testing of algorithms – i.e., before engaging in development work, we did not set aside a dedicated evaluation segment. However, we believe the evaluation results are still valid, since this is a very large corpus and we did not seek to tune our approach to cover any individual examples.

We carried out two phases of evaluation. Phase 1 focused primarily on the effects of trimming procedures. First we semi-automatically – i.e., automatically followed by manual checking – identified examples of VP ellipsis before a hard discourse break (HDB) and before a soft discourse break (SDB). We then ran the Simple Parallel Configuration detector over those examples to determine how many it could treat. Column 3 of Table 1 shows the number of actually elliptical examples that were evaluated for both HDB and SDB contexts. The Simple Parallel column indicates how many of the examples were treated as Simple Parallel Configurations, without trimming and with trimming (Column 2 indicates whether trimming was applied). Recall indicates the number of examples treated as a percentage of total examples. Head precision refers to the accuracy of detecting the correct head of the sponsor.

{| class="wikitable"
|+ Table 1. Evaluation of sentences that were confirmed to be elliptical.
! DB !! Trim !! Elliptical examples !! Simple Parallel !! Recall !! Head precision
|-
| hard || no || 105 || 28 || 27% || 71%
|-
| hard || yes || 105 || 48 || 46% || 71%
|-
| soft || no || 109 || 13 || 12% || 77%
|-
| soft || yes || 109 || 20 || 18% || 75%
|}

Without trimming, the system treated 28/105 HDB examples (27%) and 13/109 SDB examples (12%). Next we applied trimming procedures to the untreated sentences, which increased recall to 48/105 (46%) for HDB examples and 20/109 (18%) for SDB examples. Resolution accuracy was about the same with and without trimming.
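In Table 1's terms, the two reported measures reduce to simple ratios. For the trimmed HDB condition, for instance (the count of correctly resolved heads is back-calculated from the reported precision, so it is an approximation):

<syntaxhighlight lang="python">
# The Phase 1 bookkeeping, using the trimmed HDB row of Table 1. The count of
# correctly resolved heads is inferred from the reported 71% precision.

treated, total_elliptical = 48, 105
correct_heads = 34  # approximation: 34/48 is roughly 71%

recall = treated / total_elliptical      # 48/105
head_precision = correct_heads / treated
print(f"recall = {recall:.0%}, head precision = {head_precision:.0%}")
# recall = 46%, head precision = 71%
</syntaxhighlight>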
Phase 2 of the evaluation observes the system in fully automatic mode: i.e., we did not manually verify that the extracted examples actually were elliptical. Table 2 shows the number of examples the system could treat under each of the four experimental conditions, as well as the number of examples treated by our inventory of elliptical phrasal patterns, which were run before the Simple Parallel engine was launched. Although our pattern-based methods were not described in depth in this paper, this count helps to convey the relative proportion that each system module contributes to the overall goal of resolving VP ellipsis.

{| class="wikitable"
|+ Table 2. Evaluation of the system in fully automatic mode, from detection through resolution.
! DB !! Trim !! Examples !! Simple Parallel !! Head precision
|-
| phrasals || N/A || 150 || N/A || 83%
|-
| hard || no || 95 || 13 || 77%
|-
| hard || yes || 95 || 32 || 72%
|-
| soft || no || 144 || 23 || 78%
|-
| soft || yes || 144 || 31 || 71%
|}

Note that Table 2 does not include a Recall column. Instead, we orient around how many of the examples that the system thought were elliptical could be treated by our methods, and what percentage of those resolved were resolved correctly. The reason for not including a formal measure of "recall" is that there is no clean, useful definition of it in this system configuration, since there can be false positives at the extraction stage. The system should not be penalized for failing to resolve an instance of "ellipsis" that was actually never ellipsis to begin with. Moreover, some of the contexts in this corpus were essentially word salad, uninterpretable even by people. If the system chose not to treat such sentences, that was appropriate.

===Interpretation of Evaluation Results===
Orienting evaluation strictly around numbers does not convey the full picture for knowledge-based systems, where error analysis is key to improvements. So let us give just a brief taste of what that process revealed.

First, we should emphasize that the system arrived at many impressive results, such as its correct treatment of examples (15)-(18).

: (15) "We have shown that we can play exciting football and should have had that game won but you just can not afford to switch off for even a second and I am afraid we did [e]."

: (16) Airline analysts said the Mesa Air initiative may have prompted Northwest, which already owns a large chunk of Mesaba and has executives on its board of directors, to jump in with an offer before Mesa did [e].

: (17) Prosecutors say they try to avoid calling journalists to testify, but sometimes they must [e].

: (18) "If we must [e], we can allow 80 or 100 officers to retire, on condition that they be replaced by officers capable of leading an army."

Sentences (15) and (16) include many candidate sponsors to be selected from. Sentence (17) requires the system to strip ''try to avoid'' from the sponsor, leaving ''calling'' as the head of the ellipsis resolution. And sentence (18) requires the system to find a postcedent, rather than the more typical antecedent (this resolution strategy is formulated as a phrasal pattern).

One source of errors, which is the focus of ongoing work, is the treatment of structurally embedded categories: e.g., in (19) the system selected ''capable'' (underlined) as the head of the sponsor rather than its complement, ''increasing''; and in (20) it should have stripped ''would not'' from the actual sponsor, ''happen''.

: (19) Khelil, speaking in an interview with OPECNA, said he was not sure the members of OPEC were <u>capable</u> of easily increasing production, even if they wanted to [e].

: (20) They said the elections would not happen, and they did [e].

Another common error involves cases in which the actual antecedent is not within the given sentence, but the given sentence contains what appears to be a valid sponsor.

: (21) "But I feel good that if I need to [e], I will."

In some cases, our structurally-oriented rules misfire for reasons that can only be understood with the help of semantic analysis. For example, in (22) the actual sponsor is in the preceding context; but if we slightly edit the sentence to the form in (22a), our rule would have fired correctly.

: (22) "Even if we can [e], we can't afford it."
: (22a) "Even if we want to [e], we can't buy it."

A similar understandable but incorrect resolution occurred in (23). (23a) is a structurally similar context in which the system's resolution would have been appropriate.

: (23) He appealed to Indonesians to respect national stability and threatened to call out the army if they did not [e].
: (23a) He threatened to call out the army if they did not [e].

Returning to the big picture, this system is being tasked with a difficult challenge: it must both detect and resolve ellipsis; it takes as input sentences that might be non-normative or semantically difficult; and it uses a parse that, naturally, can include unexpected results. This is a problem space that has been undertreated in computer systems to date, and we believe that the approaches we have described here are a strong first step.
===Related Work===
A directly relevant precedent for our work is Hardt's (1997) VP ellipsis system. However, whereas that system requires a perfect (manually corrected) syntactic parse, ours uses the results of automatic parsing.

Extensive work has been devoted to the automatic resolution of overt referring expressions, with a recent notable contribution being Lee et al. (2013).

As concerns sentence trimming, much of the past work has been in service of text summarization. For example, Knight and Marcu (2002) implement two approaches to sentence compression (a noisy-channel, probabilistic approach, and a decision-tree, deterministic one) using a methodology that involves aligning sentences from a source document (called 'Text') with sentences from manually generated abstracts of the document (called 'Abstract'), then using these tuples to learn how to trim Texts into Abstracts. Gagnon and Da Sylva (2005) trim sentences based on a dependency parse, removing subtrees that represent certain types of relations, such as prepositional complements of the verb, subordinate clauses and noun appositions.

Apart from summarization, sentence trimming has been applied to headline generation, event extraction and subtitling. Zajic et al.'s (2004) Hedge Trimmer system produced headlines by compressing the lead sentence of an article and removing constituents (articles, prepositional phrases, auxiliary have/be, etc.) in a particular order until the desired length threshold was reached. Buyko et al.'s (2011) trimmer supported event extraction by pruning what they call "informationally irrelevant lexical material" (such as auxiliary and modal verbs) from dependency graphs in order to focus on semantically rich dependencies.

Perhaps the closest precedent to our approach is the one reported in Vanderwende et al. (2007), which involves 5 trimming patterns. Three directly trim nodes generated by the parser (noun appositive, gerund clause, nonrestrictive relative clause). The fourth pattern is the deletion of lead conjunctions and adverbials (of time and manner only), which relies on a parser feature indicating time/manner adverbials. The final pattern, intra-sentential attribution (e.g., "…the report said that…"), requires direct manipulation of the parse. Interestingly enough, the summarization engine that this process served often selected the non-trimmed variants of sentences, in some cases quite correctly, since the trimmed version lost important content, either due to parser error or overtrimming.
===Final Thoughts===
Three insights guided the work presented here. (1) Although resolving some instances of VP ellipsis requires sophisticated semantic and pragmatic reasoning, not all cases are so difficult. (2) The "difficult/simple" judgment can be operationalized by exploiting linguistic principles that can be implemented within the current state of the art. (3) Many complex contexts can be automatically simplified, with the simplified versions being treatable by our ellipsis resolution methods.

The decision to permit the system to select which examples to treat and which to leave untreated is not typical in current NLP. Systems that treat overt referring expressions more typically function in one of two modes: either they orient around an annotated corpus, which indicates which entities must be treated (the so-called "markables"); or they attempt to treat all instances of a given string. Our interest in permitting the system to select which contexts to treat derives from the agent-building paradigm. Given an input, the agent must decide if it understands it sufficiently to proceed to decision-making and action. Endowing agents with estimates of language processing confidence will, we believe, contribute to making them better collaborators with humans in the near future.

As a contribution to cognitive science, this approach to agent modeling operationalizes the notion of a "simple" context – i.e., one involving a minimal cognitive load for the agent. Orienting around a psychologically plausible foothold like this is, we believe, essential when attempting to treat difficult linguistic phenomena such as ellipsis.

===Acknowledgments===
This research was supported in part by Grant N00014-09-1-1029 from the U.S. Office of Naval Research. All opinions and findings expressed in this material are those of the authors and do not necessarily reflect the views of the Office of Naval Research.

===References===
* Buyko, E., Faessler, E., Wermter, J. & Hahn, U. (2011). Syntactic simplification and semantic enrichment – trimming dependency graphs for event extraction. Computational Intelligence 27(4).
* Davies, M. (2008-). The Corpus of Contemporary American English: 450 million words, 1990-present. Available online at http://corpus.byu.edu/coca/.
* Gagnon, M. & Da Sylva, L. (2005). Text summarization by sentence extraction and syntactic pruning. Proceedings of Computational Linguistics in the North East, Gatineau, Québec, Canada.
* Graff, D. & Cieri, C. (2003). English Gigaword. Linguistic Data Consortium, Philadelphia.
* Hardt, D. (1997). An empirical approach to VP ellipsis. Computational Linguistics 23(4): 525-541.
* Knight, K. & Marcu, D. (2002). Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence 139(1).
* Lee, H., Chang, A., Peirsman, Y., Chambers, N., Surdeanu, M. & Jurafsky, D. (2013). Deterministic coreference resolution based on entity-centric, precision-ranked rules. Computational Linguistics 39(4): 885-916.
* Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J. & McClosky, D. (2014). The Stanford CoreNLP Natural Language Processing Toolkit. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55-60).
* McShane, M. & Babkin, P. (2015). Automatic ellipsis resolution: Recovering covert information from text. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15).
* Vanderwende, L., Suzuki, H., Brockett, C. & Nenkova, A. (2007). Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion. Information Processing and Management 43(6): 1606-1618.
* Zajic, D., Dorr, B. & Schwartz, R. (2004). BBN/UMD at DUC-2004: Topiary. Proceedings of DUC-2004.