=Paper=
{{Paper
|id=Vol-2174/paper3
|storemode=property
|title=Cues, Scope, and Focus: Annotating Negation in Spanish Corpora
|pdfUrl=https://ceur-ws.org/Vol-2174/paper3.pdf
|volume=Vol-2174
|authors=Lucia Donatelli
}}
==Cues, Scope, and Focus: Annotating Negation in Spanish Corpora==
Inductores, Ámbito, y Foco: La Anotación de la Negación en los Corpus de Español

Lucia Donatelli, Georgetown University, Bunn Intercultural Center, 403A, 37th and O Streets, N.W., Washington, DC 20057, led66@georgetown.edu

Abstract: The objective of NEGES Task 1 is to establish a standard for the annotation of negation in Spanish-language corpora. Specifically, the task analyzes guidelines used for such annotation in five projects over three domains: news (Sandoval and Salazar, 2013), clinical reports (Oronoz et al., 2015; Cruz et al., 2017; Marimon, Vivaldi, and Bel, 2017), and product reviews (Jiménez-Zafra et al., 2018b). Here, an assessment of these various guidelines is presented, with the goals of helping establish a standard set of guidelines for annotating negation in Spanish across domains, and of contributing to the workshop’s overall conversation about the treatment of negation in computational linguistics.

Keywords: Linguistic annotation, negation, Spanish

Resumen: El objetivo de la Tarea 1 de NEGES es establecer un estándar para anotar la negación en los corpus en español. En particular, la tarea analiza las directrices empleadas para tal anotación en tres dominios repartidas en cinco proyectos de corpus: noticias (Sandoval and Salazar, 2013), informes clínicos (Oronoz et al., 2015; Cruz et al., 2017; Marimon, Vivaldi, and Bel, 2017), y opiniones de productos (Jiménez-Zafra et al., 2018b). Aquí se presenta una evaluación de las distintas directrices, con los fines de ayudar a establecer una norma de directrices que se deben seguir para la anotación de negación en español entre dominios, y de contribuir a la conversación más amplia del taller sobre el tratamiento de la negación en la lingüística computacional.

Palabras clave: Anotación lingüística, negación, español

Proceedings of NEGES 2018: Workshop on Negation in Spanish, pages 29-34, Seville, Spain, September 18, 2018

===1 Introduction===
Negation is fundamental to sentence meaning, bearing on the key questions of “what happened (and what did not)”, “who did (or did not do) what to whom,” and “what existed (or did not).” In other words, negation helps establish what is fact and what is not, due to its ability to affect the truth value of a sentence (Horn, 1989). This is important to tasks in Natural Language Processing (NLP) based on the accurate identification and representation of meaning encoded in language, such as information extraction, question answering, and sentiment analysis.

The annotation of negation is not a trivial task. Negation acts at the syntactic, semantic, and pragmatic levels (Sandoval and Salazar, 2013); and it exhibits asymmetry between a uniform semantics and a categorial polyvalence, appearing as a prefix, verb, determiner, adverbial, particle, idiom, construction, and more (Bosque Muñoz and Gutiérrez-Rexach, 2009; Herburger, 2018). Often, the scope, or span of utterance upon which negation acts, is variable even within the same sentence; some argue that scope is different when determined at logical form (LF) (the syntax/semantics interface) as opposed to pragmatically (Moeschler, 2010), forcing resulting annotation to distinguish between underlying semantics and more general speaker meaning. Additionally, negation may carry different force depending on context; this is particularly true for Spanish, in which negative concord may offer gradations of meaning (Jiménez-Zafra et al., 2018b). This may be seen in the variety of negative expressions in the following sentences, with negative linguistic elements in bold (Española-RAE, 2010):

(1) Ella '''no''' dijo '''nada'''. ‘She did not say anything.’

(2) '''Nadie''' le hacía caso. ‘Nobody paid attention to him/her.’

(3) '''Ni''' de una forma '''ni''' de otra consiguieron convencerla. ‘They couldn’t convince her one way or another.’

(4) '''En mi vida''' he visto cosa igual. ‘I haven’t seen anything similar in my life.’

(5) '''No''' hables tanto. ‘Don’t talk so much.’

(6) ¿'''No''' son ya las dos? ‘Isn’t it already two o’clock?’

Example (1) negates the action of speaking; no functions as a negation cue, and nada as a negative polarity item (NPI) within the scope of the cue in the form of a negative indefinite. (2) affirms that no one, signaled by the indefinite pronoun nadie, participated in an attention-giving activity. (3) exhibits two negative conjunctions, ni, that describe an ineffective manner of convincing. (4) exhibits an adverbial phrase, en mi vida, that equates to the negative temporal adverb nunca ‘never’. Finally, (5) and (6) utilize negation to support speech acts: (5) as a negative command, and (6) as a leading question.

With this complexity in mind, the process of annotating negation in Spanish is daunting. Nevertheless, it is essential to annotate corpora with such information in order to train algorithms to perform to human capacity. Ideally, an annotation framework could be designed to capture negation patterns cross-linguistically. However, since such patterns are quite varied and distinct, the task here focuses on annotating negation in Spanish as a first step towards potentially broader research.

===2 Task Definition===
NEGES Task 1 seeks to reach an agreement on the guidelines to follow for the annotation of negation in Spanish, building off previous, domain-specific guidelines used to annotate corpora built around news, clinical reports and product reviews (Jiménez-Zafra et al., 2018a). Here I present brief summaries and analyses of the guidelines presented in the five projects in question, followed by a complete evaluation of the guidelines across projects.

===3 Review of Guidelines===
====3.1 UAM Spanish TreeBank====
Sandoval and Salazar (2013) present findings from annotating the UAM Spanish Treebank (Moreno et al., 2003), which consists of 1501 sentences taken from newspapers (El País and Compra Muestra) and annotated syntactically following guidelines from the Penn Treebank (Marcus, Marcinkiewicz, and Santorini, 1993). 10.67 % of the sentences were found to contain negation (160 sentences). Given that the UAM Treebank is in XML format, the authors adapt their annotation scheme to be compatible. The authors distinguish two levels of annotation for negation: sentence-level and lexical. The former is further divided into sentential and phrase-level negation; the latter is divided into pronouns and adverbs. Annotations thus mark both negation cues and the scope of negation.

The use of a syntactically annotated corpus is helpful for the overall annotation scheme; this is especially true given the widespread knowledge of the Penn Treebank, as well as the concurrent annotation of lexical features that specify POS. Theoretically, such annotation will help identify the lexical category of the negation marker as well as its syntactic scope. Algorithms trained on such data will then be able to recognize patterns of use of negation for varying parts of speech and compare how syntactic scope relates to semantic and pragmatic scope.

As the authors themselves note, there is much room for improvement in the annotation scheme presented. First, the type of negation is not specified beyond the two tags the scheme provides, one to mark scope and one to mark negation cues. This fails to capture gradations of negation (i.e. whether it is an assertion or a speech act that is negated (Moeschler, 2010)), differences in intensity of negation (Jiménez-Zafra et al., 2018b), and the function of the NPI within the scope of negation (Herburger, 2018). Additionally, there is no specification of event negation, negation expressed morphologically, or negative discourse and sentence connectors.

====3.2 IxaMed-GS====
Oronoz et al. (2015) focus on the identification of entities and events in clinical reports with the goal of automatic extraction of adverse drug reaction events using machine learning. Annotators were experts in pharmacology and pharmacovigilance. They thus differ notably from the linguistically trained annotators of other corpora: they know the clinical domain well, but have only informal training in identifying functional linguistic elements such as negation.

The authors collected 142,154 anonymous discharge reports from the outpatient consultations of the Galdakao-Usansolo Hospital from 2008 to 2012. Negation (and speculation) was annotated as a modifier of a disorder or drug, and individual cues were left unmarked. As such, the text span sin otras alergias medicamentosas “without other drug allergies” would possess negation on alergias medicamentosas while sin would be left bare. This practice was used to maintain consistency in the domain: a disease-entity such as afebril “afebrile” was also marked as negative in the absence of surrounding negative lexical material.

Four entity types were annotated: diseases, allergies, drugs, and procedures. For diseases and allergies, a distinction was made between negated entity, speculated entity and entity (for non-speculative and non-negated entities). 2,362 diseases were annotated, out of which 490 (20.75 %) were tagged as negated diseases and 40 (1.69 %) as speculated diseases. 404 allergy entities were identified, of which 273 (67.57 %) were negated and 13 (3.22 %) speculated. The quality of the annotation process was assessed by measuring the inter-annotator agreement (IAA), which was 90.53 % for entities and 82.86 % for events.

This annotation scheme needs to be adapted in order to extend beyond the clinical domain. First and foremost, negation needs to be treated linguistically and broken down into its components apart from the disorders and drugs it acts on. Nevertheless, the recognition that entities and events may be marked for negation in distinct ways (i.e. syntactically versus morphologically, as in the above examples), as well as the intuition that some entities and events may possess qualities of negation without being explicitly marked for it, is an important contribution to developing a comprehensive annotation scheme. Additionally, the distinction between negation and speculation is important to consider, as the linguistic interaction between negation and modality is complicated, yet merits attention.

====3.3 SFU ReviewSP-NEG====
Jiménez-Zafra et al. (2018b) present the SFU ReviewSP-NEG corpus, the first Spanish corpus that includes event negation as part of the annotation scheme as well as the annotation of discontinuous negation markers. The corpus was also the first to have defined a typology of patterns involving negation specific to Spanish. In this corpus, syntactic negation, scope, focus, and event were annotated. Annotations on the event, and on how negation affects the polarity of the words within its scope, were also included, recording whether there is a complete change in the polarity of the span in question, or an increment or reduction of its value.

The Spanish SFU Review corpus, originally intended for work on sentiment analysis, consists of 400 reviews extracted from the website ciao.es. The reviews span 8 different product areas: cars, hotels, washing machines, books, cell phones, music, computers, and movies. For each product area there are 50 reviews, evenly split between positive and negative, which provides an informative context for how to interpret the effects of negation at the discourse level (Taboada, Anthony, and Voll, 2006).

For the SFU ReviewSP-NEG corpus, each review was automatically annotated at the token level with POS-tags and lemmas; negation cues and their corresponding scopes and events were manually annotated at the sentence level. The annotation was carried out by two trained annotators, supervised by two senior researchers with in-depth experience in corpus annotation. The final corpus is composed of 9,455 sentences, out of which 3,022 sentences (31.97 %) contain at least one negation marker. The Kappa coefficient for IAA was 0.97 for negation cues, 0.95 for negated events and 0.94 for scopes.

Similar to the UAM Spanish Treebank, the annotation scheme for SFU ReviewSP-NEG ought to be broadened to include morphological, lexical, and discourse-oriented negation. Nevertheless, the inclusion of gradient interpretations of negation within its scope captures the subtle meaning differences that negation markers, and their combinations, may produce. In fact, this gradation of meaning may be expanded to include even finer-grained distinctions of meaning in future work.

====3.4 UHU-HUVR====
Cruz et al. (2017) annotate a corpus composed of 604 clinical reports from the Virgen del Rocío Hospital in Sevilla, Spain. 276 of these clinical documents correspond to radiology reports and 328 to the personal history of anamnesis reports written in free text. Two domain expert annotators closely followed the Thyme corpus guidelines (Styler IV et al., 2014), developed for the annotation of English clinical records. In the anamnesis reports, 1,079 sentences (35.20 %) were found to contain negations out of 3,065 sentences. On the other hand, 1,219 sentences (22.80 %) out of 5,347 sentences were annotated with negations in the radiology reports. The Dice coefficient for IAA was higher than 0.94 for negation markers and higher than 0.72 for negated events.

In this corpus, all types of negation were annotated: syntactic, morphological (affixal negation), and lexical. Negation was marked both linguistically and as a modifier of a disorder or drug, i.e. whether or not the drug was effective. Similar to (Oronoz et al., 2015), full words that expressed negative polarity were marked in their entirety (afebril “afebrile”) rather than just their negative affix (a-).

Either due to the domain of application or presentation format, the annotation guidelines presented for the UHU-HUVR corpus often seemed unclear. For example, non-clinical experts may have trouble differentiating negative test results from negative clinical events, as the annotation scheme does. Additionally, while some negation affixes are marked as negating symptoms (a-febril “afebrile”), others are not, being considered positive symptoms unto themselves (in-continencia urinaria “urinary incontinence”). Finally, the authors’ treatment of coordination as a single unit of negation ought to be revised.

====3.5 IULA Spanish Clinical Record====
The IULA Spanish Clinical Record corpus (Marimon, Vivaldi, and Bel, 2017) contains 300 anonymized clinical records from several services of one of the main hospitals in Barcelona, Spain. The corpus was annotated with negation markers and their scopes with the ultimate goal of extracting factual knowledge from textual data; subgoals included automatic encoding of clinical records; diagnosis support; term extraction; and general study of clinical texts. The corpus contains 3,194 sentences, out of which 1,093 (34.22 %) were annotated with negation cues. In this corpus, syntactic negation and lexical negation were annotated; morphological negation was excluded. The annotators were three computational linguists, advised by a clinician.

Annotators included neither the negation cue nor the subject in the scope, unless the subject was located after the verb. This practice seems to be based on linear order alone, and does not take into account the semantics of scope nor the possibility of backwards scope (Hoeksema, 2000). Additionally, it seems necessary to mark the negation cue in some manner to signal where the negation is coming from in order to better train algorithms. This aside, the annotation of scope seemed to be quite precise (for example, annotating scope over verb phrase versus just over adverb), and the project on the whole was presented very thoroughly. The authors note that they did not annotate certain verbs with negative polarity (desaparecer, retirar, suspenderse, erradicar, negar) on the basis that such verbs still denoted factuality. Such interactions between negation and factuality seem worthwhile to discuss for future annotation efforts.

Similar to (Oronoz et al., 2015) and (Cruz et al., 2017), the IULA corpus was biased towards the clinical domain. Thus, teasing apart the effects of negative affixation (for example, in the adjective asintomático “asymptomatic”) will be necessary for future work to both be faithful to linguistic negation yet still express the desired level of factuality for clinical use.

===4 Discussion and Preliminary Proposal===
With any linguistic annotation task, striking a balance between linguistic precision and annotation feasibility is an inevitable and essential question. For the annotation of negation in Spanish, several components of the proposals discussed above may be combined into a set of complex guidelines that is both linguistically accurate and domain neutral. Here I summarize the main components I find to be worth annotating.

Most basically, the semantics of negation is represented (and ought to be annotated) through (i) the identification of the negation cue (the lexical element expressing negation); (ii) its scope (the text section that is negated); (iii) its focus (that part of the scope that is prominently or explicitly negated); and, if present, (iv) its reinforcement (an auxiliary negation or NPI) (Altuna, Minard, and Speranza, 2017). This may be understood in an example such as the following:

(7) Juan '''no''' ''come [carne] '''sino''' verduras''.

The negation cue (no) is represented in bold; the scope (come carne sino verduras) is in italics; the focus (carne) is in brackets; and the NPI (sino) in bold and italics. Negation markers that do not carry negative polarity semantic information (nada más “as soon as”) can be marked as such with a tag distinct from that of a true negation cue (Jiménez-Zafra et al., 2018b).

Following (Morante, Schrauwen, and Daelemans, 2011), negation cues could be limited to just adverbs (no, nadie, ninguno, nunca/jamás). However, it seems that annotating morphological cues (prefixes such as a-, in/im-, de(s)-, anti-) as well as negative polarity verbs (retirar, desaparecer, suspenderse, etc.) is worthwhile for application to clinical domains.

This could be accomplished with both the annotation of the cues themselves and a linking to some sort of lexical definition or modal effect of the cue, as some combination of (Marimon, Vivaldi, and Bel, 2017) and (Jiménez-Zafra et al., 2018b) could produce. Figure 2 of (Jiménez-Zafra et al., 2018b) seems adequately suited to capturing the layers of complexity of negation. This, in combination with the distinction between a NegPred (for (7), comer ‘to eat’), Negmarker (for (7), no ‘does not’), and NegPolItem (for (7), sino ‘but’) from (Marimon, Vivaldi, and Bel, 2017), could provide substantial coverage. It seems that an additional feature such as [+/- realis] may be helpful to distinguish levels of factuality of events in question, as well.

As a closing point, the Brat annotation tool (Marimon, Vivaldi, and Bel, 2017) seems suitable for any comprehensive annotation task involving negation. The multi-colored, layered format is accessible online, facilitating collaborative annotation efforts and the potential implementation of pilot annotation tasks to gauge inter-annotator agreement (IAA) as guidelines are developed.

===5 Conclusion===
This paper has presented an analysis and evaluation of existing guidelines for the annotation of negation in several domain-specific Spanish corpora. A preliminary proposal is given for how to combine linguistically accurate and precise annotation with more practical concerns regarding domain of application and ease of annotation. Future work points in particular towards refining the subtle meaning effects negation can have on word and phrase meaning, as well as its interaction with modality for ultimate interpretation of event factuality.

===Acknowledgments===
Many thanks to Elena Herburger, one of my doctoral advisors and an expert on negation. Thanks as well to Claire Bonial and the team at Army Research Lab (ARL) for supporting the work behind this paper.

===References===
Altuna, B., A.-L. Minard, and M. Speranza. 2017. The Scope and Focus of Negation: A Complete Annotation Framework for Italian. In Proceedings of the Workshop Computational Semantics Beyond Events and Roles, pages 34–42.

Bosque Muñoz, I. and J. Gutiérrez-Rexach. 2009. Fundamentos de sintaxis formal. Ediciones Akal.

Cruz, N., R. Morante, M. J. M. López, J. M. Vázquez, and C. L. P. Calderón. 2017. Annotating negation in Spanish clinical texts. In Proceedings of the Workshop Computational Semantics Beyond Events and Roles, pages 53–58.

Española-RAE, R. A. 2010. Nueva gramática de la lengua española. Manual. Madrid: Espasa.

Herburger, E. 2018. What it means to be an NPI. Linguistic Variation and Language Architecture Workshop, June.

Hoeksema, J. 2000. Negative polarity items: Triggering, scope and c-command. Negation and polarity, pages 115–146.

Horn, L. 1989. A natural history of negation.

Jiménez-Zafra, S. M., N. P. Cruz-Díaz, R. Morante, and M. T. Martín-Valdivia. 2018a. Tarea 1 del Taller NEGES 2018: Guías de Anotación. In Proceedings of NEGES 2018: Workshop on Negation in Spanish, volume 2174, pages 15–21.

Jiménez-Zafra, S. M., M. Taulé, M. T. Martín-Valdivia, L. A. Ureña-López, and M. A. Martí. 2018b. SFU ReviewSP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns. Language Resources and Evaluation, 52(2):533–569.

Marcus, M. P., M. A. Marcinkiewicz, and B. Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Marimon, M., J. Vivaldi, and N. Bel. 2017. Annotation of negation in the IULA Spanish clinical record corpus. In Proceedings of the Workshop Computational Semantics Beyond Events and Roles, pages 43–52.

Moeschler, J. 2010. Negation, scope and the descriptive/metalinguistic distinction. Generative Grammar in Geneva, 6:29–48.

Morante, R., S. Schrauwen, and W. Daelemans. 2011. Annotation of negation cues and their scope: Guidelines v1. Computational linguistics and psycholinguistics technical report series, CTRS-003.

Moreno, A., S. López, F. Sánchez, and R. Grishman. 2003. Developing a syntactic annotation scheme and tools for a Spanish treebank. In Treebanks. Springer, pages 149–163.

Oronoz, M., K. Gojenola, A. Pérez, A. D. de Ilarraza, and A. Casillas. 2015. On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions. Journal of Biomedical Informatics, 56:318–332.

Sandoval, A. M. and M. G. Salazar. 2013. La anotación de la negación en un corpus escrito etiquetado sintácticamente [Annotation of negation in a written treebank]. Revista Iberoamericana de Lingüística, 8.

Styler IV, W. F., S. Bethard, S. Finan, M. Palmer, S. Pradhan, P. C. de Groen, B. Erickson, T. Miller, C. Lin, G. Savova, et al. 2014. Temporal annotation in the clinical domain. Transactions of the Association for Computational Linguistics, 2:143.

Taboada, M., C. Anthony, and K. D. Voll. 2006. Methods for Creating Semantic Orientation Dictionaries. In LREC, pages 427–432.
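
The four components proposed in the Discussion (negation cue, scope, focus, and reinforcement) could be recorded as character-offset spans over the sentence. The following Python sketch is only an illustration and is not taken from any of the reviewed guidelines. The names NegationAnnotation, find_span, and contains are hypothetical, and the rule that focus and reinforcement must fall inside the scope is an assumption drawn from the definitions given in the Discussion. The sentence is example (7), with its brackets removed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end-exclusive


def find_span(text: str, phrase: str) -> Span:
    """Return the offsets of the first occurrence of phrase in text."""
    start = text.index(phrase)  # raises ValueError if phrase is absent
    return (start, start + len(phrase))


def contains(outer: Span, inner: Span) -> bool:
    """True if inner lies entirely inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


@dataclass(frozen=True)
class NegationAnnotation:
    """Hypothetical record for one negation: cue, scope, focus, reinforcement."""
    text: str
    cue: Span
    scope: Span
    focus: Optional[Span] = None
    reinforcement: Optional[Span] = None

    def __post_init__(self) -> None:
        # Assumption: focus and reinforcement are sub-parts of the scope.
        # The cue is not checked, since the reviewed projects differ on
        # whether the cue belongs to its own scope.
        for name in ("focus", "reinforcement"):
            span = getattr(self, name)
            if span is not None and not contains(self.scope, span):
                raise ValueError(f"{name} must lie inside the scope")

    def surface(self, span: Span) -> str:
        """Return the text covered by a span."""
        return self.text[span[0]:span[1]]


# Example (7) from the Discussion, written without the focus brackets.
sentence = "Juan no come carne sino verduras."
example7 = NegationAnnotation(
    text=sentence,
    cue=find_span(sentence, "no"),
    scope=find_span(sentence, "come carne sino verduras"),
    focus=find_span(sentence, "carne"),
    reinforcement=find_span(sentence, "sino"),
)
print(example7.surface(example7.cue), "|", example7.surface(example7.scope))
```

Offsets rather than token indices are used here so that discontinuous or sub-word cues (for instance, a prefix such as a- in afebril) could be represented by the same structure; a token-based representation would be an equally reasonable choice.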