=Paper= {{Paper |id=Vol-2174/paper3 |storemode=property |title=Cues, Scope, and Focus: Annotating Negation in Spanish Corpora |pdfUrl=https://ceur-ws.org/Vol-2174/paper3.pdf |volume=Vol-2174 |authors=Lucia Donatelli }} ==Cues, Scope, and Focus: Annotating Negation in Spanish Corpora== https://ceur-ws.org/Vol-2174/paper3.pdf
                  Cues, Scope, and Focus:
           Annotating Negation in Spanish Corpora
                  Inductores, Ámbito, y Foco:
     La Anotación de la Negación en los Corpus de Español
                                    Lucia Donatelli
                                 Georgetown University
                Bunn Intercultural Center, 403A 37th and O Streets, N.W.
                                 Washington, DC 20057
                                 led66@georgetown.edu

      Abstract: The objective of NEGES Task 1 is to establish a standard for the
      annotation of negation in Spanish-language corpora. Specifically, the task
      analyzes guidelines used for such annotation in five projects over three
      domains: news (Sandoval and Salazar, 2013), clinical reports (Oronoz et al.,
      2015; Cruz et al., 2017; Marimon, Vivaldi, and Bel, 2017), and product
      reviews (Jiménez-Zafra et al., 2018b). Here, an assessment of these various
      guidelines is presented, with the goals of helping establish a standard set
      of guidelines for annotating negation in Spanish across domains, and of
      contributing to the workshop’s overall conversation about the treatment of
      negation in computational linguistics.
      Keywords: Linguistic annotation, negation, Spanish
      Resumen: El objetivo de la Tarea 1 de NEGES es establecer un estándar para
      anotar la negación en los corpus en español. En particular, la tarea analiza
      las directrices empleadas para tal anotación en tres dominios repartidas en
      cinco proyectos de corpus: noticias (Sandoval and Salazar, 2013), informes
      clínicos (Oronoz et al., 2015; Cruz et al., 2017; Marimon, Vivaldi, and Bel,
      2017), y opiniones de productos (Jiménez-Zafra et al., 2018b). Aquí se
      presenta una evaluación de las distintas directrices, con los fines de
      ayudar a establecer una norma de directrices que se deben seguir para la
      anotación de negación en español entre dominios, y de contribuir a la
      conversación más amplia del taller sobre el tratamiento de la negación en la
      lingüística computacional.
      Palabras clave: Anotación lingüística, negación, español

Proceedings of NEGES 2018: Workshop on Negation in Spanish, pages 29-34
Seville, Spain, September 18, 2018

1   Introduction
Negation is fundamental to sentence meaning, bearing on the key questions
of “what happened (and what did not)”, “who did (or did not do) what to
whom,” and “what existed (or did not).” In other words, negation helps
establish what is fact and what is not, due to its ability to affect the
truth value of a sentence (Horn, 1989). This is important to tasks in
Natural Language Processing (NLP) based on the accurate identification and
representation of meaning encoded in language, such as information
extraction, question answering, and sentiment analysis.
   The annotation of negation is not a trivial task. Negation acts at the
syntactic, semantic, and pragmatic levels (Sandoval and Salazar, 2013); and
it exhibits asymmetry between a uniform semantics and a categorial
polyvalence, appearing as a prefix, verb, determiner, adverbial, particle,
idiom, construction, and more (Bosque Muñoz and Gutiérrez-Rexach, 2009;
Herburger, 2018). Often, the scope, or span of utterance upon which
negation acts, is variable even within the same sentence; some argue that
scope is different when determined at logical form (LF) (the
syntax/semantics interface) as opposed to pragmatically (Moeschler, 2010),
forcing resulting annotation to distinguish between underlying semantics
and more general speaker meaning. Additionally, negation may carry
different force depending on context; this is particularly true for
Spanish, in which negative concord may offer gradations of meaning
(Jiménez-Zafra et al., 2018b). This can be seen in the variety of negative
expressions in the following sentences, with negative linguistic elements
in bold (Española-RAE, 2010):
    (1) Ella no dijo nada. ‘She did not say anything.’
    (2) Nadie le hacía caso. ‘Nobody paid attention to him/her.’
    (3) Ni de una forma ni de otra consiguieron convencerla. ‘They couldn’t
        convince her one way or another.’
    (4) En mi vida he visto cosa igual. ‘I haven’t seen anything similar in
        my life.’
    (5) No hables tanto. ‘Don’t talk so much.’
    (6) ¿No son ya las dos? ‘Isn’t it already two o’clock?’
   Example (1) negates the action of speaking; no functions as a negation
cue, and nada as a negative polarity item (NPI) within the scope of the cue
in the form of a negative indefinite. (2) affirms the matter that no one,
signaled by the indefinite pronoun nadie, participated in an
attention-giving activity. (3) exhibits two negative conjunctions, ni, that
describe an ineffective manner of convincing. (4) exhibits an adverbial
phrase, en mi vida, that equates to the negative temporal adverb nunca
‘never’. Finally, (5) and (6) utilize negation to support speech acts: (5)
as a negative command, and (6) as a leading question.
   With this complexity in mind, the process of annotating negation in
Spanish is daunting. Nevertheless, it is essential to annotate corpora with
such information in order to train algorithms to perform to human capacity.
Ideally, an annotation framework could be designed to capture negation
patterns cross-linguistically. However, since such patterns are quite
varied and distinct, the task here focuses on annotating negation in
Spanish as a first step towards potentially broader research.

2   Task Definition
NEGES Task 1 seeks to reach an agreement on the guidelines to follow for
the annotation of negation in Spanish, building on previous,
domain-specific guidelines used to annotate corpora built around news,
clinical reports and product reviews (Jiménez-Zafra et al., 2018a). Here I
present brief summaries and analyses of the guidelines presented in the
five projects in question, followed by a complete evaluation of the
guidelines across projects.

3   Review of Guidelines
3.1   UAM Spanish Treebank
Sandoval and Salazar (2013) present findings from annotating the UAM
Spanish Treebank (Moreno et al., 2003), which consists of 1501 sentences
taken from newspapers (El País and Compra Maestra) and annotated
syntactically following guidelines from the Penn Treebank (Marcus,
Marcinkiewicz, and Santorini, 1993). Negation was found in 160 of these
sentences (10.67 %).
   Given that the UAM Treebank is in XML format, the authors adapt their
annotation scheme to be compatible. The authors distinguish two levels of
annotation for negation: sentence-level and lexical. The former is further
divided into sentential and phrase-level negation; the latter is divided
into pronouns and adverbs. Annotations thus mark both negation cues and the
scope of negation.
   The use of a syntactically annotated corpus is helpful for the overall
annotation scheme; this is especially true given the widespread knowledge
of the Penn Treebank, as well as the concurrent annotation of lexical
features that specify POS. Theoretically, such annotation will help
identify the lexical category of the negation marker as well as its
syntactic scope. Algorithms trained on such data will then be able to
recognize patterns of use of negation for varying parts of speech and
compare how syntactic scope relates to semantic and pragmatic scope.
   As the authors themselves note, there is much room for improvement in
the annotation scheme presented. First, the type of negation is not
specified beyond two tags that mark scope and one that marks negation cues.
This fails to capture gradations of negation (i.e. whether it is an
assertion or a speech act that is negated (Moeschler, 2010)), differences
in intensity of negation (Jiménez-Zafra et al., 2018b), and the function of
the NPI within the scope of negation (Herburger, 2018). Additionally, there
is no specification of event negation, negation expressed morphologically,
or negative discourse and sentence connectors.

3.2   IxaMed-GS
Oronoz et al. (2015) focus on the identification of entities and events in
clinical reports with the goal of automatic extraction of adverse drug
reaction events using machine
learning. The annotators were experts in pharmacology and
pharmacovigilance. They differ notably from the linguistically trained
annotators of other corpora: they know the clinical domain well but have
only informal training in identifying functional linguistic elements such
as negation.
   The authors collected 142,154 anonymous discharge reports from the
outpatient consultations of the Galdakao-Usansolo Hospital from 2008 to
2012. Negation (and speculation) was annotated as a modifier of a disorder
or drug, and individual cues were left unmarked. As such, the text span sin
otras alergias medicamentosas “without other drug allergies” would possess
negation on alergias medicamentosas while sin would be left bare. This
practice was used to maintain consistency in the domain: a disease-entity
such as afebril “afebrile” was also marked as negative in the absence of
surrounding negative lexical material.
   Four entity types were annotated: diseases, allergies, drugs, and
procedures. For diseases and allergies, a distinction was made between
negated entity, speculated entity and entity (for non-speculative and
non-negated entities). Of the 2,362 diseases annotated, 490 (20.75 %) were
tagged as negated diseases and 40 (1.69 %) as speculated diseases. Of the
404 allergy entities identified, 273 (67.57 %) were negated and 13 (3.22 %)
speculated. The quality of the annotation process was assessed by measuring
the inter-annotator agreement (IAA), which was 90.53 % for entities and
82.86 % for events.
   This annotation scheme needs to be adapted in order to extend beyond the
clinical domain. First and foremost, negation needs to be treated
linguistically and broken down into its components apart from the disorders
and drugs it acts on. Nevertheless, the recognition that entities and
events may be marked for negation in distinct ways (i.e. syntactically
versus morphologically, as in the above examples), as well as the intuition
that some entities and events may possess qualities of negation without
being explicitly marked for it, is an important contribution to developing
a comprehensive annotation scheme. Additionally, the distinction between
negation and speculation is important to consider, as the linguistic
interaction between negation and modality is complicated, yet merits
attention.

3.3   SFU ReviewSP-NEG
Jiménez-Zafra et al. (2018b) present the SFU ReviewSP-NEG corpus, the first
Spanish corpus that includes event negation as part of the annotation
scheme as well as the annotation of discontinuous negation markers. The
corpus was also the first to have defined a typology of patterns involving
negation specific to Spanish. In the SFU ReviewSP-NEG corpus, syntactic
negation, scope, focus, and event were annotated. The annotation also
records how negation affects the polarity of the words within its scope:
whether there is a complete change in the polarity of the span in question,
or an increment or reduction of its value.
   The Spanish SFU Review corpus, originally intended for work on sentiment
analysis, consists of 400 reviews extracted from the website ciao.es. The
reviews span 8 different product areas: cars, hotels, washing machines,
books, cell phones, music, computers, and movies. For each product area
there are 50 reviews, 25 positive and 25 negative, which provides an
informative context for how to interpret the effects of negation at the
discourse level (Taboada, Anthony, and Voll, 2006).
   For the SFU ReviewSP-NEG corpus, each review was automatically annotated
at the token level with POS-tags and lemmas; negation cues and their
corresponding scopes and events were manually annotated at the sentence
level. The annotation was carried out by two trained annotators, supervised
by two senior researchers with in-depth experience in corpus annotation.
The final corpus is composed of 9,455 sentences, out of which 3,022
sentences (31.97 %) contain at least one negation marker. The Kappa
coefficient for IAA was 0.97 for negation cues, 0.95 for negated events and
0.94 for scopes.
   Similar to the UAM Spanish Treebank, the annotation scheme for SFU
ReviewSP-NEG ought to be broadened to include morphological, lexical, and
discourse-oriented negation. Nevertheless, the inclusion of gradient
interpretations of negation within its scope captures the subtle meaning
differences that negation markers, and their combinations, may produce. In
fact, this gradation of meaning may be expanded to include even
finer-grained distinctions of meaning in future work.
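
   The Kappa values reported above for SFU ReviewSP-NEG measure agreement
beyond what chance alone would produce. The following is a minimal sketch
of that idea, assuming Cohen's kappa for exactly two annotators and
token-level labels; the paper does not state which kappa variant or unit
of comparison was used. The function name, the labels CUE and O, and the
toy data are invented here for illustration and are not taken from the
corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical token-level labels: CUE marks a negation cue, O anything else.
annotator_1 = ["O", "CUE", "O", "O", "CUE", "O", "O", "O"]
annotator_2 = ["O", "CUE", "O", "O", "O", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

   In this toy case the annotators agree on 7 of 8 tokens, yet kappa is
noticeably lower than that raw agreement because most tokens are O and
would match by chance. This is one reason kappa values and raw percentage
agreement, both of which appear in the corpora reviewed here, are not
directly comparable.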
3.4   UHU-HUVR
Cruz et al. (2017) annotate a corpus composed of 604 clinical reports from
the Virgen del Rocío Hospital in Sevilla, Spain. Of these documents, 276
are radiology reports and 328 correspond to the personal history of
anamnesis reports written in free text. Two domain expert annotators
closely followed the THYME corpus guidelines (Styler IV et al., 2014),
developed for the annotation of English clinical records. In the anamnesis
reports, 1,079 sentences (35.20 %) out of 3,065 were found to contain
negations. In the radiology reports, 1,219 sentences (22.80 %) out of 5,347
were annotated with negations. The Dice coefficient for IAA was higher than
0.94 for negation markers and higher than 0.72 for negated events.
   In this corpus, all types of negation were annotated: syntactic,
morphological (affixal negation), and lexical. Negation was marked both
linguistically and as a modifier of a disorder or a drug, i.e. whether or
not the drug was effective. Similar to (Oronoz et al., 2015), full words
that expressed negative polarity were marked in their entirety (afebril
“afebrile”) rather than just their negative affix (a-).
   Either due to the domain of application or presentation format, the
annotation guidelines presented for the UHU-HUVR corpus often seemed
unclear. For example, non-clinical experts may have trouble differentiating
negative test results from negative clinical events, as the annotation
scheme does. Additionally, while some negation affixes are marked as
negating symptoms (a-febril “afebrile”), others are not, being considered
positive symptoms unto themselves (in-continencia urinaria “urinary
incontinence”). Finally, the authors’ treatment of coordination as a single
unit of negation ought to be revised.

3.5   IULA Spanish Clinical Record
The IULA Spanish Clinical Record corpus (Marimon, Vivaldi, and Bel, 2017)
contains 300 anonymized clinical records from several services of one of
the main hospitals in Barcelona, Spain. The corpus was annotated with
negation markers and their scopes with the ultimate goal of extracting
factual knowledge from textual data; subgoals included automatic encoding
of clinical records; diagnosis support; term extraction; and general study
of clinical texts. The corpus contains 3,194 sentences, out of which 1,093
(34.22 %) were annotated with negation cues. In this corpus, syntactic
negation and lexical negation were annotated; morphological negation was
excluded. The annotators were three computational linguists, advised by a
clinician.
   Annotators did not include the negation cue nor the subject in its scope
as part of annotation, unless the subject was located after the verb. This
practice seems to be based on linear order alone, and does not take into
account the semantics of scope nor the possibility of backwards scope
(Hoeksema, 2000). Additionally, it seems necessary to mark the negation cue
in some manner to signal where the negation is coming from in order to
better train algorithms. This aside, the annotation of scope seemed to be
quite precise (for example, annotating scope over verb phrase versus just
over adverb), and the project on the whole was presented very thoroughly.
The authors note that they did not annotate certain verbs with negative
polarity (desaparecer, retirar, suspenderse, erradicar, negar) on the basis
that such verbs still denoted factuality. Such interactions between
negation and factuality seem worthwhile to discuss for future annotation
efforts.
   Similar to (Oronoz et al., 2015) and (Cruz et al., 2017), the IULA
corpus is biased towards the clinical domain. Thus, teasing apart the
effects of negative affixation (for example, in the adjective asintomático
“asymptomatic”) will be necessary for future work to be faithful to
linguistic negation while still expressing the desired level of factuality
for clinical use.

4   Discussion and Preliminary Proposal
With any linguistic annotation task, striking a balance between linguistic
precision and annotation feasibility is an inevitable and essential
question. For the annotation of negation in Spanish, several components of
the proposals discussed above may be combined into a set of complex
guidelines that is both linguistically accurate and domain neutral. Here I
summarize the main components I find to be worth annotating.
   Most basically, the semantics of negation is represented (and ought to
be annotated) through (i) the identification of the negation cue (the
lexical element expressing negation); (ii) its scope (the text section that
is negated); (iii) its focus (that part of the scope that is prominently or
explicitly negated); and, if present, (iv) its reinforcement (an auxiliary
negation or NPI) (Altuna, Minard, and Speranza, 2017). This may be
understood in an example such as the following:

    (7) Juan no come [carne] sino verduras. ‘Juan does not eat meat but
        rather vegetables.’

   The negation cue (no) is represented in bold; the scope (come carne sino
verduras) is in italics; the focus (carne) is in brackets; and the NPI
(sino) in bold and italics. Negation markers that do not carry negative
polarity semantic information (nada más “as soon as”) can be marked as
such, for example with a tag distinct from the one used for genuine
negation cues (Jiménez-Zafra et al., 2018b).
   Following (Morante, Schrauwen, and Daelemans, 2011), negation cues could
be limited to just adverbs (no, nadie, ninguno, nunca/jamás). However, it
seems that annotating morphological cues (prefixes such as a-, in/im-,
de(s)-, anti-) as well as negative polarity verbs (retirar, desaparecer,
suspenderse, etc.) is worthwhile for application to clinical domains.
   This could be accomplished with both the annotation of the cues
themselves and a linking to some sort of lexical definition or modal effect
of the cue, as some combination of (Marimon, Vivaldi, and Bel, 2017) and
(Jiménez-Zafra et al., 2018b) could produce. Figure 2 of (Jiménez-Zafra et
al., 2018b) seems adequately suited to capturing the layers of complexity
of negation. This, in combination with the distinction among NegPred (for
(7), comer ‘to eat’), Negmarker (for (7), no ‘does not’), and NegPolItem
(for (7), sino ‘but’) from (Marimon, Vivaldi, and Bel, 2017), could provide
substantial coverage. It seems that an additional feature such as
[+/-realis] may be helpful to distinguish levels of factuality of events in
question, as well.
   As a closing point, the Brat annotation tool (Marimon, Vivaldi, and Bel,
2017) seems suitable for any comprehensive annotation task involving
negation. The multi-colored, layered format is accessible online,
facilitating collaborative annotation efforts and the potential
implementation of pilot annotation tasks to gauge inter-annotator agreement
(IAA) as guidelines are developed.

5   Conclusion
This paper has presented an analysis and evaluation of existing guidelines
for the annotation of negation in several domain-specific Spanish corpora.
A preliminary proposal is given for how to combine linguistically accurate
and precise annotation with more practical concerns regarding domain of
application and ease of annotation. Future work points in particular
towards refining the subtle meaning effects negation can have on word and
phrase meaning, as well as its interaction with modality for ultimate
interpretation of event factuality.

Acknowledgments
Many thanks to Elena Herburger, one of my doctoral advisors and an expert
on negation. Thanks as well to Claire Bonial and the team at Army Research
Lab (ARL) for supporting the work behind this paper.

References
Altuna, B., A.-L. Minard, and M. Speranza. 2017. The Scope and Focus of
   Negation: A Complete Annotation Framework for Italian. In Proceedings of
   the Workshop Computational Semantics Beyond Events and Roles, pages
   34–42.
Bosque Muñoz, I. and J. Gutiérrez-Rexach. 2009. Fundamentos de sintaxis
   formal. Ediciones Akal.
Cruz, N., R. Morante, M. J. M. López, J. M. Vázquez, and C. L. P. Calderón.
   2017. Annotating negation in Spanish clinical texts. In Proceedings of
   the Workshop Computational Semantics Beyond Events and Roles, pages
   53–58.
Española-RAE, R. A. 2010. Nueva gramática de la lengua española. Manual.
   Madrid: Espasa.
Herburger, E. 2018. What it means to be an NPI. Linguistic Variation and
   Language Architecture Workshop, June.
Hoeksema, J. 2000. Negative polarity items: Triggering, scope and
   c-command. Negation and polarity, pages 115–146.
Horn, L. 1989. A natural history of negation.
Jiménez-Zafra, S. M., N. P. Cruz-Díaz, R. Morante, and M. T.
   Martín-Valdivia. 2018a. Tarea 1 del Taller NEGES 2018:
   Guías de Anotación. In Proceedings of NEGES 2018: Workshop on Negation
   in Spanish, volume 2174, pages 15–21.
Jiménez-Zafra, S. M., M. Taulé, M. T. Martín-Valdivia, L. A. Ureña-López,
   and M. A. Martí. 2018b. SFU ReviewSP-NEG: a Spanish corpus annotated
   with negation for sentiment analysis. A typology of negation patterns.
   Language Resources and Evaluation, 52(2):533–569.
Marcus, M. P., M. A. Marcinkiewicz, and B. Santorini. 1993. Building a
   large annotated corpus of English: The Penn Treebank. Computational
   Linguistics, 19(2):313–330.
Marimon, M., J. Vivaldi, and N. Bel. 2017. Annotation of negation in the
   IULA Spanish clinical record corpus. In Proceedings of the Workshop
   Computational Semantics Beyond Events and Roles, pages 43–52.
Moeschler, J. 2010. Negation, scope and the descriptive/metalinguistic
   distinction. Generative Grammar in Geneva, 6:29–48.
Morante, R., S. Schrauwen, and W. Daelemans. 2011. Annotation of negation
   cues and their scope: Guidelines v1. Computational linguistics and
   psycholinguistics technical report series, CTRS-003.
Moreno, A., S. López, F. Sánchez, and R. Grishman. 2003. Developing a
   syntactic annotation scheme and tools for a Spanish treebank. In
   Treebanks. Springer, pages 149–163.
Oronoz, M., K. Gojenola, A. Pérez, A. D. de Ilarraza, and A. Casillas.
   2015. On the creation of a clinical gold standard corpus in Spanish:
   Mining adverse drug reactions. Journal of Biomedical Informatics,
   56:318–332.
Sandoval, A. M. and M. G. Salazar. 2013. La anotación de la negación en un
   corpus escrito etiquetado sintácticamente [Annotation of negation in a
   written treebank]. Revista Iberoamericana de Linguistica, 8.
Styler IV, W. F., S. Bethard, S. Finan, M. Palmer, S. Pradhan, P. C. de
   Groen, B. Erickson, T. Miller, C. Lin, G. Savova, et al. 2014. Temporal
   annotation in the clinical domain. Transactions of the Association for
   Computational Linguistics, 2:143.
Taboada, M., C. Anthony, and K. D. Voll. 2006. Methods for Creating
   Semantic Orientation Dictionaries. In LREC, pages 427–432.