Negation Cues Detection Using CRF on Spanish Product Review Texts

Henry Loharja, Lluís Padró, and Jordi Turmo
Universitat Politècnica de Catalunya
https://www.upc.edu/
{loharja, padro, turmo}@cs.upc.edu

Abstract: This article describes the negation cue detection approach designed and built by UPC's team for the NEGES 2018 Workshop on Negation in Spanish. The approach uses supervised CRFs as the basis for training a model with several features engineered for the task of negation cue detection in Spanish. The results are evaluated in terms of precision, recall, and F1 score in order to measure the performance of the approach. The approach was ranked in 1st position in the official testing results, with average precision around 91%, average recall around 82%, and average F1 score around 86%.

Keywords: negation cue detection, conditional random field, product review

1 Introduction
This paper describes the negation cue detection approach presented by UPC's team for the NEGES 2018 workshop, task 2 (negation cue detection) (Jiménez-Zafra et al., 2018a). The aim of the task is to automatically detect negation cues in Spanish product review texts. To this end, participants must develop a system able to identify all the negation cues present in the documents. The SFU ReviewSP-NEG corpus (Jiménez-Zafra et al., 2018b) is used to train and test the systems. The approach we develop relies on a supervised model learned with Conditional Random Fields (CRF in what follows) as the core, with features engineered specifically for the detection of negation cues in Spanish. The approach is implemented in Python, and we use NLTK, the Natural Language Toolkit (Bird, Klein, and Loper, 2009), as the toolkit to build the system. The results are measured with the widely used metrics of precision, recall, and F1 score.

The article is organized as follows. Section 2 describes the approach used to learn the negation cue detection model. Section 3 describes the system built on that approach and the details of the implementation. The results achieved by our approach are presented and briefly analyzed in Section 4. Finally, Section 5 draws conclusions about the work that has been done.

Proceedings of NEGES 2018: Workshop on Negation in Spanish, pages 49-54, Seville, Spain, September 18, 2018.

2 Negation Cues Detection Approach

Before describing the approach, let us begin with some definitions. A negative sentence n is defined as a vector of words (w1, w2, ..., wn) containing one or more negation cues, where a cue can be a word (e.g. no), a morpheme (e.g. in-capaz) or a multi-word expression (e.g. ya no, todavía no) that inherently expresses negation. The goal of negation cue detection is to predict a vector c given the sentence n, where c ∈ {0,1}^|n| is a vector of the same length as n such that ci = 1 if wi is part of a negation cue and ci = 0 otherwise.

More than one negation cue can appear inside a sentence. In Spanish, one special characteristic of negation cues is that a cue can consist of more than one word, not necessarily consecutive. This increases the complexity of deciding whether two words recognized as cues are indeed two separate cues or parts of the same non-contiguous cue. It also makes negation cue detection in Spanish more challenging than in English, where such cases are scarce.

The approach we use for this work is a state-of-the-art one: CRF-based negation detection. We reproduce the approach of previous work (Agarwal and Yu, 2010), which uses CRF as its base, and apply it to the corpus provided by the task in order to see how the approach performs on Spanish data. Conditional random fields (CRFs) are a type of discriminative undirected probabilistic graphical model used for structured prediction (Lafferty, McCallum, and Pereira, 2001). The most important feature of a CRF model is that it can take context into account: the linear-chain CRF predicts sequences of labels for sequences of input samples. Thus, the model does not work with local probabilities like p(y_t|x_t), where t is the position within the sequence; instead, it estimates the conditional probability of the whole sequence:

    p(y|x) = \frac{1}{Z(x)} \exp\left\{ \sum_t \sum_{j=1}^{K} \lambda_j f_j(x, y_t, y_{t-1}) \right\}

The estimation of the weights λ_j for each feature function f_j is carried out by maximizing the conditional log likelihood:

    \max_\lambda l(\lambda) = \max_\lambda \sum_{i=1}^{N} \log p(y^{(i)} \mid x^{(i)})

where N is the number of observation sequences x^(i) and label sequences y^(i).

Training CRFs might be time-consuming for some tasks, since training time depends quadratically on the number of class labels and linearly on the number of training instances and the average sequence length. However, state-of-the-art solutions use CRF models for many NLP tasks where the time consumption is still tolerable.

As discussed above, the goal of negation cue detection is to obtain, for each sentence, a vector indicating whether each token is part of a negation cue. Borrowing from named entity recognition, we can view negation cue detection as a kind of NER in which the entities to recognize are the parts of a negation. In other words, we want to classify whether each word in a sentence is part of a negation cue or not. From this, we define a three-class classification problem for each observed word: Begin-Cue (B-C), Inside-Cue (I-C), or Out (O). A word classified as Out is not part of a cue. In order to handle cues that consist of more than one word, we use two classes for cue words: Begin-Cue for the first word of a cue, and Inside-Cue for the remaining words of the same cue.

3 Negation Cues Detection System

3.1 Data Preprocessing

As a preliminary step, we preprocess the data in the provided corpus to match the input format of the system we built. The corpus is distributed in CoNLL format: each line corresponds to a token, each annotation is provided in a column, and empty lines indicate the end of a sentence. We produce two sets of data in different formats according to their use in each step:

1. Data in BIO tagging format (Ramshaw and Marcus, 1999), to be used as training input.
   The annotated token is tagged "B-C" if it is the beginning of a negation cue, "I-C" if it is part of a negation cue but not its first word, and "O" if it is outside any cue. An example sentence in this format is:

   • El|O coche|O funciona|O estupendamente|O ,|O es|O muy|O manejable|O ,|O por|O cierto|O ,|O casi|B-C no|I-C consume|O gasolina|O algo|O que|O para|O mi|O es|O muy|O importante|O .|O

2. Raw data without any tagging, to be used as testing input. For example:

   • Las ruedas a los 15000 kms las tuve que cambiar , todas , las cuatro , por ser de una marca coreana , que no da mucho resultado .

After preprocessing, the documents are ready to be used as input for the respective next steps. This preprocessing does not alter any important information contained in the data; its only purpose is to change the format so the data is easier to use in the following steps.

3.2 Baseline System

Before implementing the approach described above, we developed a baseline system to serve as a starting point and a point of comparison. Our aim is to see whether our approach performs better than a baseline that uses simple, common techniques, namely dictionary lookup combined with some rules for detecting negation cues.

The first step in building this baseline was to create a dictionary from the training dataset of the corpus. We collected all the words tagged as negation cues in all the training documents, together with their frequencies. After sorting the collected negation cues by frequency from most to least frequent, we chose the 25 most frequent words; these 25 words became the dictionary of negation cues in our baseline system.

We also developed several rules to capture the characteristics of Spanish negation cues explained above. These rules decide whether multiple cue words appearing in a sentence are actually parts of one cue or separate cues: they check whether a word is in a list of special words we created and, if so, whether another cue precedes it. The rule from the baseline system can be described as:

    if word in DICTIONARY then
        if word in SPECIAL then
            if a cue exists before then
                word is part of that cue
            end if
        else
            word is a new cue
        end if
    end if

After implementing this combination of dictionary lookup and rules, we used the baseline system to tag the documents of the development testing dataset.
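The baseline rule above can be sketched in Python as follows. This is an illustrative sketch, not the exact system: the `DICTIONARY` and `SPECIAL` sets shown are hypothetical subsets (the real dictionary holds the 25 most frequent cue words extracted from the training data, and the real special list is the one given in Section 3.3).

```python
# Illustrative subsets only -- the actual lists are built from the
# training corpus as described in the text.
DICTIONARY = {"no", "ni", "nada", "nunca", "tampoco", "sin"}
SPECIAL = {"nada", "ni", "nunca", "apenas"}  # words that tend to continue a cue

def tag_cues(tokens):
    """Return one tag per token: 'B-C' (a new cue), 'I-C' (continues a
    cue seen earlier in the sentence), or 'O' (outside any cue)."""
    tags = []
    seen_cue = False  # "exist cue before" condition of the rule
    for word in tokens:
        w = word.lower()
        if w in DICTIONARY:
            if w in SPECIAL and seen_cue:
                tags.append("I-C")  # part of the preceding cue
            else:
                tags.append("B-C")  # a new cue
            seen_cue = True
        else:
            tags.append("O")
    return tags
```

For instance, `tag_cues(["no", "es", "nada"])` marks "nada" as continuing the cue opened by "no", mirroring the non-contiguous cue behaviour the rule targets.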
We use this result as a preliminary result, to be compared later with the result of the system built with our proposed approach. In this way we can see whether our approach gives an advantage over simple techniques.

3.3 Learning the Model for Negation Cue Detection Using CRF

The system we built uses NLTK, a Python-based toolkit for building programs that work with human language data. NLTK provides easy-to-use text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. One of its modules is a CRF tagger, which uses Python CRFSuite, the Python bindings to CRFSuite (Okazaki, 2007), as its core. This module is what we mainly use in our approach to negation detection, adapting a named entity recognition point of view.
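As a sketch of how this module is typically driven, the snippet below shows the data shape that NLTK's `nltk.tag.CRFTagger` expects; the model file name and the tiny training set are illustrative, and the NLTK calls themselves are shown in comments so the sketch carries no external dependencies.

```python
# Hypothetical usage of NLTK's CRF tagger (wraps python-crfsuite):
#
#   from nltk.tag import CRFTagger
#   tagger = CRFTagger()
#   tagger.train(train_data, "negation.model")   # trains and saves a model
#   tagged = tagger.tag_sents([["casi", "no", "consume", "gasolina"]])
#
# CRFTagger.train expects a list of sentences, each sentence being a
# list of (token, tag) pairs -- here with the B-C/I-C/O tag set:
train_data = [
    [("casi", "B-C"), ("no", "I-C"), ("consume", "O"), ("gasolina", "O")],
    [("no", "B-C"), ("da", "O"), ("mucho", "O"), ("resultado", "O")],
]
```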
There are two main parts in the system we built: training and testing. Training is the part in which we use the CRF tagger module in NLTK to train a model on the training data prepared earlier; the result of this part is a model for detecting negation.

Figure 1: Flow describing the negation detection approach in the system.

The training process uses an orthographic feature set designed for negation cue detection and for capturing the characteristics of negation cues in Spanish. The simplest and clearest feature set is the vocabulary of the training data. We also include part-of-speech information to enrich the feature set. Generalizations over how words are written (capitalization, affixes, etc.) are also important information included as features. The present approach includes the training vocabulary, several orthographic features based on regular expressions, and prefixes and suffixes of two to four characters. To model localization context, neighboring words in the window [-6, 1] are also added as features; this window size was selected from several experiments with various window sizes to obtain the optimum result. We use bigrams when including the localization information of the six words before and the one word after the observed word. The complete set of features used in training is:

1. WORD: the vocabulary word itself.
2. POS: the part of speech of the word.
3. INIT_CAP: the word starts with a capital letter.
4. ALPHANUM: the word consists of alphanumeric characters.
5. HAS_NUM: the word contains a number.
6. HAS_CAP: the word contains a capital letter.
7. HAS_DASH: the word contains a dash (-).
8. HAS_US: the word contains an underscore (_).
9. PUNCTUATION: the word contains punctuation.
10. SUFn: suffixes of n characters, with n ranging from two to four.
11. PREFn: prefixes of n characters, with n ranging from two to four.
12. 2GRAMBEFORE: bigrams of up to 6 words before the observed word.
13. 2GRAMAFTER: bigram of up to 1 word after the observed word.
14. BEFOREPOS: the part of speech of up to 6 words before the observed word.
15. AFTERPOS: the part of speech of up to 1 word after the observed word.
16. SPECIAL: the word is one of the special words in the special dictionary. The words we included as special words are: "nada", "ni", "nunca", "ningun", "ninguno", "ninguna", "alguna", "apenas", "para nada", and "ni siquiera". These words have a higher tendency to be part of multi-word negation cues. This feature is included to capture the characteristic that a negation cue can consist of more than one word, separated by other non-cue words in between.

Using the features above, we train with the given data and the CRF module in NLTK to produce the model that detects negation cues in Spanish. The CRF is trained with the default parameters of the NLTK toolkit. This model is then used as one of the inputs for the next step, testing. Testing is the process of detecting negation in the testing data (raw data in which negations are not annotated) using the model obtained from training as the knowledge base. The result of testing is an annotated version of the testing data in which the words of each sentence are classified as either part of a cue or outside one. The result obtained from testing is in BIO-tagged format, since this is the format we use to represent our data; accordingly, we do some post-processing to convert the result back into the same original format as the input (training data).
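To make the feature set concrete, a feature-extraction function along these lines could feed the tagger. This is a sketch covering a subset of the 16 features with names of our choosing; the real system also adds the POS-based and bigram features, and prefix/suffix lengths up to four.

```python
import re

def word_features(sent, i):
    """Illustrative orthographic and context features for token i of a
    tokenized sentence (a subset of the feature set described above)."""
    word = sent[i]
    feats = {
        "WORD": word.lower(),                               # vocabulary feature
        "INIT_CAP": word[:1].isupper(),                     # starts with a capital
        "HAS_NUM": any(ch.isdigit() for ch in word),        # contains a digit
        "HAS_DASH": "-" in word,                            # contains a dash
        "PUNCTUATION": bool(re.search(r"[^\w\s]", word)),   # contains punctuation
        "SUF2": word[-2:], "SUF3": word[-3:],               # suffixes
        "PREF2": word[:2], "PREF3": word[:3],               # prefixes
    }
    # localization context: neighboring words in the [-6, 1] window
    for k in range(1, 7):
        feats[f"WORD-{k}"] = sent[i - k].lower() if i - k >= 0 else "<s>"
    feats["WORD+1"] = sent[i + 1].lower() if i + 1 < len(sent) else "</s>"
    return feats
```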
We use the original CoNLL data format and add to it the negation cue information obtained from testing.

4 Results

After the testing process, we obtain the result of negation cue detection as annotated versions of the testing documents provided as input. To evaluate the performance of the approach, we use precision, recall, and F1 score, and we use the evaluation script provided by the organizers to make sure that our output matches the requirements. In the first phase, we use the provided development testing data to measure the performance of our system. We run the testing on each document of the development testing dataset, which is divided by domain; each document is processed and evaluated separately. To give a general view of the performance, we also calculate the micro average over the whole development testing result. Table 1 shows the result obtained by the baseline system on the development testing data, while Table 2 shows the result of the system based on our proposed approach on the same development dataset.

Domain          Precision   Recall   F1
Coches          88.89       85.11    86.96
Hoteles         86.00       70.49    77.48
Lavadoras       94.74       80.00    86.75
Moviles         94.90       85.32    89.86
Musica          79.31       88.46    83.64
Ordenadores     85.71       69.23    76.59
Libros          88.65       86.81    87.72
Peliculas       92.55       79.09    85.29
Micro Average   90.06       81.31    85.32

Table 1: Measurement results of development testing using the baseline system.

Domain          Precision   Recall   F1
Coches          83.33       74.47    78.65
Hoteles         96.08       80.33    87.50
Lavadoras       97.30       80.00    87.81
Moviles         95.10       88.99    91.94
Musica          83.33       96.15    89.28
Ordenadores     89.13       78.85    83.68
Libros          90.85       89.58    90.21
Peliculas       92.93       83.64    88.04
Micro Average   91.97       84.85    88.14

Table 2: Measurement results of development testing using the CRF-based approach.

As can be observed from Tables 1 and 2, our proposed approach gives better results than the baseline system with its simple techniques. The results are fairly high overall, with most values above 80%; in particular, average precision exceeds 90%. This is possible because detecting the cue words themselves is a fairly simple task: most cues consist of words such as "no", "ni", "nada" and a few other negation words, with little vocabulary variability, which leads to fairly easy detection and a small number of false positives. On the other hand, recall is much lower, in some domains below 80%. This is due to a higher number of false negatives caused by the difficulty of detecting non-contiguous multi-token cues. In most false negative cases, our system has difficulty with examples such as:

• No es cosa del paralelo ni del equilibrado.

In this example, no...ni is a single negation cue, whereas our system recognizes it as two separate cues. Another kind of false negative is the opposite case, where two separate cues are recognized as one cue. These cases account for most of the false negatives in the development testing results.

The official testing results can be observed in Table 3. They were obtained with our model on the official testing dataset provided by the organizers; the evaluation was done directly by the organizers, and we received the measurements shown in Table 3 after submitting our testing result.

Domain          Precision   Recall   F1
Coches          95.08       85.29    89.92
Hoteles         94.00       79.66    86.24
Lavadoras       94.74       78.26    85.72
Moviles         89.80       77.19    83.02
Musica          92.96       75.86    83.54
Ordenadores     91.36       91.36    91.36
Libros          84.19       84.52    84.35
Peliculas       89.68       85.28    87.42
Average         91.48       82.18    86.45

Table 3: Measurement results of official testing.

Based on the organizers' evaluation, our result is ranked first among the participants in the same task.
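The micro averages reported in Tables 1 and 2 pool the per-domain counts before computing the metrics. A generic sketch of that computation is shown below; the organizers' evaluation script remains the authoritative implementation, and the counts in the usage example are made up.

```python
def micro_average(counts):
    """Micro-averaged precision, recall, and F1 from per-domain
    (true_positive, false_positive, false_negative) triples."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for two domains:
p, r, f1 = micro_average([(8, 2, 0), (2, 0, 8)])
```

Pooling the counts (micro average) weights every cue equally across domains, unlike a macro average, which would weight every domain equally.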
As can be seen in the table, the official testing results follow the same pattern as the development testing results, with higher precision and lower recall. Even though we cannot inspect the individual cases in the official testing results, this pattern suggests that cases similar to those observed in development testing probably also occur there. The values are also very similar, with precision around 91%, recall around 82%, and F1 score around 86%. The averages in official testing are slightly lower than in development testing, but the difference is not significant.

5 Conclusion

In this article we have described the approach and system we built for our participation in task 2 (negation cue detection in Spanish product review texts) of NEGES 2018: Workshop on Negation in Spanish. Our approach to detecting negation cues is a supervised one, combining CRF with several features engineered for negation cue detection in Spanish to train a model; the model is then used to classify whether each word in the observed (testing) data is part of a negation cue or not. This approach was ranked in 1st position in the official testing results, with average precision around 91%, average recall around 82%, and average F1 score around 86%.

Acknowledgements

This work has been partially funded by the Spanish Government and by the European Union through the GRAPHMED project (TIN2016-77820-C3-3-R and AEI/FEDER, UE).

References

Agarwal, S. and H. Yu. 2010. Biomedical negation scope detection with conditional random fields. Journal of the American Medical Informatics Association, 17(6):696–701.

Bird, S., E. Klein, and E. Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.

Jiménez-Zafra, S. M., N. P. Cruz-Díaz, R. Morante, and M. T. Martín-Valdivia. 2018a. Resumen de la Tarea 2 del Taller NEGES 2018: Detección de Claves de Negación. In Proceedings of NEGES 2018: Workshop on Negation in Spanish, volume 2174, pages 35–41.

Jiménez-Zafra, S. M., M. Taulé, M. T. Martín-Valdivia, L. A. Ureña-López, and M. A. Martí. 2018b. SFU ReviewSP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns. Language Resources and Evaluation, 52(2):533–569.

Lafferty, J., A. McCallum, and F. C. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (ICML), pages 282–289.

Okazaki, N. 2007. CRFsuite: a fast implementation of Conditional Random Fields (CRFs).

Ramshaw, L. A. and M. P. Marcus. 1999. Text chunking using transformation-based learning. In Natural Language Processing Using Very Large Corpora. Springer, pages 157–176.